AI For Students Anthropic

AI For Students Anthropic — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • PARRY

    PARRY was an early example of a chatbot, implemented in 1972 by psychiatrist Kenneth Colby.

    == History ==

    PARRY was written in 1972 by psychiatrist Kenneth Colby, then at Stanford University. While ELIZA was a simulation of a Rogerian therapist, PARRY attempted to simulate a person with paranoid schizophrenia. The program implemented a crude model of the behavior of a person with paranoid schizophrenia based on concepts, conceptualizations, and beliefs (judgements about conceptualizations: accept, reject, neutral). It also embodied a conversational strategy, and as such was a much more serious and advanced program than ELIZA. It was described as "ELIZA with attitude".

    PARRY was tested in the early 1970s using a variation of the Turing Test. A group of experienced psychiatrists analysed a combination of real patients and computers running PARRY through teleprinters. Another group of 33 psychiatrists were shown transcripts of the conversations. The two groups were then asked to identify which of the "patients" were human and which were computer programs. The psychiatrists were able to make the correct identification only 48 percent of the time, a figure consistent with random guessing.

    PARRY and ELIZA (also known as "the Doctor") interacted several times. The most famous of these exchanges occurred at the ICCC 1972, where PARRY and ELIZA were hooked up over ARPANET and responded to each other.

  • Block swap algorithms

    In computer algorithms, block swap algorithms swap two regions of elements of an array. It is simple to swap two non-overlapping regions of an array of equal size. However, it is not as simple to swap two contiguous regions of an array of unequal sizes (algorithms that perform such swapping are called rotation algorithms). A few well-known algorithms can accomplish this: Bentley's juggling (also known as the dolphin algorithm), Gries-Mills rotation, the triple reversal algorithm, the conjoined triple reversal algorithm (also known as the trinity rotation) and successive rotation.

    == Triple reversal algorithm ==

    The triple reversal algorithm is the simplest to explain. Its building block is an in-place reversal of a range of array elements: elements are swapped pairwise from the outside in, which works for ranges with an even or odd number of elements. The algorithm uses three in-place reversals to accomplish an in-place block swap:

      • Reverse region A
      • Reverse region B
      • Reverse region AB

    where A and B are adjacent regions of an array that together form the region AB.

    Gries-Mills and reversal algorithms perform better than Bentley's juggling, because of their cache-friendly memory access pattern. The triple reversal algorithm parallelizes well, because each reversal can be split into sub-regions, which can be reversed independently of the others.
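The three steps of the triple reversal algorithm can be sketched in a few lines of Python. This is a minimal illustration rather than a tuned implementation; the function names `reverse_range` and `block_swap` and the half-open index convention are assumptions made for the example.

```python
def reverse_range(a, lo, hi):
    """Reverse a[lo:hi] in place by swapping elements from the outside in."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1


def block_swap(a, start, mid, end):
    """Swap adjacent regions A = a[start:mid] and B = a[mid:end] in place."""
    reverse_range(a, start, mid)  # reverse region A
    reverse_range(a, mid, end)    # reverse region B
    reverse_range(a, start, end)  # reverse region AB


data = [1, 2, 3, 4, 5, 6, 7]
block_swap(data, 0, 3, 7)  # A = [1, 2, 3], B = [4, 5, 6, 7]
print(data)  # [4, 5, 6, 7, 1, 2, 3]
```

The regions may have unequal sizes, and no auxiliary array is needed: each element is moved at most twice.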

  • How to Solve it by Computer

    How to Solve it by Computer is a computer science book by R. G. Dromey, first published by Prentice-Hall in 1982. It is occasionally used as a textbook, especially in India. It is an introduction to the whys of algorithms and data structures. Features of the book:

      • The design factors associated with problems
      • The creative process behind coming up with innovative solutions for algorithms and data structures
      • The line of reasoning behind the constraints, factors and the design choices made

    The very fundamental algorithms portrayed by this book are mostly presented in pseudocode and/or Pascal notation.

  • Metadata

    Metadata (or metainformation) is data (or information) that defines and describes the characteristics of other data. It often helps to describe, explain, locate, or otherwise make data easier to retrieve, use, or manage. For example, the title, author, and publication date of a book are metadata about the book. But, while a data asset is finite, its metadata is infinite. As such, efforts to define, classify types, or structure metadata are expressed as examples in the context of its use. The term "metadata" has a history dating to the 1960s where it occurred in computer science and in popular culture. Different types of metadata serve different functions. For example, descriptive metadata for a document might include the author, creation date, file size and keywords. Metadata has various purposes. It can help users find relevant information and discover resources. It can also help organize electronic resources, provide digital identification, and archive and preserve resources. Metadata allows users to access resources by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information". Metadata of telecommunication activities including Internet traffic is very widely collected by various national governmental organizations. This data is used for the purposes of traffic analysis and can be used for mass surveillance. Unique metadata standards exist for different disciplines (e.g., museum collections, digital audio files, websites, etc.). Describing the contents and context of data or data files increases its usefulness. For example, a web page may include metadata specifying what software language the page is written in (e.g., HTML), what tools were used to create it, what subjects the page is about, and where to find more information about the subject. 
    This metadata can automatically improve the reader's experience and make it easier for users to find the web page online. A CD may include metadata providing information about the musicians, singers, and songwriters whose work appears on the disc. In many countries, government organizations routinely store metadata about emails, telephone calls, web pages, video traffic, IP connections, and cell phone locations.

    == Types ==

    There are many distinct types of metadata, including:

      • Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords.
      • Structural metadata – metadata about containers of data that indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials.
      • Administrative metadata – the information to help manage a resource, such as resource type, permissions, and when and how it was created.
      • Reference metadata – the information about the contents and quality of statistical data.
      • Statistical metadata – also called process data, may describe processes that collect, process, or produce statistical data.
      • Legal metadata – provides information about the creator, copyright holder, and public licensing, if provided.

    Metadata is not strictly bound to one of these categories, as it can describe a piece of data in many other ways.

    While metadata applications are manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata. Structural metadata describes the structure of database objects such as tables, columns, keys and indexes. Guide metadata helps humans find specific items and is usually expressed as a set of keywords in a natural language.
According to Ralph Kimball, metadata can be divided into three categories: technical metadata (or internal metadata), business metadata (or external metadata), and process metadata. Dan Linstedt, creator of the data vault methodology, says business metadata "...provide[s] definition of the functionality, definition of the data, definition of the elements, and definition of how the data is used within business...business metadata includes business requirements, time-lines, business metrics, business process flows, and business terminology." Business metadata is important because it can greatly facilitate the usefulness of the data to business people. A simple example of business metadata is a glossary entry. Hover functionality in an application or web form can enable a glossary definition to be shown when cursor is on a field or term. Other examples of business metadata include annotation ability within applications. For example, a business user may be viewing a business intelligence (BI) report and notice a trend in the data. The user may have background knowledge as to why this trend occurs. Some business intelligence tools enable the user to create an annotation within the report that explains the trend. Such an annotation can enhance other users' understanding of the data. This example is especially powerful because it is created by a business user for the use of other business people. NISO distinguishes three types of metadata: descriptive, structural, and administrative. Descriptive metadata is typically used for discovery and identification, as information to search and locate an object, such as title, authors, subjects, keywords, and publisher. Structural metadata describes how the components of an object are organized. An example of structural metadata would be how pages are ordered to form chapters of a book. Finally, administrative metadata gives information to help manage the source. 
Administrative metadata refers to the technical information, such as file type, or when and how the file was created. Two sub-types of administrative metadata are rights management metadata and preservation metadata. Rights management metadata explains intellectual property rights, while preservation metadata contains information to preserve and save a resource. Statistical data repositories have their own requirements for metadata in order to describe not only the source and quality of the data but also what statistical processes were used to create the data, which is of particular importance to the statistical community in order to both validate and improve the process of statistical data production. An additional type of metadata beginning to be more developed is accessibility metadata. Accessibility metadata is not a new concept to libraries; however, advances in universal design have raised its profile. Projects like Cloud4All and GPII identified the lack of common terminologies and models to describe the needs and preferences of users and information that fits those needs as a major gap in providing universal access solutions. Those types of information are accessibility metadata. The Schema.org website has incorporated several accessibility properties based on IMS Global Access for All Information Model Data Element Specification. While the efforts to describe and standardize the varied accessibility needs of information seekers are beginning to become more robust, their adoption into established metadata schemas has not been as developed. For example, while Dublin Core (DC)'s "audience" and MARC 21's "reading level" could be used to identify resources suitable for users with dyslexia and DC's "format" could be used to identify resources available in braille, audio, or large print formats, there is more work to be done. 
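As a small illustration of the administrative metadata described above (file type, size, and when the file was last changed), the following Python sketch collects a few technical facts about a file from the operating system. The function name `administrative_metadata` and the dictionary keys are hypothetical choices for this example; note that it records the last modification time, since a true creation time is not available on every platform.

```python
import mimetypes
import os
from datetime import datetime, timezone


def administrative_metadata(path):
    """Collect a few technical facts about a file without reading its contents."""
    info = os.stat(path)
    media_type, _ = mimetypes.guess_type(path)  # guessed from the extension only
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "media_type": media_type,  # None when the extension is not recognized
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
    }
```

Calling `administrative_metadata("report.txt")` on an existing file returns a dictionary describing the file rather than its contents, which is the distinction between data and metadata that the text draws.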
    == History ==

    Metadata was traditionally used in the card catalogs of libraries until the 1980s when libraries converted their catalog data to digital databases. In the 2000s, as data and information were increasingly stored digitally, this digital data was described using metadata standards.

    An early description of "meta data" for computer systems was written by David Griffel and Stuart McIntosh at the MIT Center for International Studies in 1967: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data."

    == Definition ==

    Metadata means "data about data". Metadata is defined as the data providing information about one or more aspects of the data; it is used to summarize basic information about data that can make tracking and working with specific data easier. Some examples include:

      • Means of creation of the data
      • Source of the data
      • Time and date of creation
      • Creator or author of the data
      • Location on a computer network where the data was created
      • Standards used
      • Data quality

    For example, a digital image may include metadata that describes the size of the image, its color depth, resolution,

  • Endomondo

    Endomondo is a health and wellness website. It allows users to track their health statistics and provides insights on fitness trends. Originally launched in 2007, Endomondo was acquired by Under Armour in 2015. Under Armour shut down Endomondo in 2020, but by 2024 Endomondo had relaunched as its own entity.

    == History ==

    Endomondo was started in Denmark in 2007 by Mette Lykke, Christian Birk and Jakob Nordenhof Jønck. In 2011, the company opened an office in Silicon Valley, USA, but kept its research and development department in Denmark. In 2013, Endomondo LLC was listed in Red Herring as a European finalist for promising start-ups. The same year, Christian Birk and Jakob Nordenhof Jønck left the daily operation of the company, but kept co-ownership.

    In February 2015, Endomondo LLC was acquired by athletic apparel maker Under Armour for $85 million. Endomondo, at that time, had over 20 million users. In October 2020, Under Armour announced that it would shut down Endomondo and sell MyFitnessPal to the private equity firm Francisco Partners for $345 million. Service stopped on 31 December 2020, giving customers until 15 February 2021 to download an archive of their historic data. In 2024, Endomondo.com was brought back online as a professional fitness guidance website.

    == Features ==

    Endomondo provides numerous workouts, guidance on exercises, performance-enhancing nutrition, and tips. Previously, Endomondo was able to track numerous fitness attributes such as running routes, distance, duration, and calories. The software helped analyze performance and recommend improvements.

    There was a free and a paid version of Endomondo. The free version had advertisements. The paid Premium version was free of advertisements and included additional features such as the ability to create one's own training plan. The additional features on offer differed between the Android, iOS and Windows platforms. Endomondo also had significantly better features for tracking performance over time than Under Armour's suggested replacement. Endomondo offered challenges of various types to the user and allowed users to create their own challenges.

  • Data janitor

    A data janitor is a person who works to take big data and condense it into useful amounts of information. Also known as a "data wrangler", a data janitor sifts through data for companies in the information technology industry. A multitude of start-ups rely on large amounts of data, so a data janitor works to help these businesses with this basic, but difficult process of interpreting data. While it is a commonly held belief that data janitor work is fully automated, many data scientists are employed primarily as data janitors. The information technology industry has been increasingly turning towards new sources of data gathered on consumers, so data janitors have become more commonplace in recent years.

  • Automated journalism

    Automated journalism, also known as algorithmic journalism or robot journalism, is a term that attempts to describe modern technological processes that are now in use in the journalistic profession, such as news articles and videos generated by computer programs. There are four main fields of application for automated journalism, namely automated content production, data mining, news dissemination and content optimization. Through generative artificial intelligence, stories are produced automatically by computers rather than human reporters. In the 2020s, generative pre-trained transformers have enabled the generation of articles, simply by providing prompts.

    Automated journalism is sometimes seen as an opportunity to free journalists from routine reporting, providing them with more time for complex tasks. It also allows efficiency and cost-cutting, alleviating some financial burden that many news organizations face. However, automated journalism is also perceived as a threat to the authorship and quality of news and a threat to the livelihoods of human journalists.

    == History ==

    Historically, the process involved an algorithm that scanned large amounts of provided data, selected from an assortment of pre-programmed article structures, ordered key points, and inserted details such as names, places, amounts, rankings, statistics, and other figures. These programs interpret, organize, and present data in human-readable ways. The output can also be customized to fit a certain voice, tone, or style.

    Early implementations were mainly used for stories based on statistics and numerical figures. Common topics include sports recaps, weather, financial reports, real estate analysis, and earnings reviews. Data science and AI companies such as Automated Insights, Narrative Science, United Robots and Monok develop and provide these algorithms to news outlets.
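The template-driven process described above (pick a pre-programmed article structure from the data, then insert names and figures) can be sketched as follows. This is a toy illustration and not the method of any vendor named here; the templates, the `write_recap` function, the 20-point threshold and the sample game are all invented for the example.

```python
# Hypothetical pre-programmed article structures.
TEMPLATES = {
    "win": "{winner} beat {loser} {w}-{l} on {day}.",
    "blowout": "{winner} routed {loser} {w}-{l} on {day}.",
}


def write_recap(game, blowout_margin=20):
    """Pick a structure from the score margin, then insert the details."""
    # Order the two (team, score) pairs so the higher score comes first.
    (winner, w), (loser, l) = sorted(
        game["scores"].items(), key=lambda pair: pair[1], reverse=True
    )
    structure = "blowout" if w - l >= blowout_margin else "win"
    return TEMPLATES[structure].format(
        winner=winner, loser=loser, w=w, l=l, day=game["day"]
    )


game = {"day": "Saturday", "scores": {"Lions": 78, "Bears": 54}}
print(write_recap(game))  # Lions routed Bears 78-54 on Saturday.
```

Changing the wording of the templates is how such a system would be adapted to a particular voice, tone, or style.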
    In 2016, early adopters included news providers such as the Associated Press, Forbes, ProPublica, and the Los Angeles Times. StatSheet, an online platform covering college basketball, runs entirely on an automated program. In 2006, Thomson Reuters announced its switch to automation to generate financial news stories on its online news platform. Reuters used a tool called Tracer. An algorithm called Quakebot published a story about a 2014 California earthquake on The Los Angeles Times website within three minutes after the shaking had stopped.

    The Associated Press began using automation to cover 10,000 minor league baseball games annually, using a program from Automated Insights and statistics from MLB Advanced Media. Outside of sports, the Associated Press also uses automation to produce stories on corporate earnings. Since 2014, the Associated Press has been publishing quarterly financial stories with help from Automated Insights.

    In May 2020, Microsoft announced that a number of its MSN contract journalists would be replaced by robot journalism. On 8 September 2020, The Guardian published an article entirely written by the neural network GPT-3, although the published fragments were manually picked by a human editor. Agentic Tribune produces all of its news articles automatically using AI.

    News broadcasters in Kuwait, Greece, South Korea, India, China and Taiwan have presented news with anchors based on generative AI models, prompting concerns about job losses for human anchors and about audience trust in news that has historically been influenced by parasocial relationships with broadcasters, content creators or social media influencers. Algorithmically generated anchors have also been used by allies of ISIS for their broadcasts.

    In 2023, Google reportedly pitched a tool to news outlets that claimed to "produce news stories" based on input data provided, such as "details of current events".
    Some news company executives who viewed the pitch described it as "[taking] for granted the effort that went into producing accurate and artful news stories." In February 2024, Google launched a program to pay small publishers to write three articles per day using a beta generative AI model. The program does not require the knowledge or consent of the websites that the publishers are using as sources, nor does it require the published articles to be labeled as being created or assisted by these models.

    Meta AI, a chatbot based on Llama 3 which summarizes news stories, was noted by The Washington Post to copy sentences from those stories without direct attribution and to potentially further decrease the traffic of online news outlets.

    == Benefits ==

    === Speed ===

    Robot reporters are built to produce large quantities of information at quicker speeds. The Associated Press announced that its use of automation has increased the volume of earnings reports from customers by more than ten times. With software from Automated Insights and data from other companies, they can produce 150 to 300-word articles in the same time it takes journalists to crunch numbers and prepare information.

    By automating routine stories and tasks, journalists are promised more time for complex jobs such as investigative reporting and in-depth analysis of events. Francesco Marconi of the Associated Press stated that, through automation, the news agency freed up 20 percent of reporters’ time to focus on higher-impact projects. A spokesperson at Gannett made a similar point: "By leveraging AI, we are able to expand coverage and enable our journalists to focus on more in-depth reporting."

    GBH reports that AI tools help increase the reach of news publishers.
    Mike Carragi, a product manager at Patch, stated that they were able to increase their reach from 1200 communities to 7000 communities in just a few months, without the need for new employees, solely through the adoption of generative AI. In fact, many communities are served solely by AI-generated content, which creates summaries of existing information within the community.

    === Cost ===

    Automated journalism is cheaper because more content can be produced in less time. It also lowers labour costs for news organizations. Reduced human input means lower spending on wages or salaries, paid leave, vacations, and employment insurance. Automation serves as a cost-cutting tool for news outlets that are struggling with tight budgets but still wish to maintain the scope and quality of their coverage.

    == Concerns ==

    === Authorship ===

    In an automated story, there is often confusion about who should be credited as the author. Several participants of a study on algorithmic authorship attributed the credit to the programmer; others perceived the news organization as the author, emphasizing the collaborative nature of the work. There is also no way for the reader to verify whether an article was written by a robot or a human, which raises issues of transparency, although such issues also arise with respect to authorship attribution between human authors.

    === Credibility and quality ===

    Concerns about the perceived credibility of automated news are similar to concerns about the perceived credibility of news in general. Critics doubt if algorithms are "fair and accurate, free from subjectivity, error, or attempted influence." Again, these issues of fairness, accuracy, subjectivity, error, and attempts at influence or propaganda have also been present in articles written by humans over thousands of years. A common criticism is that machines do not replace human capabilities such as creativity, humour, and critical thinking.
    However, as the technology evolves, the aim is to mimic human characteristics. When the UK's Guardian newspaper used an AI to write an entire article in September 2020, commentators pointed out that the AI still relied on human editorial content. Austin Tanney, the head of AI at Kainos, said: "The Guardian got three or four different articles and spliced them together. They also gave it the opening paragraph. It doesn’t belittle what it is. It was written by AI, but there was human editorial on that."

    The largest single study of readers' evaluations of news articles produced with and without the help of automation exposed 3,135 online news consumers to 24 articles. It found that articles that had been automated were significantly less comprehensible, in part because they were considered to contain too many numbers. However, the automated articles were evaluated equally on other criteria including tone, narrative flow, and narrative structure.

    Beyond human evaluation, there are now numerous algorithmic methods to identify machine-written articles. Although some articles may still contain errors that are obvious to a human, they can at times score better with these automatic identifiers than human-written articles.

    A 2017 Nieman Reports article by Nicola Bruno discusses whether or not machines will replace journalists and addresses concerns around the concept of automated journalism practices. Ultimately, Bruno came to the conclusion that AI would assist journalist

  • Query rewriting

    Query rewriting is a typically automatic transformation that takes a set of database tables, views, and/or queries (usually together with indices, often with gathered data and query statistics, and other metadata) and yields a set of different queries, which produce the same results but execute with better performance (for example, faster, or with lower memory use).

    Query rewriting can be based on relational algebra or an extension thereof (e.g. multiset relational algebra with sorting, aggregation and three-valued predicates, i.e. NULLs as in the case of SQL). The equivalence rules of relational algebra are exploited; in other words, different query structures and orderings can be mathematically proven to yield the same result. For example, filtering on fields A and B, or cross joining R and S, can be done in any order, but there can be a performance difference. Multiple operations may be combined, and operation orders may be altered.

    The result of query rewriting may not be at the same abstraction level or application programming interface (API) as the original set of queries (though it often is). For example, the input queries may be in relational algebra or SQL, and the rewritten queries may be closer to the physical representation of the data, e.g. array operations. Query rewriting can also involve materialization of views and other subqueries, operations that may or may not be available to the API user.

    The query rewriting transformation can be aided by creating indices from which the optimizer can choose (some database systems create their own indexes if deemed useful), mandating the use of specific indices, creating materialized and/or denormalized views, or helping a database system gather statistics on the data and query use, as the optimality depends on patterns in data and typical query usage.

    Query rewriting may be rule based or optimizer based.
Some sources discuss query rewriting as a distinct step prior to optimization, operating at the level of the user accessible algebra API (e.g. SQL). There are other, largely unrelated concepts also named similarly, for example, query rewriting by search engines.
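To make the idea of equivalent query structures concrete, the sketch below compares two plans over tiny in-memory relations: filtering after a cross join, and filtering each input before the join. The rewrite shown is selection pushdown, a standard relational-algebra equivalence chosen here for illustration and not named in the text above; the relations `R` and `S` and the helper functions are hypothetical.

```python
from itertools import product

# Two toy relations, each a list of rows.
R = [{"a": 1}, {"a": 2}, {"a": 3}]
S = [{"b": 10}, {"b": 20}]


def cross_join(left, right):
    """Pair every row of `left` with every row of `right`."""
    return [{**l, **r} for l, r in product(left, right)]


def select(rows, predicate):
    """Keep only the rows that satisfy `predicate`."""
    return [row for row in rows if predicate(row)]


# Original plan: build the full cross join (6 rows), then filter.
original = select(cross_join(R, S), lambda row: row["a"] > 1 and row["b"] == 20)

# Rewritten plan: filter each input first, then join only 2 x 1 rows.
rewritten = cross_join(
    select(R, lambda row: row["a"] > 1),
    select(S, lambda row: row["b"] == 20),
)

print(original == rewritten)  # True
```

Both plans return the same rows, but the rewritten plan materializes a smaller intermediate result, which is the kind of performance difference a rewriter aims for.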

  • Truth discovery

    Truth discovery (also known as truth finding) is the process of choosing the actual true value for a data item when different data sources provide conflicting information on it. Several algorithms have been proposed to tackle this problem, ranging from simple methods like majority voting to more complex ones able to estimate the trustworthiness of data sources.

    Truth discovery problems can be divided into two sub-classes: single-truth and multi-truth. In the first case only one true value is allowed for a data item (e.g. the birthday of a person or the capital city of a country), while in the second case multiple true values are allowed (e.g. the cast of a movie or the authors of a book).

    Typically, truth discovery is the last step of a data integration pipeline, performed once the schemas of different data sources have been unified and the records referring to the same data item have been detected.

    == General principles ==

    The abundance of data available on the web makes it increasingly likely that different sources provide (partially or completely) different values for the same data item. This, together with our increasing reliance on data for important decisions, motivates the need for good truth discovery algorithms.

    Many currently available methods rely on a voting strategy to define the true value of a data item. Nevertheless, recent studies have shown that relying only on majority voting can give wrong results for as many as 30% of the data items.

    The solution to this problem is to assess the trustworthiness of the sources and give more importance to votes coming from trusted sources. Ideally, supervised learning techniques could be exploited to assign a reliability score to sources after hand-crafted labeling of the provided values; unfortunately, this is not feasible, since the number of labeled examples needed is proportional to the number of sources, and in many applications the number of sources is prohibitively large.
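The difference between plain majority voting and trust-weighted voting can be shown with a short Python sketch. The function `weighted_vote`, the source names and the trust scores below are made up for illustration; in practice the scores would have to be estimated rather than given.

```python
from collections import defaultdict


def weighted_vote(claims, trust=None):
    """Return the value with the highest total vote for one data item.

    claims: {source: value}; trust: {source: weight}.
    With trust=None every source counts 1.0, i.e. plain majority voting.
    """
    scores = defaultdict(float)
    for source, value in claims.items():
        scores[value] += 1.0 if trust is None else trust[source]
    return max(scores, key=scores.get)


# Three hypothetical sources disagree on one data item.
claims = {"s1": "Canberra", "s2": "Sydney", "s3": "Sydney"}

print(weighted_vote(claims))  # Sydney (2 votes against 1)
print(weighted_vote(claims, {"s1": 0.9, "s2": 0.3, "s3": 0.3}))  # Canberra (0.9 against 0.6)
```

With equal weights the two agreeing sources win; once the single source is trusted more than the other two combined, its value is selected instead.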
    == Single-truth vs multi-truth discovery ==

    Single-truth and multi-truth discovery are two very different problems. Single-truth discovery is characterized by the following properties:

      • only one true value is allowed for each data item;
      • different values provided for a given data item oppose each other;
      • values and sources can be either correct or erroneous.

    In the multi-truth case, by contrast, the following properties hold:

      • the truth is composed of a set of values;
      • different values could provide a partial truth;
      • claiming one value for a given data item does not imply opposing all the other values;
      • the number of true values for each data item is not known a priori.

    Multi-truth discovery has unique features that make the problem more complex and that should be taken into consideration when developing truth discovery solutions. The examples below point out the main differences between the two settings. Knowing that in both examples the truth is provided by source 1, in the single-truth case (first table) we can say that sources 2 and 3 oppose the truth and as a result provide wrong values. On the other hand, in the second case (second table), sources 2 and 3 are neither correct nor erroneous; instead they provide a subset of the true values and at the same time do not oppose the truth.

    == Source trustworthiness ==

    The vast majority of truth discovery methods are based on a voting approach: each source votes for a value of a certain data item and, at the end, the value with the highest vote is selected as the true one. In the more sophisticated methods, votes do not have the same weight for all the data sources: more importance is given to votes coming from trusted sources. Source trustworthiness is usually not known a priori but is estimated with an iterative approach.
    At each step of the truth discovery algorithm the trustworthiness score of each data source is refined, improving the assessment of the true values, which in turn leads to a better estimation of the trustworthiness of the sources. This process usually ends when all the values reach a convergence state.

    Source trustworthiness can be based on different metrics, such as the accuracy of provided values, copying of values from other sources, and domain coverage. Detecting copying behavior is very important: copying allows false values to spread easily, which makes truth discovery very hard, since many sources would vote for the wrong values. Systems usually decrease the weight of votes associated with copied values, or do not count them at all.

    == Single-truth methods ==

    Most of the currently available truth discovery methods have been designed to work well only in the single-truth case. The main families of single-truth methods, and the ways in which different systems model source trustworthiness, are described below.

    === Majority voting ===

    Majority voting is the simplest method: the most popular value is selected as the true one. Majority voting is commonly used as a baseline when assessing the performance of more complex methods.

    === Web-link based ===

    These methods estimate source trustworthiness using a technique similar to the one used to measure the authority of web pages based on web links. The vote assigned to a value is computed as the sum of the trustworthiness of the sources that provide that particular value, while the trustworthiness of a source is computed as the sum of the votes assigned to the values that the source provides.

    === Information-retrieval based ===

    These methods estimate source trustworthiness using similarity measures typically used in information retrieval.
Source trustworthiness is computed as the cosine similarity (or another similarity measure) between the set of values provided by the source and the set of values considered true (either selected in a probabilistic way or obtained from a ground truth). === Bayesian based === These methods use Bayesian inference to define the probability of a value being true conditioned on the values provided by all the sources: $P(v \mid \psi(o)) = \frac{P(\psi(o) \mid v) \cdot P(v)}{P(\psi(o))}$, where $v$ is a value provided for a data item $o$ and $\psi(o)$ is the set of the observed values provided by all the sources for that specific data item. The trustworthiness of a source is then computed based on the accuracy of the values that it provides. Other, more complex methods exploit Bayesian inference to detect copying behaviors and use these insights to better assess source trustworthiness. == Multi-truth methods == Because of its complexity, multi-truth discovery has received less attention. Two types of multi-truth methods and their characteristics are described below. === Bayesian based === These methods use Bayesian inference to define the probability of a group of values being true conditioned on the values provided by all the data sources. In this case, since there could be multiple true values for each data item, and sources can provide multiple values for a single data item, it is not possible to consider values individually. An alternative is to consider mappings and relations between the sets of provided values and the sources providing them. The trustworthiness of a source is then computed based on the accuracy of the values that it provides. More sophisticated methods also consider domain coverage and copying behaviors to better estimate source trustworthiness. 
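As a rough illustration of the single-truth Bayesian formula given earlier, the sketch below computes the posterior of each claimed value for one data item. Everything beyond the formula itself is an assumption made for the example: the function name `posterior`, the independence of sources, a uniform prior over the claimed values, the hypothetical `n_false` count of equally likely wrong values, and the simplification that the true value is among the claimed ones.

```python
from math import prod


def posterior(claims, accuracy, n_false=10):
    """Posterior probability of each claimed value for one data item.

    claims:   dict source -> value it claims (the observations psi(o))
    accuracy: dict source -> assumed probability that the source is right
    n_false:  assumed number of equally likely wrong values (hypothetical)
    """
    candidates = sorted(set(claims.values()))
    prior = 1.0 / len(candidates)  # uniform prior P(v) (assumption)

    def likelihood(v):
        # P(psi(o) | v), treating sources as independent (assumption):
        # a source agreeing with v is right, any other source picked
        # one of n_false wrong values uniformly at random.
        return prod(
            accuracy[s] if claimed == v else (1.0 - accuracy[s]) / n_false
            for s, claimed in claims.items()
        )

    joint = {v: likelihood(v) * prior for v in candidates}
    evidence = sum(joint.values())  # P(psi(o)), summed over claimed values only
    return {v: p / evidence for v, p in joint.items()}


if __name__ == "__main__":
    claims = {"s1": "A", "s2": "B", "s3": "B"}
    accuracy = {"s1": 0.99, "s2": 0.3, "s3": 0.3}
    print(posterior(claims, accuracy))
```

In this toy input a single highly accurate source outweighs two unreliable ones, which is the kind of outcome plain majority voting cannot produce.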
=== Probabilistic Graphical Models based === These methods use probabilistic graphical models to automatically determine the set of true values of a given data item and to assess source quality without any supervision. == Applications == Many real-world applications can benefit from truth discovery algorithms. Typical application domains include healthcare, crowd/social sensing, crowdsourcing aggregation, information extraction and knowledge base construction. Truth discovery algorithms could also be used to revolutionize the way web pages are ranked in search engines, moving from current methods based on link analysis, such as PageRank, to procedures that rank web pages based on the accuracy of the information they provide.
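The voting schemes described under the single-truth methods can be sketched in a few lines. The code below is a minimal illustration, not a reference implementation: `majority_vote` is the baseline, and `weighted_vote` follows the web-link based description (the vote of a value is the sum of the trust of its sources, the trust of a source is the sum of the votes of its values). The function names, the fixed iteration count, and the normalisation by the maximum trust score are assumptions added so that the example is runnable and the scores stay bounded.

```python
from collections import Counter, defaultdict


def majority_vote(claims):
    """claims: list of (source, item, value). Returns {item: most claimed value}."""
    counts = defaultdict(Counter)
    for _source, item, value in claims:
        counts[item][value] += 1
    return {item: c.most_common(1)[0][0] for item, c in counts.items()}


def value_votes(claims, trust):
    """Vote of each (item, value) = sum of the trust of the sources claiming it."""
    vote = defaultdict(float)
    for source, item, value in claims:
        vote[(item, value)] += trust[source]
    return vote


def weighted_vote(claims, iterations=20):
    """Iteratively refine source trust and value votes (web-link style sketch)."""
    trust = {source: 1.0 for source, _, _ in claims}  # start with equal trust
    for _ in range(iterations):
        vote = value_votes(claims, trust)
        # Trust of a source = sum of the votes of the values it provides.
        new_trust = defaultdict(float)
        for source, item, value in claims:
            new_trust[source] += vote[(item, value)]
        top = max(new_trust.values())  # normalisation step (assumption)
        trust = {s: t / top for s, t in new_trust.items()}
    vote = value_votes(claims, trust)
    best = {}
    for (item, value), score in vote.items():
        if item not in best or score > vote[(item, best[item])]:
            best[item] = value
    return best, trust


if __name__ == "__main__":
    claims = [
        ("s1", "a", 1), ("s1", "b", 2), ("s1", "c", 3),
        ("s2", "a", 1), ("s2", "b", 2), ("s2", "c", 3),
        ("s3", "a", 1), ("s3", "b", 2), ("s3", "c", 9),
        ("s4", "c", 9), ("s5", "c", 9),
    ]
    print(majority_vote(claims))
    print(weighted_vote(claims)[0])
```

On this toy data the two schemes disagree on item `c`: three sources claim 9, but the two sources claiming 3 earn more trust through their agreement on the other items. Note that a sum-based trust score favours sources that make many claims, which is one reason practical systems use more refined updates.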

  • List of information schools

    List of information schools

    This list of information schools, sometimes abbreviated to iSchools, includes members of the iSchools organization. The iSchools organization is a consortium of over 130 information schools across the globe. == History == The first iSchools Caucus was formed in 1988 by Syracuse, Pittsburgh, and Drexel and was called the Gang of Three (sometimes the Gang of Four, with Rutgers). Syracuse renamed its School of Library Science as the School of Information Studies in 1974 and is considered the first “iSchool” in history. The group was formally named "the iSchools Caucus" or, more casually, the iCaucus. By 2003, the group had expanded to include the Universities of Michigan, Washington, Illinois, UNC, Florida State, Indiana, and Texas, and was called the Gang of Ten. The current iSchools Caucus organization was formalized by 2005, with the addition of UC Berkeley, UC Irvine, UCLA, Penn State, Georgia Tech, Maryland, Toronto, Carnegie Mellon and Singapore Management University. == iSchools organization == The iSchools promote an interdisciplinary approach to understanding the opportunities and challenges of information management, with a core commitment to concepts like universal access and user-centered organization of information. The field is concerned broadly with questions of design and preservation across information spaces, from digital and virtual spaces such as online communities, social networking, the World Wide Web, and databases to physical spaces such as libraries, museums, collections, and other repositories. The participating organizations are often named "School of Information", "Department of Information Studies", or "Information Department". 
Degree programs at iSchools include course offerings in areas such as information architecture, design, policy, and economics; knowledge management, user experience design, and usability; preservation and conservation; librarianship and library administration; the sociology of information; and human-computer interaction and computer science. === Leadership === The executive committee of the iSchools is made up of the current chair (Ina Fourie, University of Pretoria, South Africa), the past chair (Gillian Oliver, Monash University, Australia) and the chair elect (Javed Mostafa, University of Toronto, Canada), plus representatives from the three regions (North America, Europe, and Asia-Pacific). The current executive director is Slava Sterzer. == Member institutions == Between 2010 and 2026, the organization expanded globally beyond North America, growing to 133 member schools as of March 2026. An updated and complete list of member schools is available in the member database of the iSchools. == iConferences == Members of the iSchools organize a regular academic conference, known as the iConference, hosted by a different member institution each year. 
September 2005: Pennsylvania State University; October 2006: University of Michigan; February 2008: University of California, Los Angeles; February 2009: University of North Carolina; February 2010: University of Illinois at Urbana-Champaign; February 2011: University of Washington, Seattle; February 2012: University of Toronto; February 2013: University of North Texas; March 2014: Humboldt-Universität zu Berlin; March 2015: University of California, Irvine; March 2016: Drexel University; March 2017: Wuhan University; March 2018: University of Sheffield and Northumbria University; March 2019: University of Maryland; March 2020: University of Borås (virtual only); March 2021: Renmin University of China (virtual only); February/March 2022: University of Texas at Austin, University College Dublin & Kyushu University (virtual only); March 2023: Universitat Oberta de Catalunya; March 2024: Jilin University; March 2025: Indiana University; March/April 2026: Edinburgh Napier University; 2027: Victoria University of Wellington. == Other schools of information == Other information schools and programs include: Documentation Research and Training Centre, Indian Statistical Institute, Bangalore; San Jose State University, School of Information; University of Southern California Library Science Degree; Ankara University, Department of Information and Records Management, Ankara/Turkey; Marmara University, Department of Information and Records Management, Istanbul/Turkey; University of Kelaniya, Department of Library and Information Science, Kelaniya/Sri Lanka; University of Colombo, National Institute of Library and Information Science (NILIS), Colombo/Sri Lanka; Chicago State University, Department of Information Studies.

  • Snap rounding

    Snap rounding

    Snap rounding is a method of approximating line segment locations by creating a grid and placing each point in the centre of a cell (pixel) of the grid. The method preserves certain topological properties of the arrangement of line segments. Drawbacks include the potential interpolation of additional vertices in line segments (lines become polylines), the arbitrary closeness of a point to a non-incident edge, and arbitrary numbers of intersections between input line segments. The three-dimensional case is worse, with a polyhedral subdivision of complexity n becoming complexity O(n^4). There are more refined algorithms to cope with some of these issues; for example, iterated snap rounding guarantees a "large" separation between points and non-incident edges. == Algorithm == ... See https://www.cgal.org/ == Properties == Desirable properties include canonicity and efficiency: a number of efficient implementations exist. Conversely, there are undesirable properties. Non-idempotence: repeated applications can cause arbitrary drift of points. "Stable snap rounding" algorithms are an exception; see https://doi.org/10.1016/j.comgeo.2012.02.011
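The basic rounding step from the opening definition, moving a point to the centre of the grid cell that contains it, can be sketched as follows. This is only the endpoint-rounding part, under stated assumptions: square pixels of side `pixel` aligned to the origin, half-open cells, and hypothetical function names. The full method also has to deal with segment intersections and with segments that pass through pixels containing other vertices, which this sketch does not attempt.

```python
import math


def snap_to_pixel_centre(x, y, pixel=1.0):
    """Return the centre of the grid cell (pixel) that contains (x, y).

    Cells are assumed to be half-open squares [i*pixel, (i+1)*pixel)
    in each coordinate, aligned to the origin.
    """
    i = math.floor(x / pixel)
    j = math.floor(y / pixel)
    return ((i + 0.5) * pixel, (j + 0.5) * pixel)


def snap_segment_endpoints(segment, pixel=1.0):
    """Round both endpoints of a segment ((x1, y1), (x2, y2)) to pixel centres."""
    (x1, y1), (x2, y2) = segment
    return (
        snap_to_pixel_centre(x1, y1, pixel),
        snap_to_pixel_centre(x2, y2, pixel),
    )


if __name__ == "__main__":
    print(snap_to_pixel_centre(2.3, 4.9))
    print(snap_segment_endpoints(((0.2, 0.7), (3.9, 1.1))))
```

Rounding a single point this way is stable under repetition; the drift mentioned under the properties arises when the new polyline vertices created by the full method are rounded again.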

  • Jump-and-Walk algorithm

    Jump-and-Walk algorithm

    Jump-and-Walk is an algorithm for point location in triangulations (though most of the theoretical analyses were performed on 2D and 3D random Delaunay triangulations). Surprisingly, the algorithm does not need any preprocessing or complex data structures except some simple representation of the triangulation itself. The predecessor of Jump-and-Walk is due to Lawson (1977) and Green and Sibson (1978); it picks a random starting point S and then walks from S toward the query point Q one triangle at a time. No theoretical analysis was known for these predecessors until the mid-1990s. Jump-and-Walk picks a small group of sample points and starts the walk from the sample point closest to Q, continuing until the simplex containing Q is found. The algorithm was folklore in practice for some time; its formal presentation and the analysis of its performance on 2D random Delaunay triangulations were given by Devroye, Mücke and Zhu in the mid-1990s (the paper appeared in Algorithmica, 1998). The analysis on 3D random Delaunay triangulations was done by Mücke, Saias and Zhu (ACM Symposium on Computational Geometry, 1996). In both cases, a boundary condition was assumed, namely that Q must be slightly away from the boundary of the convex domain from which the vertices of the random Delaunay triangulation are drawn. In 2004, Devroye, Lemaire and Moreau showed that in 2D the boundary condition can be dropped (the paper appeared in Computational Geometry: Theory and Applications, 2004). Jump-and-Walk has been used in several well-known software packages, e.g., QHULL, Triangle and CGAL.
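The two phases can be sketched for a small 2D triangulation. The code below is an illustration under several assumptions, not the implementation used by any of the packages named above: triangles are index triples in counter-clockwise order, the triangulation is Delaunay and covers a convex domain, the walk is a simple edge-by-edge "visibility" walk chosen here for brevity, and all function names are hypothetical.

```python
import random


def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def edge_owners(triangles):
    """Map each directed edge (u, v) to the index of the triangle that has it."""
    owner = {}
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            owner[(u, v)] = t
    return owner


def jump_and_walk(points, triangles, q, sample_size=2, rng=None):
    """Return the index of the triangle containing q, or None if q is outside."""
    rng = rng or random.Random(0)
    owner = edge_owners(triangles)

    # Jump: sample a few vertices and start from the one closest to q.
    sample = rng.sample(range(len(points)), min(sample_size, len(points)))
    start = min(
        sample,
        key=lambda i: (points[i][0] - q[0]) ** 2 + (points[i][1] - q[1]) ** 2,
    )
    t = next(i for i, tri in enumerate(triangles) if start in tri)

    # Walk: cross any edge that has q strictly on its outer (right) side.
    for _ in range(len(triangles) + 1):
        a, b, c = triangles[t]
        for u, v in ((a, b), (b, c), (c, a)):
            if orient(points[u], points[v], q) < 0:
                neighbour = owner.get((v, u))
                if neighbour is None:
                    return None  # crossed a boundary edge: q is outside
                t = neighbour
                break
        else:
            return t  # q is on the inner side of all three edges
    raise RuntimeError("walk did not terminate; input may not be Delaunay")


if __name__ == "__main__":
    # Unit square split into four triangles around its centre.
    pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
    tris = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
    print(jump_and_walk(pts, tris, (0.9, 0.5)))
```

The result does not depend on which sample is drawn; the sample only affects how many triangles the walk has to cross.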

  • Sentence extraction

    Sentence extraction

    Sentence extraction is a technique used for automatic summarization of a text. In this shallow approach, statistical heuristics are used to identify the most salient sentences of a text. Sentence extraction is a low-cost approach compared to more knowledge-intensive, deeper approaches, which require additional knowledge bases such as ontologies or linguistic knowledge. In short, sentence extraction works as a filter that allows only meaningful sentences to pass. The major downside of applying sentence-extraction techniques to the task of summarization is the loss of coherence in the resulting summary. Nevertheless, sentence-extraction summaries can give valuable clues to the main points of a document and are frequently sufficiently intelligible to human readers. == Procedure == Usually, a combination of heuristics is used to determine the most important sentences within the document. Each heuristic assigns a (positive or negative) score to the sentence, and the individual heuristics are weighted according to their importance. After all heuristics have been applied, the highest-scoring sentences are included in the summary. === Early approaches and some sample heuristics === Seminal papers that laid the foundations for many techniques used today were published by Hans Peter Luhn in 1958 and H. P. Edmundson in 1969. Luhn proposed to assign more weight to sentences at the beginning of the document or a paragraph. Edmundson stressed the importance of title words for summarization and was the first to employ stop-lists to filter out uninformative words of low semantic content (e.g. most grammatical words such as of, the, a). He also distinguished between bonus words and stigma words, i.e. words that probably occur together with important (e.g. the word form significant) or unimportant information. His idea of using key words, i.e. words that occur significantly frequently in the document, is still one of the core heuristics of today's summarizers. 
With large linguistic corpora available today, the tf–idf value, which originated in information retrieval, can be successfully applied to identify the key words of a text. If, for example, the word cat occurs significantly more often in the text to be summarized (TF = "term frequency") than in the corpus (IDF means "inverse document frequency"; here "document" refers to the corpus), then cat is likely to be an important word of the text; the text may in fact be a text about cats.
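The scoring procedure can be illustrated with a small sketch that combines two of the heuristics mentioned above: key-word frequency after stop-list filtering, and a bonus for the opening sentence. The weights, the tiny stop-list, the regular-expression sentence splitter and the function names are all assumptions chosen for the example; a practical summarizer would use a proper tokenizer and corpus-based weights such as tf–idf.

```python
import re
from collections import Counter

# Tiny illustrative stop-list (assumption); real stop-lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence):
    """Lower-cased words of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, n=1, w_keyword=1.0, w_position=0.5):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))
    scored = []
    for idx, sentence in enumerate(sentences):
        ws = content_words(sentence)
        # Key-word heuristic: average document frequency of the sentence's words.
        keyword = sum(freq[w] for w in ws) / len(ws) if ws else 0.0
        # Position heuristic: bonus for the first sentence of the text.
        position = 1.0 if idx == 0 else 0.0
        scored.append((w_keyword * keyword + w_position * position, idx, sentence))
    top = sorted(scored, key=lambda item: -item[0])[:n]
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]


if __name__ == "__main__":
    text = (
        "Cats sleep most of the day. The weather was mild. "
        "Cats hunt at night, and cats groom often."
    )
    print(summarize(text, n=2))
```

Changing the weights changes the selection: with `w_position=0` the opening sentence loses its bonus and the sentence with the densest key words is ranked first.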

  • Informedia Digital Library

    Informedia Digital Library

    The Informedia Digital Library is an ongoing research program at Carnegie Mellon University to build search engines and information visualization technology for many types of media. The program has carried out research on spoken document retrieval, video information retrieval, video segmentation, face recognition, and cross-language information retrieval. The Lycos search engine was an early product of the Informedia Digital Library Project. The project is led by Howard Wactlar. Researchers on the project have included: Michael Mauldin, Alex Hauptmann, Michael Christel, Michael Witbrock, Raj Reddy, Takeo Kanade and Scott Stevens.

  • Seismological Facility for the Advancement of Geoscience

    Seismological Facility for the Advancement of Geoscience

    The U.S. National Science Foundation's Seismological Facility for the Advancement of Geoscience (NSF SAGE) is a distributed, multi-user national facility that provides support for state-of-the-art seismic research. It is operated by EarthScope Consortium. Its previous operator was the Incorporated Research Institutions for Seismology (IRIS), until its merger with UNAVCO to become EarthScope Consortium. NSF SAGE is one of the two premier geophysical facilities of the National Science Foundation in support of geoscience and geoscience education. The other premier geophysical facility is NSF GAGE, the Geodetic Facility for the Advancement of Geoscience. The services of the facility include support for the Global Seismographic Network (GSN), Data Services, and instrument support via the EarthScope Primary Instrument Center (EPIC), including magnetotelluric (MT) geophysical research. == Global Seismographic Network (GSN) == NSF SAGE manages 40 stations of the 152-station Global Seismographic Network (GSN) for basic global seismicity and Earth structure research. The GSN also enables earthquake hazard mission-related data operations such as earthquake location and characterization, tsunami warning, and nuclear explosion monitoring. == Data Services == SAGE Data Services (DS) is the largest facility for the archiving, curation, and distribution of seismological and other geophysical data in the world. == EarthScope Primary Instrument Center (EPIC) == The EPIC facility maintains the largest open-access, shared-use pool of portable seismic sensors in the world. It is located on the campus of New Mexico Tech. == MT == NSF SAGE provides instruments for magnetotelluric (MT) or electromagnetic geophysical research, which record the Earth's ambient electric and magnetic fields and so allow the conductivity of the region from the shallow crust to the upper mantle to be characterized. 
This helps with the analysis of results obtained from seismic imaging methodologies. The NSF SAGE facility is developing open-source MT data formatting and processing software and providing access to proprietary software products.
