AI Chatbot Miles

AI Chatbot Miles — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Anomaly detection


    In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data. Anomaly detection finds application in many domains, including cybersecurity, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud detection, to name only a few. Anomalies were initially sought out so that they could be rejected or omitted from the data to aid statistical analysis, for example when computing the mean or standard deviation. They were also removed to improve predictions from models such as linear regression, and more recently their removal has aided the performance of machine learning algorithms. However, in many applications the anomalies themselves are of interest and are the most sought-after observations in the entire data set, which need to be identified and separated from noise or irrelevant outliers. Three broad categories of anomaly detection techniques exist. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier. However, this approach is rarely used in anomaly detection due to the general unavailability of labelled data and the inherently unbalanced nature of the classes. Semi-supervised anomaly detection techniques assume that some portion of the data is labelled. This may be any combination of the normal or anomalous data, but more often than not, the techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood that a test instance was generated by the model.
Unsupervised anomaly detection techniques assume the data is unlabelled and are by far the most commonly used due to their wider and relevant application.

== Definition ==
Many attempts have been made in the statistical and computer science communities to define an anomaly. The most prevalent ones include the following, and can be categorised into three groups: those that are ambiguous, those that are specific to a method with pre-defined thresholds usually chosen empirically, and those that are formally defined:

=== Ill defined ===
An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.
Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data.
An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data.
An anomaly is a point or collection of points that is relatively distant from other points in multi-dimensional space of features.
Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.

=== Specific ===
Let T be observations from a univariate Gaussian distribution and O a point from T. Then the z-score for O is greater than a pre-selected threshold if and only if O is an outlier.

== History ==
=== Intrusion detection ===
The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly over time. Initially, it was a manual process where system administrators would monitor for unusual activities, such as a vacationing user's account being accessed or unexpected printer activity. This approach was not scalable and was soon superseded by the analysis of audit logs and system logs for signs of malicious behavior.
By the late 1970s and early 1980s, the analysis of these logs was primarily used retrospectively to investigate incidents, as the volume of data made it impractical for real-time monitoring. The affordability of digital storage eventually led to audit logs being analyzed online, with specialized programs being developed to sift through the data. These programs, however, were typically run during off-peak hours due to their computational intensity. The 1990s brought the advent of real-time intrusion detection systems capable of analyzing audit data as it was generated, allowing for immediate detection of and response to attacks. This marked a significant shift towards proactive intrusion detection. As the field has continued to develop, the focus has shifted to creating solutions that can be efficiently implemented across large and complex network environments, adapting to the ever-growing variety of security threats and the dynamic nature of modern computing infrastructures.

== Applications ==
Anomaly detection is applicable in a very large number and variety of domains, and is an important subarea of unsupervised machine learning. As such it has applications in cyber-security, intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and law enforcement.

=== Intrusion detection ===
Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning. Types of features proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations. The counterpart of anomaly detection in intrusion detection is misuse detection.
=== Fintech fraud detection ===
Anomaly detection is vital in fintech for fraud prevention.

=== Preprocessing ===
Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.

=== Video surveillance ===
Anomaly detection has become increasingly vital in video surveillance to enhance security and safety. With the advent of deep learning technologies, methods using Convolutional Neural Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying unusual activities or behaviors in video data. These models can process and analyze extensive video feeds in real-time, recognizing patterns that deviate from the norm, which may indicate potential security threats or safety violations. An important aspect for video surveillance is the development of scalable real-time frameworks. Such pipelines are required for processing multiple video streams with low computational resources.

=== IT infrastructure ===
In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and reliability of services. These are complex systems, composed of many interactive elements and large data quantities, requiring methods to process and reduce this data into a human- and machine-interpretable format. Techniques like the IT Infrastructure Library (ITIL) and monitoring frameworks are employed to track and manage system performance and user experience. Detected anomalies can help identify and pre-empt potential performance degradations or system failures, thus maintaining productivity and business process effectiveness.
=== IoT systems ===
Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems. It helps in identifying system failures and security breaches in complex networks of IoT devices. The methods must manage real-time data, diverse device types, and scale effectively. Garg et al. have introduced a multi-stage anomaly detection framework that improves upon traditional methods by incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security and operational reliability in smart infrastructure and industrial IoT systems.

=== Petroleum industry ===
Anomaly detection is crucial in the petroleum industry for monitoring critical machinery. A 2015 paper proposed a novel segmentation algorithm using support vector machines to analyze sensor data for real-time anomaly detection.

=== Oil and gas pipeline monitoring ===
In the oil and gas sector, anomaly detection is not just crucial for maintenance and safety, but also for environmental protection. Aljameel et al. propose an advanced machine learning-based model for detecting minor leaks in oil and gas pipelines, a task traditional methods may miss.
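
The threshold-based definition given under "Specific" in this excerpt can be illustrated with a minimal sketch. It assumes roughly Gaussian univariate data, uses the absolute z-score so that unusually low values are flagged as well (the definition above speaks only of the z-score exceeding a threshold), and the function name, sample readings, and threshold of 2.5 are hypothetical choices for illustration only.

```python
import statistics


def zscore_outliers(observations, threshold=3.0):
    """Return the observations whose absolute z-score exceeds `threshold`.

    Assumption: the mean and population standard deviation are estimated
    from the same observations that are being screened.
    """
    mean = statistics.fmean(observations)
    stdev = statistics.pstdev(observations)
    if stdev == 0:
        # All values are identical, so nothing deviates from the rest.
        return []
    return [x for x in observations if abs(x - mean) / stdev > threshold]


# Hypothetical sensor readings with one obviously unusual value.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0]
print(zscore_outliers(readings, threshold=2.5))  # prints [25.0]
```

With only ten points and a population standard deviation, no z-score can exceed 3, which is why the example uses a lower, empirically chosen threshold. This matches the remark above that such thresholds are usually chosen empirically.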

  • DiscoVision


    DiscoVision is the name of several things related to the video LaserDisc format. It was the original name of the "Reflective Optical Videodisc System" format later known as "LaserVision" or LaserDisc. == Description == MCA DiscoVision, Inc. was a division of entertainment giant MCA (Music Corporation of America), established in 1969 to develop and sell an optical videodisc system. MCA released discs pressed in Carson and Costa Mesa, California on the DiscoVision label from the format's launch in Atlanta, Georgia in 1978 until 1982 and the release of the film The Four Seasons. DiscoVision titles included films from Universal Pictures, Paramount Pictures, Warner Bros. Pictures, and Disney content. Agreements were made with Columbia Pictures and United Artists, though no discs were released on the DiscoVision label from either studio. Most of these companies later established their own labels for the format, the first being Paramount with a dozen movies released on the Paramount Home Video label in the summer of 1981. The successor to MCA DiscoVision, DiscoVision Associates (DVA), was the result of a partnership between IBM and MCA. It was hoped that the merger would provide the basis for improvement of the quality of DiscoVision pressings, but no appreciable improvement ever took hold. In 1981, responsibility for the laser videodisc was sold to Pioneer Electronic Corporation. MCA DiscoVision had earlier, in 1977, started a partnership with Pioneer, Universal Pioneer, to produce the Pioneer PR-7820 player (the first industrial model of DiscoVision player, from 1978) and to establish disc pressing plants in Japan. As part of the partnership, Pioneer, in association with MCA, had a disc replication facility in Kofu, Japan that produced discs. Some of the last DiscoVision label discs were manufactured by Pioneer in Japan.
In the same year, MCA discontinued its DiscoVision branding following the sale of the technology to Pioneer (which then rebranded the format as LaserDisc) and in turn rebranded its LaserDisc releases, now manufactured by Pioneer, under the MCA Videodisc banner; this was changed to the "MCA Home Video" name for both its VHS and videodisc releases. Some of DiscoVision's technical staff went on to form MCA Video Games, in an effort to produce video game cartridges. DiscoVision Associates later evolved into a patent holding company which manages and licenses intellectual property related to LaserDisc, Compact Disc, and optical disc technologies, as well as other non-disc related fields. In 1989, Pioneer acquired DiscoVision Associates, which continues to license its technologies independently. As the patents in its portfolio expired, DiscoVision became less visible. However, it demonstrated that a patent holding company could succeed, which encouraged other companies to generate royalty income from their own patent portfolios.

  • Blocking of Twitter in Nigeria


    Twitter was blocked in Nigeria from 5 June 2021 to 13 January 2022. The government imposed a ban on the social network after it deleted tweets made by, and temporarily suspended, the Nigerian president Muhammadu Buhari, warning the southeastern people of Nigeria, predominantly Igbo people, of a potential repeat of the 1967 Nigerian Civil War due to the ongoing insurgency in Southeastern Nigeria. The Nigerian government claimed that the deletion of the president's tweets factored into their decision, but it was ultimately based on "a litany of problems with the social media platform in Nigeria, where misinformation and fake news spread through it have had real world violent consequences", citing the persistent use of the platform for activities that are capable of undermining Nigeria's corporate existence. In January 2022, Nigeria lifted its blocking of Twitter after the platform agreed to establish a legal entity within the country sometime in the first quarter of 2022. == Background == On 1 June 2021, Nigerian President Muhammadu Buhari posted a tweet threatening a crackdown on regional separatists "in the language they understand". The next day, Twitter deleted the tweet, claiming it was in violation of Twitter rules, but gave no further details. Nigeria's Information Minister Lai Mohammed said that Twitter's actions were part of an unfair double standard, as Twitter had not banned incitement tweets from other groups. During the Nigerian Civil War a majority of deaths resulted from the blockade of Biafra which caused the deaths of millions of civilians from starvation, a fact that was not alluded to in the tweet. The Nigerian government has long held concerns over the use of Twitter in the country. The ongoing local End SARS protest began on Twitter and got amplified in 2020 when it had 48 million tweets in ten days. Buhari's government floated the idea of social media regulation on different occasions prior to banning Twitter. 
Attempts to pass an anti-social media bill in the past have failed largely due to massive outcry on Twitter. Days before the ban, the country's minister of information called Twitter's activities in Nigeria suspicious, citing its influence on the End SARS protests. == Aftermath == Three days after Twitter was suspended, it was reported that the move had cost the country over 6 billion naira and would also contribute to the worsening unemployment in the country. ExpressVPN reported an over 200 percent increase in web traffic and searches for VPN spiked across the country. In response, Nigeria's Minister of Justice and Attorney General of the Federation Abubakar Malami at first openly threatened to prosecute citizens who bypassed the ban using a VPN but then denied saying so after a screenshot of a Twitter deactivation notification he shared on Facebook showed a VPN logo. Nigeria's cultural minister Lai Mohammed stated the ban would be lifted once Twitter submitted to local licensing, registration and conditions. "It will be licensed by the broadcasting commission, and must agree not to allow its platform to be used by those who are promoting activities that are inimical to the corporate existence of Nigeria." In late June 2021, Twitter announced it would enter talks with the Nigerian government over the platform's suspension. The talks began in July 2021. On 15 September 2021, Mohammed said the Nigerian government would lift the ban on Twitter in a "few days." The minister said that a progress report had been given on the talks with Twitter, adding that they had been productive and quite respectful. On 1 October 2021, President Muhammadu Buhari in his Independence Day broadcast said Twitter must meet the Nigerian government's five conditions before the suspension of the social media platform would be lifted.
The conditions are: respect for national security and cohesion; registration, physical presence and representation in Nigeria; fair taxation; dispute resolution; and local content. == Reactions == The ban was condemned by Amnesty International, the British, Canadian and Swedish diplomatic missions to Nigeria, as well as the United States and the European Union in a joint statement. Two domestic organizations, the Socio-Economic Rights and Accountability Project (SERAP) and the Nigerian Bar Association, indicated intent to challenge the ban in court. Twitter itself called the ban "deeply concerning". Former U.S. President Donald Trump, who was permanently suspended from Twitter following the United States Capitol attack in January 2021, praised the ban, stating "Congratulations to the country of Nigeria, who just banned Twitter because they banned their President", and also called on other countries to ban Twitter and Facebook due to "not allowing free and open speech." == Lifting of the ban == On 12 January 2022, the Nigerian Government lifted the ban after Twitter agreed to pay an "applicable tax" and establish "a legal entity in Nigeria during the first quarter of 2022".

  • Account verification


    Account verification is the process of verifying that a new or existing account is owned and operated by a specified real individual or organization. A number of websites, for example social media websites, offer account verification services. Verified accounts are often visually distinguished by check mark icons or badges next to the names of individuals or organizations. Account verification can enhance the quality of online services, mitigating sockpuppetry, bots, trolling, spam, vandalism, fake news, disinformation and election interference. == History == Account verification was introduced by Twitter in June 2009, initially as a feature for public figures and accounts of interest, individuals in "music, acting, fashion, government, politics, religion, journalism, media, sports, business and other key interest areas". A similar verification system was adopted by Google+ in 2011, by Facebook for pages in October 2015 (available in the United States, Canada, the United Kingdom, Australia and New Zealand) and for profiles and pages in 2018 (available worldwide), by Instagram in 2014, and by Pinterest in 2015. On YouTube, users are able to submit a request for a verification badge once they obtain 100,000 or more subscribers. YouTube also has an "official artist" badge for musicians and bands. In July 2016, Twitter announced that, beyond public figures, any individual would be able to apply for account verification. This was temporarily suspended in February 2018, following a backlash over the verification of one of the organisers of the far-right Unite the Right rally due to a perception that verification conveys "credibility" or "importance". In March 2018, during a live-stream on Periscope, Jack Dorsey, co-founder and CEO of Twitter, discussed the idea of allowing any individual to get a verified account. Twitter reopened account verification applications in May 2021 after revamping its account verification criteria.
This time, Twitter offered notability criteria for the account categories of government; companies, brands, and organizations; news organizations and journalists; entertainment; sports; and activists, organizers, and other influential individuals. Instagram began allowing users to request verification in August 2018. In April 2018, Mark Zuckerberg, co-founder and CEO of Facebook, announced that purchasers of political or issue-based advertisements would be required to verify their identities and locations. He also indicated that Facebook would require individuals who manage large pages to be verified. In May 2018, Kent Walker, senior vice president of Google, announced that, in the United States, purchasers of political-leaning advertisements would need to verify their identities. In November 2022, under Elon Musk's ownership, Twitter began including a blue verification check mark with a paid Twitter Blue monthly membership. Prior to Musk's acquisition of Twitter, Twitter offered this check mark at no charge to confirmed high profile users. On December 19, 2022, Twitter introduced two new check mark colors: gold for accounts from official businesses and organizations, and grey for accounts from governments or multilateral organizations. The type of check mark can be confirmed by visiting the profile page, then clicking or tapping on the check mark. == Techniques == === Identity verification services === Identity verification services are third-party solutions which can be used to ensure that a person provides information which is associated with the identity of a real person. Such services may verify the authenticity of identity documents such as driver's licenses or passports, called documentary verification, or may verify identity information against authoritative sources such as credit bureaus or government data, called nondocumentary verification. === Identity documents verification === The uploading of scanned or photographed identity documents is a practice in use, for example, at Facebook.
According to Facebook, there are two reasons that a person would be asked to send a scan of or photograph of an ID to Facebook: to show account ownership and to confirm their name. In January 2018, Facebook purchased Confirm.io, a startup that was advancing technologies to verify the authenticity of identification documentation. === Biometric verification === === Behavioral verification === Behavioral verification is the computer-aided and automated detection and analysis of behaviors and patterns of behavior to verify accounts. Behaviors to detect include those of sockpuppets, bots, cyborgs, trolls, spammers, vandals, and sources and spreaders of fake news, disinformation and election interference. Behavioral verification processes can flag accounts as suspicious, exclude accounts from suspicion, or offer corroborating evidence for processes of account verification. === Bank account verification === Identity verification is required to establish bank accounts and other financial accounts in many jurisdictions. Verifying identity in the financial sector is often required by regulation such as Know Your Customer or Customer Identification Program. Accordingly, bank accounts can be of use as corroborating evidence when performing account verification. Bank account information can be provided when creating or verifying an account or when making a purchase. === Postal address verification === Postal address information can be provided when creating or verifying an account or when making and subsequently shipping a purchase. A hyperlink or code can be sent to a user by mail, recipients entering it on a website verifying their postal address. === Telephone number verification === A telephone number can be provided when creating or verifying an account or added to an account to obtain a set of features. 
During the process of verifying a telephone number, a confirmation code is sent to a phone number specified by a user, for example in an SMS message sent to a mobile phone. As the user receives the code sent, they can enter it on the website to confirm their receipt. === Email verification === An email account is often required to create an account. During this process, a confirmation hyperlink is sent in an email message to an email address specified by a person. The email recipient is instructed in the email message to navigate to the provided confirmation hyperlink if and only if they are the person creating an account. The act of navigating to the hyperlink confirms receipt of the email by the person. The added value of an email account for purposes of account verification depends upon the process of account verification performed by the specific email service provider. === Multi-factor verification === Multi-factor account verification is account verification which simultaneously utilizes a number of techniques. === Multi-party verification === The processes of account verification utilized by multiple service providers can corroborate one another. OpenID Connect includes a user information protocol which can be used to link multiple accounts, corroborating user information. == Account verification and good standing == On some services, account verification is synonymous with good standing. Twitter reserves the right to remove account verification from users' accounts at any time without notice. 
Reasons for removal may reflect behaviors on and off Twitter and include: promoting hate and/or violence against, or directly attacking or threatening other people on the basis of race, ethnicity, national origin, sexual orientation, gender, gender identity, religious affiliation, age, disability, or disease; supporting organizations or individuals that promote the above; inciting or engaging in the harassment of others; violence and dangerous behavior; directly or indirectly threatening or encouraging any form of physical violence against an individual or any group of people, including threatening or promoting terrorism; violent, gruesome, shocking, or disturbing imagery; self-harm, suicide; and engaging in other activity on Twitter that violates the Twitter Rules. In April 2023, Blue ticks were removed from all Twitter accounts that had not subscribed to Twitter Blue.
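
The confirmation-code flow described under telephone number verification and email verification in this excerpt can be sketched as follows. This is a minimal illustration only, not any provider's actual implementation: the class name `ConfirmationCodes`, the six-digit code format, the ten-minute expiry, and the in-memory storage are all assumptions, and a real service would deliver the code by SMS or email instead of returning it to the caller.

```python
import hmac
import secrets
import time


class ConfirmationCodes:
    """Issue and check one-time confirmation codes (hypothetical sketch)."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.pending = {}           # contact -> (code, expiry time)

    def issue(self, contact):
        """Create a code for a phone number or email address."""
        code = f"{secrets.randbelow(10**6):06d}"
        self.pending[contact] = (code, self.clock() + self.ttl)
        # Assumption: a real service sends this out of band (SMS or email).
        return code

    def confirm(self, contact, submitted):
        """Return True only if the submitted code matches and is unexpired."""
        entry = self.pending.get(contact)
        if entry is None:
            return False
        code, expiry = entry
        if self.clock() > expiry:
            del self.pending[contact]
            return False
        if hmac.compare_digest(code, submitted):
            del self.pending[contact]   # each code can be used only once
            return True
        return False
```

A caller would invoke `issue("+15550100")` when the user supplies a phone number and `confirm("+15550100", code_entered_by_user)` when the user types the code back in. A successful confirmation shows only that the person controls that phone number or mailbox, which is why the text above treats these techniques as corroborating evidence.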

  • Level set (data structures)


    In computer science, a level set is a data structure designed to represent discretely sampled dynamic level sets of functions. A common use of this form of data structure is in efficient image rendering. The underlying method constructs a signed distance field that extends from the boundary, and can be used to solve the motion of the boundary in this field.

== Chronological developments ==
The powerful level-set method is due to Osher and Sethian (1988). However, the straightforward implementation via a dense d-dimensional array of values results in both time and storage complexity of $O(n^{d})$, where $n$ is the cross sectional resolution of the spatial extents of the domain and $d$ is the number of spatial dimensions of the domain.

=== Narrow band ===
The narrow band level set method, introduced in 1995 by Adalsteinsson and Sethian, restricted most computations to a thin band of active voxels immediately surrounding the interface, thus reducing the time complexity in three dimensions to $O(n^{2})$ for most operations. Periodic updates of the narrowband structure, to rebuild the list of active voxels, were required, which entailed an $O(n^{3})$ operation in which voxels over the entire volume were accessed. The storage complexity for this narrowband scheme was still $O(n^{3})$. Differential constructions over the narrow band domain edge require careful interpolation and domain alteration schemes to stabilise the solution.

=== Sparse field ===
This $O(n^{3})$ time complexity was eliminated in the approximate "sparse field" level set method introduced by Whitaker in 1998. The sparse field level set method employs a set of linked lists to track the active voxels around the interface. This allows incremental extension of the active region as needed without incurring any significant overhead.
While consistently $O(n^{2})$ efficient in time, the sparse field level set method still requires $O(n^{3})$ storage space.

=== Sparse block grid ===
The sparse block grid method, introduced by Bridson in 2003, divides the entire bounding volume of size $n^{3}$ into small cubic blocks of $m^{3}$ voxels each. A coarse grid of size $(n/m)^{3}$ then stores pointers only to those blocks that intersect the narrow band of the level set. Block allocation and deallocation occur as the surface propagates, to accommodate the deformations. This method has a suboptimal storage complexity of $O\left((n/m)^{3}+m^{3}n^{2}\right)$, but retains the constant time access inherent to dense grids.

=== Octree ===
The octree level set method, introduced by Strain in 1999 and refined by Losasso, Gibou and Fedkiw, and more recently by Min and Gibou, uses a tree of nested cubes of which the leaf nodes contain signed distance values. Octree level sets currently require uniform refinement along the interface (i.e. the narrow band) in order to obtain sufficient precision. This representation is efficient in terms of storage, $O(n^{2})$, and relatively efficient in terms of access queries, $O(\log n)$. An advantage of the level set method on octree data structures is that one can solve the partial differential equations associated with typical free boundary problems that use the level set method. The CASL research group has developed this line of work in computational materials, computational fluid dynamics, electrokinetics, image-guided surgery and controls.
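
The sparse block grid described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Bridson's implementation: the class name `SparseBlockGrid`, its method names, and the single `outside` default returned for unallocated blocks are hypothetical (a real signed distance field would also need to distinguish inside from outside), and block deallocation is omitted.

```python
class SparseBlockGrid:
    """n^3 voxel volume stored as lazily allocated blocks of m^3 voxels."""

    def __init__(self, n, m, outside=float("inf")):
        if n % m != 0:
            raise ValueError("n must be a multiple of m")
        self.n, self.m = n, m
        self.c = n // m                       # coarse resolution per axis
        self.outside = outside                # value far from the interface
        self.blocks = [None] * self.c ** 3    # coarse grid of block "pointers"

    def _locate(self, i, j, k):
        """Map a voxel index to (coarse block index, offset inside block)."""
        if not (0 <= i < self.n and 0 <= j < self.n and 0 <= k < self.n):
            raise IndexError("voxel index out of range")
        m, c = self.m, self.c
        block = ((i // m) * c + (j // m)) * c + (k // m)
        offset = ((i % m) * m + (j % m)) * m + (k % m)
        return block, offset

    def get(self, i, j, k):
        """Constant-time read; unallocated blocks yield the default value."""
        block, offset = self._locate(i, j, k)
        data = self.blocks[block]
        return self.outside if data is None else data[offset]

    def set(self, i, j, k, value):
        """Constant-time write; allocates the enclosing block on first use."""
        block, offset = self._locate(i, j, k)
        if self.blocks[block] is None:
            self.blocks[block] = [self.outside] * self.m ** 3
        self.blocks[block][offset] = value

    def allocated_blocks(self):
        return sum(data is not None for data in self.blocks)


grid = SparseBlockGrid(n=16, m=4)
grid.set(5, 6, 7, -0.25)        # a voxel near the interface
print(grid.get(5, 6, 7), grid.allocated_blocks())
```

Both reads and writes take two integer divisions and one list lookup, which is the constant-time access the text attributes to this layout. Only blocks that have been written to consume $m^{3}$ values of storage.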
=== Run-length encoded ===
The run-length encoding (RLE) level set method, introduced in 2004, applies the RLE scheme to compress regions away from the narrow band to just their sign representation while storing the narrow band with full precision. The sequential traversal of the narrow band is optimal and storage efficiency is further improved over the octree level set. The addition of an acceleration lookup table allows for fast $O(\log r)$ random access, where $r$ is the number of runs per cross section. Additional efficiency is gained by applying the RLE scheme in a dimensional recursive fashion, a technique introduced in Nielsen and Museth's similar DT-Grid.

=== Hash Table Local Level Set ===
The Hash Table Local Level Set method, introduced in 2011 by Eyiyurekli and Breen and extended in 2012 by Brun, Guittet, and Gibou, only computes the level set data in a band around the interface, as in the Narrow Band Level-Set Method, but also only stores the data in that same band. A hash table data structure is used, which provides $O(1)$ access to the data. However, Brun et al. conclude that their method, while being easier to implement, performs worse than a quadtree implementation. They find that as it is, [...] a quadtree data structure seems more adapted than the hash table data structure for level-set algorithms. Three main reasons for worse efficiency are listed: to obtain accurate results, a rather large band is required close to the interface, which counterbalances the absence of grid nodes far from the interface; the performances are deteriorated by extrapolation procedures on the outer edges of the local grid; and the width of the band restricts the time step and slows down the method.

=== Point-based ===
Corbett in 2005 introduced the point-based level set method.
Instead of using a uniform sampling of the level set, the continuous level set function is reconstructed from a set of unorganized point samples via moving least squares.
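
The idea behind the hash table local level set described above, storing level set values only for grid nodes inside a band around the interface, can be illustrated with a small two-dimensional sketch. It assumes a circular interface whose signed distance is known in closed form. The function name `build_local_level_set` and its parameters are hypothetical, and for brevity the builder scans every grid node, whereas the method as described computes data only in the band.

```python
import math


def build_local_level_set(n, center, radius, band_width):
    """Return a dict mapping (i, j) -> signed distance for nodes in the band.

    Assumption: the interface is a circle, so the signed distance is simply
    the distance to the center minus the radius (negative inside).
    """
    cx, cy = center
    table = {}
    for i in range(n):
        for j in range(n):
            phi = math.hypot(i - cx, j - cy) - radius
            if abs(phi) <= band_width:
                table[(i, j)] = phi      # only band nodes are stored
    return table


band = build_local_level_set(n=64, center=(32, 32), radius=20, band_width=2)

# Expected constant-time lookups through Python's built-in hash table.
print(band.get((52, 32)))   # node on the interface
print(band.get((32, 32)))   # node far inside the circle: not stored, so None
print(len(band), "of", 64 * 64, "nodes stored")
```

Python's `dict` stands in for the hash table here. Nodes outside the band are simply absent, so a complete implementation would need a separate way to recover their sign. That is one reason the extrapolation at the outer edge of the band, mentioned above, matters.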

  • IAmAnas


    #IAmAnas (I Am Anas) is a Twitter hashtag and social media campaign that started in 2015. Users tweeted to express support for the undercover investigative work of Ghanaian journalist Anas Aremeyaw Anas. The campaign restarted in 2018 when the Ghanaian MP and financier of the New Patriotic Party, Kennedy Agyapong, announced his intention to reveal the identity of Anas following the journalist's exposé of corruption at the Ghana Football Association. Anas maintains that "being anonymous has always been his secret weapon." Pictures purported to be of Anas were first released by a TV station owned by Agyapong, and were quickly picked up by other media houses. At least one person, a Dutch-Brazilian model, has claimed ownership of one picture that was released, and has threatened legal action against Agyapong for possibly putting his life in danger. In response to Agyapong, social media users tweeted photos of themselves, random people, or even comic images of entities that resemble the trademark covered face of Anas. When the hashtag first began in 2015, along with other popular uses of the journalist's name, Elizabeth Ohene wrote an article about Ghanaians' use of humour in responding to the exposé of government corruption. "I do not know when these words will make it into Wikipedia or the Oxford English Dictionary but for the moment you can take it from me that: To go undercover is to anas, to make secret recordings is to anas-anas, to wear disguises is to do an anas, to be caught in the act is to be anased. To have someone exposed taking bribes is to have that person being given the full Anas Aremeyaw Anas."

    Read more →
  • Boba liberal

    Boba liberal

    Boba liberal is a term mostly used within the Asian diaspora communities in the West, especially in the United States. It describes someone of East or Southeast Asian descent living in the West who has a shallow, surface-level liberal outlook. It is also occasionally used to describe conservatives who weaponize their East or Southeast Asian identity. The neologism emerged among the Asian American leftist community on Twitter who accused "boba liberals" of only holding their liberal beliefs to appear more white-adjacent by engaging in progressive social movements or viewpoints, while at the same time disregarding and trivializing issues concerning Asians. Mary Chao, writing for The North Jersey Record, said that "Asians call peers boba liberals when they aspire to liberal whiteness." An article in The Yale Herald described it as a term "used to describe the ethnocentric politics of Asian Americans, usually of East Asian descent, who exclusively advocate for issues that benefit themselves, without acknowledging problematic dimensions of their own history and working to support other people of color." The feminist magazine Fem said that "the faces of boba liberalism are Asian Americans that are part of the middle and upper economic class. As a result, boba liberals disregard the negative effects of capitalism because they profit from it. For instance, boba liberals tend to focus on advocating for Asian representation in white spaces, or discussing whether or not wearing chopsticks in one's hair is culture appropriation. These topics are popular within boba liberal circles, all while dialogue regarding inequality, globalization, and racial injustice are purposely neglected." 
UnHerd notes that conservative Asian Americans have used the term not to critique capitalism, but to "aim at a small but influential group of progressive Asian-American activists who are supposedly selling out other Asians, especially working-class Asians, in order to win brownie points from elite, generally white liberals." MRAsians have similarly used the term to attack Asian American feminists who supported the Black Lives Matter movement. The Asian identity of boba liberals has often been accused of being shallow and superficial. Boba liberals are accused of using surface-level stereotypical Asian traits such as liking boba tea to bolster their Asian credentials. Plan A Magazine, an Asian diaspora magazine, described the film Crazy Rich Asians and the sitcom Fresh Off the Boat as "boba liberal media", calling them the result of "a specific kind of atomized identity politics". Other media outlets have connected the Crazy Rich Asians film to boba liberalism. == Controversy == The term "boba liberal" was coined in 2019 by Vietnamese American Twitter user Redmond (@diaspora_is_red) to analyze a form of Asian American liberalism through a Marxist lens. Redmond has criticized the misappropriation of their neologism, arguing that it strips away the Marxist framework by failing to discuss "socialism, communism, the capitalist system, imperialism, and the diaspora bourgeoisie" and conflates "boba liberalism" with the flawed concept of "East Asian privilege". In 2024, Redmond criticized misuse of the term by conservatives and liberals, and said "The term boba liberalism can go away for all I care. It's corny and stale". === United States === One commentator described boba liberals as supporting policies that primarily benefit upper-income Asian-Americans, and not necessarily the Asian-American community as a whole.
Therefore, although the term contains the word "liberal", it is not exclusive to one specific ideology: it may also extend to conservative-aligned Asians in some areas, who often take advantage of the "model minority" label by defending such measures.

    Read more →
  • Digital artifactual value

    Digital artifactual value

    Digital artifactual value, a preservation term, is the intrinsic value of a digital object, rather than the informational content of the object. Though standards are lacking, born-digital objects and digital representations of physical objects may have a value attributed to them as artifacts. == Intrinsic value in analog materials == With respect to analog or non-digital materials, artifacts are determined to have singular research or archival value if they possess qualities and characteristics that make them the only acceptable form for long-term preservation. These qualities and characteristics are commonly referred to as the item's intrinsic value and form the basis upon which digital artifactual value is currently evaluated. Artifactual value based on this idea is predicated upon the artifact's originality, faithfulness, fixity, and stability. The intrinsic value of a particular object, as interpreted by archival professionals, largely determines the selection process for archives. The National Archives and Records Administration Committee on Intrinsic Value in "Intrinsic Value in Archival Material" classified an analog object as having intrinsic value if it possessed one or more of the following qualities: Physical form that may be the subject for study if the records provide meaningful documentation or significant examples of the form. Aesthetic or artistic quality. Unique or curious physical features. Age that provides a quality of uniqueness. Value for use in exhibits. Questionable authenticity, date, author, or other characteristic that is significant and ascertainable by physical examination. General and substantial public interest because of direct association with famous or historically significant people, places, things, issues or events. Significance as documentation of the establishment or continuing legal basis of an agency or institution. 
Significance as documentation of the formulation of policy at the highest executive levels when the policy has significance and broad effect throughout or beyond the agency or institution. Other archival professionals such as Lynn Westney have written that the characteristics of materials exhibiting intrinsic value include age, content, usage, particularities of creation, signatures, and attached seals. Westney and others have stated that paper-based artifacts can be thought to have evidentiary value, or significant contextual markings, insofar that the original manifestation of the artifact can attest to the originality, faithfulness or authenticity, fixity, and stability of the content. For other analog materials, properly articulating intrinsic value remains essential for determining artifactual value. Similar to paper-based objects in many respects, artifactual value for images typically takes into account artistic value, age, authorial prestige, significant provenance, and institutional priorities. Analog audio preservation is based upon similar factors, including the cultural value of the item, its historical uniqueness, the estimated longevity of the medium, the current condition of the item, and the state of playback equipment, among other things. == Analog conventions in a digital realm == The standard definition of artifactual value, as it has applied to analog or non-digital materials in the twentieth century, is based upon a set of conventions which do not ordinarily apply to digital objects in toto. The Council on Library and Information Resources (CLIR) has stated that printed texts and other paper-based manuscripts, when considered as objects, are imbued with meaning distilled from a general set of understandings inherent to these conventions: The object is of a fixed and stable composition/form. Authorship and intellectual property are a recognizable concept. Duplication is possible. 
Fungibility of informational content (or, in other words, the ability to be replaced by another identical object). These conventions are important to consider because they help to describe the physical and even metaphysical relationship between a document's content and its physical manifestation. The underpinnings of this relationship are not identical and do not apply with the same degree of clarity to an immaterial digital realm. The idea of fixity with regard to printed materials, for example, is largely predicated on the notion that an object has been recorded on a relatively stable medium. The physical presence of a print text serves as proof of its authenticity as an object or artifact, as well as its scarcity and uniqueness in relation to other print materials. Variations in the chemical properties and storage conditions of print-based materials, as well as other cultural variables, certainly impact the fixity or stability of print materials, but there is little controversy about determining its fundamental existence or originality. However, uniqueness in the physical, paper-based sense does not translate to a digital realm in which immaterial objects are subject to theoretically infinite levels of reproduction and dissemination. Born-digital and digital surrogates may or may not look any different from each other on a server, and alterations can be made without explicit notice to the user. These alterations are normally called migration events, or actions taken on the digital object that change the original object's composition. They can enact subtle but fundamental alterations to the original document, thereby compromising its existence as an original object. Furthermore, because the tools used to generate and access digital objects have historically evolved quite rapidly, issues of playback obsolescence, incapability, data loss, and broken pathways to information have changed traditional ideas of fixity and stability. 
Therefore, artifactual value in a digital realm requires a modified set of generalized standards for determining artifactual originality. Michael J. Giarlo and Ronald Jantz, among others, have proposed a list of methods for establishing digital intrinsic value by way of careful metadata generation and records maintenance. In their report, a digital original possesses three key characteristics that distinguish it from identical copies. These include continuous verification and re-verification of the document's digital signature starting from the date of creation; retaining versions and recordings of all changes to the object in an audit trail; and having the archival master contain the creation date of the digital object. They also reported that originality in digital sources could be verified or produced by the following techniques: Digital object is given a date-time stamp that is automatically inserted into the METS-XML header upon creation. Date-time is inserted into archival metadata. Encapsulation. Digital signatures. == The role of digital surrogates == Digital surrogates are considered a utility for aiding in the preservation and increased access of certain artifacts. However, digital surrogates can have different utilities for objects depending on the nature of the original artifact and the condition the artifact is in. In 2001 the Council on Library and Information Resources (CLIR) published a report on the artifact in library collections. The CLIR states that the utility of the digital surrogate can be determined by dividing the original material (artifact) into two different categories, artifacts that are rare and those that are not. These two categories can each be further divided into two categories, artifacts that are frequently used and those that are not. 
=== Materials that are frequently used and not rare === According to the CLIR "it is not obvious that digital surrogates provide all the functionality, all the information, or all the aesthetic value of originals. Therefore, while it may be sensible to recommend that digital surrogates be used to reduce the cost and increase the availability of library holdings that circulate frequently, the decision to deaccession a physical object in library collections and replace it with a digital surrogate should be based on a careful assessment of the way in which library patrons use the original object or objects of its kind." === Materials that are infrequently used and not rare === Keeping the original is always the best solution for libraries and especially archives but in the case of libraries where an artifact is not rare or used infrequently there must be a barometer that is developed to help "balance functionality with actual use in order to help decide when digital surrogates that provide most of the functionality of originals are acceptable." === Materials that are rare and frequently used === A professional in the field of Library and Information Science (LIS) would almost certainly not argue that a digital surrogate could replace a rare object. However, in the case of a rare object that is falling into poor shape due to heavy use a digital surrogate could be extremely useful in reducing the wear a

    Read more →
  • ELMo

    ELMo

    ELMo (embeddings from language model) is a word embedding method for representing a sequence of words as a corresponding sequence of vectors. It was created by researchers at the Allen Institute for Artificial Intelligence and the University of Washington and first released in February 2018. It is a bidirectional LSTM which takes character-level inputs and produces word-level embeddings, trained on a corpus of about 30 million sentences and 1 billion words. The architecture of ELMo accomplishes a contextual understanding of tokens. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. ELMo was historically important as a pioneer of self-supervised generative pretraining followed by fine-tuning, where a large model is trained to reproduce a large corpus, then the large model is augmented with additional task-specific weights and fine-tuned on supervised task data. It was an instrumental step in the evolution towards transformer-based language modelling. == Architecture == ELMo is a multilayered bidirectional LSTM on top of a token embedding layer. The concatenated output of all the LSTMs makes up the token embedding. The input text sequence is first mapped by an embedding layer into a sequence of vectors. Then two parts are run in parallel over it. The forward part is a 2-layered LSTM with 4096 units and 512-dimensional projections, and a residual connection from the first to the second layer. The backward part has the same architecture, but processes the sequence back-to-front. The outputs from all 5 components (embedding layer, two forward LSTM layers, and two backward LSTM layers) are concatenated and multiplied by a linear matrix ("projection matrix") to produce a 512-dimensional representation per input token. ELMo was pretrained on a text corpus of 1 billion words. 
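The final combination step described above (concatenate the five per-token outputs, then multiply by a projection matrix) can be sketched in plain Python. This is only an illustration of the arithmetic, not ELMo's implementation: the function name `combine_layers` is hypothetical, random numbers stand in for real embedding and LSTM outputs, and the toy dimension of 4 replaces the 512 dimensions mentioned in the text.

```python
import random

def combine_layers(layers, projection):
    """Concatenate each token's vectors from every component, then project.

    layers: list of components; each component is a list of per-token vectors.
    projection: matrix with one row per output dimension and one column per
    dimension of the concatenated vector.
    """
    num_tokens = len(layers[0])
    outputs = []
    for t in range(num_tokens):
        # Concatenate what every component produced for token t.
        concat = [x for layer in layers for x in layer[t]]
        # Matrix-vector product with the projection matrix.
        outputs.append([sum(w * x for w, x in zip(row, concat)) for row in projection])
    return outputs

random.seed(0)
DIM = 4             # toy size standing in for 512
NUM_TOKENS = 3
NUM_COMPONENTS = 5  # embedding layer + 2 forward + 2 backward LSTM layers

# Random stand-ins for the outputs of the five components.
layers = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
          for _ in range(NUM_COMPONENTS)]
projection = [[random.gauss(0, 1) for _ in range(DIM * NUM_COMPONENTS)]
              for _ in range(DIM)]

token_vectors = combine_layers(layers, projection)
print(len(token_vectors), len(token_vectors[0]))  # 3 4
```

In this picture, fine-tuning for a downstream task would adjust only `projection`, while the components that produced `layers` stay fixed.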
The forward part is trained by repeatedly predicting the next token, and the backward part is trained by repeatedly predicting the previous token. After the ELMo model is pretrained, its parameters are frozen, except for the projection matrix, which can be fine-tuned to minimize loss on specific language tasks. This is an early example of the pretrain-fine-tune paradigm. The original paper demonstrated this by improving the state of the art on six benchmark NLP tasks. === Contextual word representation === The architecture of ELMo accomplishes a contextual understanding of tokens. For example, the first forward LSTM of ELMo would process each input token in the context of all previous tokens, and the first backward LSTM would process each token in the context of all subsequent tokens. The second forward LSTM would then incorporate those to further contextualize each token. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. For example, consider the sentence "She went to the bank to withdraw money." In order to represent the token "bank", the model must resolve its polysemy in context. The first forward LSTM would process "bank" in the context of "She went to the", which would allow it to represent the word as a location that the subject is going towards. The first backward LSTM would process "bank" in the context of "to withdraw money", which would allow it to disambiguate the word as referring to a financial institution. The second forward LSTM can then process "bank" using the representation vector provided by the first backward LSTM, thus allowing it to represent the word as a financial institution that the subject is going towards. == Historical context == ELMo is one link in a historical evolution of language modelling. 
Consider a simple problem of document classification, where we want to assign a label (e.g., "spam", "not spam", "politics", "sports") to a given piece of text. The simplest approach is the "bag of words" approach, where each word in the document is treated independently, and its frequency is used as a feature for classification. This is computationally cheap but ignores the order of words and their context within the sentence. GloVe and Word2Vec built upon this by learning fixed vector representations (embeddings) for words based on their co-occurrence patterns in large text corpora. Like BERT (but unlike "bag of words" and fixed embeddings such as Word2Vec and GloVe), ELMo word embeddings are context-sensitive, producing different representations for words that share the same spelling. It was trained on a corpus of about 30 million sentences and 1 billion words. Bidirectional LSTMs had previously been used for contextualized word representation. ELMo applied the idea at a large scale, achieving state-of-the-art performance. After the 2017 publication of the Transformer architecture, the architecture of ELMo was changed from a multilayered bidirectional LSTM to a Transformer encoder, giving rise to BERT. BERT has a similar pretrain-fine-tune workflow, but uses a Transformer, which allows more parallelizable training.
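The limitation of the bag-of-words approach mentioned above is easy to demonstrate. The following minimal Python sketch (the helper name `bag_of_words` and the example sentences are made up for illustration) shows that two sentences with opposite meanings receive identical features, because only word counts are kept.

```python
from collections import Counter

def bag_of_words(text):
    """Count word frequencies, discarding word order entirely."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

print(a == b)     # True: word order is lost
print(a["the"])   # 2
```

A context-sensitive model such as ELMo would instead assign "dog" a different vector in each sentence, since the surrounding tokens differ.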

    Read more →
  • Digital edition

    Digital edition

    A digital edition is an online magazine or online newspaper delivered in electronic form which is formatted identically to the print version. Digital editions are often called digital facsimiles to underline the likeness to the print version. Digital editions have the benefit of reduced cost to the publisher and reader by avoiding the time and expense of printing and delivering a paper edition. This format is considered more environmentally friendly due to the reduction of paper and energy use. These editions also often feature interactive elements such as hyperlinks both within the publication itself and to other internet resources, search options and bookmarking, and can also incorporate multimedia such as video or animation to enhance articles themselves or for advertisement purposes. Some delivery methods also include animation and sound effects that replicate the turning of the page to bring the experience closer to that of their print counterparts. Magazine publishers have traditionally relied on two revenue sources: selling ads and selling magazines. Additionally, some publishers are using other electronic publication methods such as RSS to reach out to readers and inform them when new digital editions are available. Current technologies are generally either reader-based, requiring a download of an application (such as Adobe Acrobat) and subsequent download of each edition, or browser-based, often using Macromedia Flash and requiring no application download. Some application-based readers allow users to access editions while not connected to the internet. Dedicated hardware such as the Amazon Kindle and the iPad is also available for reading digital editions of select books, popular national magazines such as Time, The Atlantic, and Forbes and popular national newspapers such as the New York Times, Wall Street Journal, and Washington Post. Archives of print newspapers, in some cases dating hundreds of years back, are being digitized and made available online. 
Google is indexing existing digital archives produced by the newspapers themselves or by third parties. Newspaper and magazine archival began with microform film formats, which solved the problem of efficient storage and preservation. This format, however, lacked accessibility. Many libraries, especially state libraries in the United States, are archiving their collections digitally and converting existing microfilm to digital format. The Library of Congress provides project planning assistance and the National Endowment for the Humanities procures funding through grants from its National Digital Newspaper Program. Digital magazines, ezines, e-editions and emags are sometimes referred to as digital editions; however, some of these formats are published only in digital format, unlike digital editions, which replicate a printed edition as well. == Digital magazines == Digital-replica magazines number in the thousands (consumer and business publications, house magazines for associations, institutions and corporations), and conversion from print to digital was still increasing as of 2009. A 2008 report funded by digital-replica technology providers and auditing agencies counted 1,786 digital-replica editions having more than 7 million circulation among business-to-business publications, of which 230 editions were audited. The same report counted 1,470 digital-replica editions of consumer magazines having 5.5 million digital circulation, of which 240 editions were audited. These authors estimated that by year end of 2009 there would be 8,000 digital magazines, having a combined distribution of more than 30 million people. Surveys have shown that, while not all subscribers prefer a digital edition, some do because of the environmental benefit and also because digital magazines are searchable and may easily be passed along or linked to. One such survey funded by a digital publisher reported on inputs from more than 30,000 subscribers to business, consumer and other digital magazines. 
== Digital magazine business models == === Reduced printing and distribution costs === The publishers' choice to save by moving some or all subscribers from print to digital is widely accepted. Oracle magazine, which has 176,000 of its 516,000 subscribers receiving digital according to its June 2009 BPA circulation statement, is said to be the most widely circulated digital edition of a business-to-business publication. Publishers who do this need to choose whether to make some issues all-digital, move some subscribers to digital edition, add some digital-only subscribers, or send all subscribers the digital edition. === Paid subscription revenue === In 2009, a major consumer magazine, PC Magazine, went all-digital, charging an annual subscription fee for its digital-replica edition. Many consumer magazines and newspapers are already available in eReader formats that are sold through booksellers. === Sponsorship and advertising revenue === Digital editions often carry special "front cover" advertising, or advertising on the email message alerting the subscriber of the digital edition. Publishers also produce special digital-only inserts and rich-media ads or advertorials. === Designed-for-digital issues === Another approach is to fully replace printed issues with digital ones, or to use digital editions for extra issues that would otherwise have to be printed.

    Read more →
  • Algorithmic curation

    Algorithmic curation

    Algorithmic curation is the selection of online media by technologies such as recommender systems and personalized search. Curation entails the selective sharing of online content and recommendations based on inferred interests. Curation algorithms implement different filter approaches, such as collaborative filtering and content-based filtering. Examples include search engine and social media products such as the Twitter feed, Facebook's News Feed, and Google Personalized Search. == History == === Early algorithmic curation === Online platforms use newsfeed algorithms to determine what content to present to each user. The volume of content published on social media platforms created a need for automated filtering, as manual review of all available content by users is not feasible. These systems function as a form of gatekeeper, shaping which new material users are exposed to and influencing knowledge, attention, and political exposure. ==== Information overload ==== Early ranking algorithms addressed information overload by surfacing the most recent or most popular posts. Later systems shifted toward ranking content based on predicted engagement, aiming to increase the time users spend on a platform. Research has found that these engagement-oriented systems can increase the spread of misinformation and contribute to political polarization as a side effect of optimising for user interaction. ==== How algorithms change users' feeds over time ==== Algorithmic curation has been found to increase source diversity in some respects while simultaneously reducing the number of external links presented to users, which limits exposure to off-platform content. Research using agent-based modelling has examined how user behaviour, information quality, and algorithmic design interact with one another over time. 
=== Emergence of AI === Platforms increasingly shifted from rule-based ranking systems toward machine-learning and AI-driven approaches, which allow feeds to be personalised at a larger scale and with greater responsiveness to user behaviour. For example, X (formerly Twitter) moved away from a chronological feed toward an AI-powered ranking system that personalises content for each user. These systems are capable of making ranking decisions across volumes of content and user interactions that would not be practical to handle manually. == Approach == === Filter types === ==== Collaborative filtering ==== Collaborative filtering (CF) methods create recommendations based on a person's usage patterns. CF predicts a person's preference for an item by matching their interests with those of users who have similar interests. This process allows for the sharing of ratings between users with similar profiles. CF is based on patterns of human behaviour rather than machine analysis of content itself. Users of CF systems rate items they have interacted with, and these ratings form a profile of interests. The CF system then matches that user with others who have similar profiles, and uses their ratings to generate recommendations. Collaborative filtering can be applied across various content types including text, images, music, and financial products, and can account for complex attributes such as taste and quality that are difficult to represent explicitly. ==== Content-based filtering ==== Content-based filtering (CBF) builds a user profile to represent the types of items a user has engaged with, based on keywords and attributes used to describe those items. Recommendations are generated by presenting items similar to those the user has previously engaged with or is currently viewing. 
The CBF method creates a profile for each item based on discrete attributes and features, and then constructs a content-based user profile using a weighted vector of those features derived from items the user has rated, purchased, or interacted with. The weights represent the relative importance of each feature, and can be computed using techniques such as Bayesian classifiers, cluster analysis, decision trees, and artificial neural networks, with the goal of estimating the probability that a user will engage with a suggested item. One application of content-based filtering is Pandora Radio, where users provide an artist, genre, or composer to generate a station, and the system surfaces music with similar attributes. == Technology == === Recommender system === Recommender systems rank and suggest content to users based on a combination of implicit and explicit user input. Implicit signals include time spent viewing or engaging with a specific item. Explicit signals include actions such as liking posts, saving store pages, reading news articles, or sharing content. === Personalized search === Personalized search aims to retrieve results most relevant to the user by incorporating contextual factors beyond the explicit query, such as past queries, browsing history, and inferred interests. Social media platforms such as X (formerly Twitter) and Bluesky generate recommendations based on similar users and the content those users interact with. Personalized search may also allow users to explicitly filter results by blocking content containing certain phrases or hashtags. For first-time users without prior history, personalized search may draw on content-based filtering to establish an initial context. Similar processes are used by search engines and retail platforms to tailor results and product recommendations to individual users. 
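The collaborative filtering process described under Filter types (match a user with others who have similar rating profiles, then use their ratings to generate a recommendation) can be sketched as follows. This is a minimal user-based variant chosen for illustration, not the method of any particular platform: the function names, the cosine similarity measure, and the sample ratings are all assumptions.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity over the items that both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical rating profiles on a 1-5 scale.
ratings = {
    "alice": {"a": 5, "b": 1},
    "bob":   {"a": 5, "b": 1, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}

estimate = predict(ratings, "alice", "c")
print(round(estimate, 2))  # 3.44, pulled toward bob, whose profile matches alice's
```

Note that the sketch never inspects what items "a", "b", and "c" actually are, which reflects the point that collaborative filtering relies on patterns of behaviour rather than analysis of the content itself.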
== AI contribution == Artificial intelligence contributes to algorithmic curation through machine-learning models capable of processing large volumes of data. Techniques such as deep learning and reinforcement learning allow curation algorithms to model user preferences with greater granularity alongside established filtering approaches. This enables platforms to adjust content rankings rapidly in response to user behaviour. In social media and streaming contexts, AI-driven systems arrange feeds according to predicted relevance, with the outputs shaped by patterns present in the training data. == Social media and potential impact == === Echo chambers === Social media algorithms, such as those used by X (formerly Twitter), recommend content that the system predicts a user will engage with positively. Content from accounts with differing perspectives is less likely to be surfaced, which may reduce source and topic diversity and contribute to the formation of echo chambers. For example, Facebook's news feed is designed to surface content aligned with users' prior engagement, which may reinforce existing views. This dynamic may contribute to filter bubbles, in which users are seldom exposed to content outside their existing interests. Users may further narrow their feeds by actively blocking certain content or accounts. === Over-representation === A pattern observed across social media platforms is the concentration of algorithmic visibility among a small subset of users. Content from the most active users, those with the largest followings, or those generating the most engagement tends to be surfaced more frequently, meaning a small number of accounts can account for a disproportionate share of what appears in other users' feeds.

    Read more →
  • CloudLibrary

    CloudLibrary

    CloudLibrary (stylized as "cloudLibrary") is a cloud-based software system through which libraries lend electronic books; it is also the name of the app that users download to access the e-books. CloudLibrary was created in 2011 by 3M as part of its library systems unit as a competitor to OverDrive, Inc.; in 2015 3M sold the North American part of that unit to Bibliotheca Group GmbH, a company founded in 2011 that was funded by One Equity Partners Capital Advisors, a division of JP Morgan Chase. By 2019, Bibliotheca had tried, unsuccessfully, to negotiate with Amazon to add Kindle-ebook compatibility to cloudLibrary, something that, as of then, Amazon had only made available to OverDrive. In that year, cloudLibrary, along with hoopla offered by Midwest Tape, ODILO, and Baker & Taylor’s Axis 360, were the main competitors to the OverDrive and Libby apps offered by OverDrive, Inc. in the library e-book market. In April 2024, Bibliotheca sold cloudLibrary to the nonprofit cooperative OCLC. By that time, cloudLibrary was used by around 500 libraries in around 20 countries in around 50 languages, and was used to lend audiobooks, digital magazines, newspapers, comics, and streaming media, along with e-books.

    Read more →
  • Referring expression generation

    Referring expression generation

    Referring expression generation (REG) is the subtask of natural language generation (NLG) that has received the most scholarly attention. While NLG is concerned with the conversion of non-linguistic information into natural language, REG focuses only on the creation of referring expressions (noun phrases) that identify specific entities called targets. This task can be split into two parts. The content selection part determines which set of properties distinguishes the intended target, and the linguistic realization part defines how these properties are translated into natural language. A variety of algorithms have been developed in the NLG community to generate different types of referring expressions. == Types of referring expressions == A referring expression (RE), in linguistics, is any noun phrase, or surrogate for a noun phrase, whose function in discourse is to identify some individual object (thing, being, event...). The technical terminology for "identify" differs a great deal from one school of linguistics to another. The most widespread term is probably "refer", and a thing identified is a "referent", as for example in the work of John Lyons. In linguistics, the study of reference relations belongs to pragmatics, the study of language use, though it is also a matter of great interest to philosophers, especially those wishing to understand the nature of knowledge, perception and cognition more generally. Various devices can be used for reference: determiners, pronouns, proper names... Reference relations can be of different kinds; referents can be in a "real" or imaginary world, in discourse itself, and they may be singular, plural, or collective. === Pronouns === The simplest type of referring expression is the pronoun, such as "he" and "it". The linguistics and natural language processing communities have developed various models for predicting anaphor referents, such as centering theory, and ideally referring-expression generation would be based on such models. 
However, most NLG systems use much simpler algorithms, for example using a pronoun if the referent was mentioned in the previous sentence (or sentential clause) and no other entity of the same gender was mentioned in this sentence. === Definite noun phrases === There has been a considerable amount of research on generating definite noun phrases, such as "the big red book". Much of this builds on the model proposed by Dale and Reiter. This has been extended in various ways; for example, Krahmer et al. present a graph-theoretic model of definite NP generation with many nice properties. In recent years a shared-task event has compared different algorithms for definite NP generation, using the TUNA corpus. === Spatial and temporal reference === Recently there has been more research on generating referring expressions for time and space. Such references tend to be imprecise (what is the exact meaning of "tonight"?) and to be interpreted in different ways by different people. Hence it may be necessary to explicitly reason about false positive vs false negative tradeoffs, and even calculate the utility of different possible referring expressions in a particular task context. === Criteria for good expressions === Ideally, a good referring expression should satisfy a number of criteria: Referential success: It should unambiguously identify the referent to the reader. Ease of comprehension: The reader should be able to quickly read and understand it. Computational complexity: The generation algorithm should be fast. No false inferences: The expression should not confuse or mislead the reader by suggesting false implicatures or other pragmatic inferences. For example, a reader may be confused if told "Sit by the brown wooden table" in a context where there is only one table. == History == === Pre-2000 era === REG goes back to the early days of NLG. One of the first approaches was that of Winograd, who in 1972 developed an "incremental" REG algorithm for his SHRDLU program. 
Afterwards, in the 1980s, researchers started to model the human ability to create referring expressions. This new approach to the topic was influenced by the researchers Appelt and Kronfeld, who created the programs KAMP and BERTRAND and considered referring expressions as parts of bigger speech acts. Among their most interesting findings were that referring expressions can be used to add information beyond the identification of the referent, and that the communicative context and the Gricean maxims influence referring expressions. Furthermore, their skepticism concerning the naturalness of minimal descriptions made Appelt and Kronfeld's research a foundation of later work on REG. The search for simple, well-defined problems changed the direction of research in the early 1990s. This new approach was led by Dale and Reiter, who stressed the identification of the referent as the central goal. Like Appelt, they discuss the connection between the Gricean maxims and referring expressions in their culminating paper, in which they also propose a formal problem definition. Furthermore, Reiter and Dale discuss the Full Brevity and Greedy Heuristics algorithms as well as their Incremental Algorithm (IA), which became one of the most important algorithms in REG. === Later developments === After 2000, research began to lift some of the simplifying assumptions that had been made in early REG research in order to create simpler algorithms. Different research groups concentrated on different limitations, creating several expanded algorithms. 
Often these extend the IA in a single respect, for example in relation to: reference to sets, like "the t-shirt wearers" or "the green apples and the banana on the left"; relational descriptions, like "the cup on the table" or "the woman who has three children"; context dependency, vagueness and gradeability, which include statements like "the older man" or "the car on the left" that are often unclear without a context; and salience and generation of pronouns, which are highly discourse dependent, making for example "she" a reference to "the (most salient) female person". Many simplifying assumptions are still in place or have just begun to be worked on. A combination of the different extensions has also yet to be achieved, and is called a "non-trivial enterprise" by Krahmer and van Deemter. Another important change after 2000 was the increasing use of empirical studies to evaluate algorithms. This development took place due to the emergence of transparent corpora. Although there are still discussions about what the best evaluation metrics are, the use of experimental evaluation has already led to better comparability of algorithms, a discussion about the goals of REG, and more task-oriented research. Furthermore, research has extended its range to related topics such as the choice of knowledge representation (KR) frameworks. In this area the main question, which KR framework is most suitable for use in REG, remains open. The answer to this question depends on how well descriptions can be expressed or found. Much of the potential of KR frameworks has been left unused so far. Some of the different approaches are the use of: graph search, which treats relations between targets in the same way as properties; constraint satisfaction, which allows for a separation between problem specification and implementation; and modern knowledge representation, which offers logical inference in, for example, Description Logic or Conceptual Graphs. 
== Problem definition == Dale and Reiter (1995) think about referring expressions as distinguishing descriptions. They define: the referent as the entity that should be described; the context set as the set of salient entities; the contrast set, or potential distractors, as all elements of the context set except the referent; and a property as a reference to a single attribute–value pair. Each entity in the domain can be characterised as a set of attribute–value pairs, for example ⟨type, dog⟩, ⟨gender, female⟩ or ⟨age, 10 years⟩. The problem then is defined as follows: Let r be the intended referent, and C be the contrast set. Then, a set L of attribute–value pairs will represent a distinguishing description if the following two conditions hold: (1) Every attribute–value pair in L applies to r: that is, every element of L specifies an attribute–value that r possesses. (2) For every member c of C, there is at least one element l of L that does not apply to c: that is, there is an l in L that specifies an attribute–value that c does not possess. l is said
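
The two conditions in the problem definition above can be sketched in Python. This is a minimal illustration, not Dale and Reiter's own code: the function name `is_distinguishing`, the encoding of entities as sets of (attribute, value) tuples, and the two distractor entities are assumptions made for the example; only the ⟨type, dog⟩, ⟨gender, female⟩ and ⟨age, 10 years⟩ pairs come from the text.

```python
from typing import Iterable, Set, Tuple

# An attribute-value pair such as ("type", "dog"); the tuple encoding is an assumption.
Property = Tuple[str, str]
Entity = Set[Property]


def is_distinguishing(description: Set[Property],
                      referent: Entity,
                      contrast_set: Iterable[Entity]) -> bool:
    """Return True if `description` (L) singles out `referent` (r) from `contrast_set` (C)."""
    # Condition 1: every pair in L must be a property that r possesses.
    if not description <= referent:
        return False
    # Condition 2: for every distractor c, some pair in L must not apply to c.
    return all(any(pair not in c for pair in description) for c in contrast_set)


# The referent uses the pairs from the text; the distractors are hypothetical.
referent = {("type", "dog"), ("gender", "female"), ("age", "10 years")}
distractors = [
    {("type", "dog"), ("gender", "male"), ("age", "10 years")},
    {("type", "cat"), ("gender", "female"), ("age", "3 years")},
]

print(is_distinguishing({("type", "dog"), ("gender", "female")}, referent, distractors))
print(is_distinguishing({("type", "dog")}, referent, distractors))
print(is_distinguishing({("gender", "male")}, referent, distractors))
```

Under these assumptions the first description is distinguishing, the second fails condition 2 (the first distractor is also a dog), and the third fails condition 1 (the pair does not apply to the referent).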

    Read more →
  • ProjectExplorer

    ProjectExplorer

    ProjectExplorer is a documentary short film series. The films, directed and produced by ProjectExplorer's Founder, Jenny M Buccos, focus on histories and cultures of foreign places and people using interviews with subject experts, artists, and public figures including Archbishop Desmond Tutu, Dr. John Kani, Greg Marinovich, and Sipho “Hotstix” Mabuse. Produced for a child and young adult audience, segments in each series depict everyday life and the challenges and concerns of those living in the locations and regions featured. Each film is 2–4 minutes in length, with each series containing approximately 40 films. The ProjectExplorer series is distributed internationally without charge via the web by ProjectExplorer, LTD., an American not-for-profit organization. Three series have been produced and distributed. In fall 2009, ProjectExplorer's third series, Jordan, received a GOLD level Parents' Choice Award for excellence in web programming. == Film series == === Shakespeare's England (2006) === The first series was filmed in London, Stratford-upon-Avon, and New York City. The series includes more than 30 film segments. United Kingdom locations and individuals include: the London Eye; the Tower of London; the Whitechapel Bell Foundry, which demonstrates the process of making a bell; Simon Hughes, Member of Parliament and President of the Liberal Democrats; the Old Vic; the Royal Shakespeare Company; and the National Archives (UK). Segments filmed in New York City include: Michael Cumpsty discussing and performing monologues from Hamlet (while starring in the Classic Stage Company production); and Michael Stuhlbarg discussing and performing a monologue from Macbeth. === South Africa (2007) === Filmed in Johannesburg, Cape Town, and KwaZulu Natal, the series contains over 40 film segments including: Ntate Thabong Phosa, a lesiba player from Lesotho. Due to the rarity of lesiba players globally, this is one of the few publicly available examples of the lesiba being played on film. 
A Robben Island piece, filmed at the cell in which Nelson Mandela was held for 18 of the 27 years of his imprisonment. JSE Securities Exchange with Leigh Roberts, correspondent for CNBC Africa. A 3-part series on HIV/AIDS with amfAR Director of Research, Dr. Rowena Johnson. Dr. Johnson discusses the high cost of anti-retroviral drugs and testing in South Africa. The June 16, 1976 Soweto Uprising, with archival film footage and photography from SABC and The Sowetan newspaper. Prominent South Africans featured in the series: Dr. John Kani, Chairperson of the Apartheid Museum and Tony Award-winning actor; musician Sipho “Hotstix” Mabuse; former U.N. Ambassador Dave A. Steward, Executive Director of the FW de Klerk Foundation; director and producer Duma Ndlovu; and Malcolm Purkey, Artistic Director of the Market Theatre. === South Africa, Part II (2008) === Filmed in Johannesburg, Cape Town, and New York City, the series contains over 10 film segments. Prominent South Africans featured in the series: Archbishop Desmond Tutu, Nobel Peace Prize laureate; photojournalist Greg Marinovich, Pulitzer Prize winner and co-author of The Bang-Bang Club; Vusi Mahlasela, musician; and author Max du Preez. === Jordan (2008) === Filmed in Amman, Petra, Umm Qais, Jerash, Madaba, Bethany, the Dead Sea, and New York City, the series contains more than 45 film segments. 
Jordan series segments include: a tour of the throne room of King Abdullah II, at Raghadan Palace; sharing mansaf with a Bedouin family in the Wadi Rum desert; the UNRWA Jabal Hussein refugee camp; the Siq, Treasury, and Monastery at Petra; the ruins of Gadara at Umm Qais; Jerash, the capital and largest city of Jordan's Jerash Governorate; Madaba, home of the Madaba Map and the mosaic capital of Jordan; the archaeological site at Bethany; traditional clothing from Salt and Ma'an; the reintroduction into the wild of the endangered Arabian oryx; the Desert Castles; the science of the Dead Sea; and Her Royal Highness Princess Basma bint Ali and her Royal Botanic Garden.

    Read more →
  • Infone

    Infone

    Infone was a service launched by Metro One Telecommunications in 2003. The service was discontinued effective December 14, 2005. == How it worked == Infone included directory assistance and other services via a toll-free phone number. A user could call 888-411-1111 to request directory assistance, directions, traffic information, movie times, call completion, dinner reservation assistance and other services. Infone provided a number of innovative 411 'concierge'-like services, including movie listings from a live operator, and offered a feature that provided information from a linked Microsoft Outlook calendar when set up in advance. For a period of time the service advertised heavily on U.S. television, featuring ads with then-Governor of Minnesota Jesse Ventura and emphasizing its use of all U.S.-based operators. The price offered was $0.89 per call up to 15 minutes (covering the time after the operator connected the caller to the requested number, as well as additional information requests afterwards), with $0.05 for each additional minute, making Infone also a competitively priced long-distance service. New users received 5–10 free calls. Infone identified a registered user (and the associated billing information; the service was payable only by credit card) by caller ID, using numbers registered at sign-up. When the user called from an unregistered telephone number or with no caller ID, an advanced voiceprint recognition system (VPRS) from SpeechWorks identified the user from a personal phrase (e.g., "Hello Infone!") spoken after the welcome tone.
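
The pricing described above reduces to simple arithmetic. The sketch below is illustrative only: the function name `call_cost_cents` is hypothetical, and rounding partial minutes up to a whole minute is an assumption, since the text does not say how fractions of a minute were billed.

```python
import math

BASE_CENTS = 89              # $0.89 per call, from the text
INCLUDED_MINUTES = 15        # minutes covered by the base price, from the text
EXTRA_CENTS_PER_MINUTE = 5   # $0.05 per additional minute, from the text


def call_cost_cents(minutes: float) -> int:
    """Cost of one call in cents; partial minutes are rounded up (an assumption)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Only minutes beyond the included 15 are charged extra.
    extra_minutes = max(0, math.ceil(minutes) - INCLUDED_MINUTES)
    return BASE_CENTS + EXTRA_CENTS_PER_MINUTE * extra_minutes


print(call_cost_cents(10))   # within the included 15 minutes
print(call_cost_cents(20))   # 5 extra minutes
```

Under these assumptions a 20-minute call would cost $0.89 plus 5 × $0.05, or $1.14. Amounts are kept in whole cents to avoid floating-point rounding.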

    Read more →