AI Tools

AI Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Infogram

    Infogram

    Infogram is a web-based data visualization and infographics platform, created in Riga, Latvia. It allows people to make and share digital charts, infographics and maps. Infogram offers an intuitive WYSIWYG editor that converts users’ data into infographics that can be published, embedded or shared. Users do not need coding skills to use this tool; users include newsrooms, marketing teams, governments, educators and students. The company that created Infogram, also called Infogram, was founded in 2012 in Riga, Latvia and has another office in San Francisco. As of October 2017, Infogram says it has 3 million users who have created charts and infographics that have been viewed more than 1.5 billion times. Infogram was bought by Prezi, a web-based presentation software company, in May 2017. == History == Infogram was founded in February 2012 in Riga, Latvia by Uldis Leiterts, Raimonds Kaže and Alise Dīrika. In January 2013, Infogram won the international Hy Berlin pitch contest. During his pitch, Infogram CEO Uldis Leiterts announced that the company had created more templates and was working with Microsoft to integrate its platform with the contemporaneous version of Microsoft Office. The company also won the 2013 Kantar Information Is Beautiful Award, which “celebrates excellence and beauty in data visualizations, infographics, interactives & information art.” In December 2014, Infogram acquired the Brazil-based data visualization blog, Visualoop. In an effort to expand sales and marketing in the U.S., Infogram secured $1.8 million in funding in February 2014. The announcement was made at TechChill, a startup conference for the Baltics in Riga, Latvia. At the time, the funding was believed to be the largest to date for the company. Infogram won the 2017 National Design Award of Latvia. == Acquisition by Prezi == Prezi, a web-based presentation software company, acquired Infogram in May 2017. Infogram is now a wholly owned subsidiary of Prezi. Infogram was rated #1 on Forbes’ list of “The Best Infographic Tools for 2017,” which was published in September 2017. In October 2017, Infogram announced a new version of its data visualization platform, including a drag-and-drop editor, over 40 new designer templates and social media support.

    Read more →
  • Five safes

    Five safes

    The Five Safes is a framework for helping make decisions about making effective use of data which is confidential or sensitive. It is mainly used to describe or design research access to statistical data held by government and health agencies, and by data archives such as the UK Data Service. It is not an internationally accepted standard. Two of the Five Safes refer to statistical disclosure control, and so the Five Safes is usually used to contrast statistical and non-statistical controls when comparing data management options. == Concept == The Five Safes proposes that data management decisions be considered as solving problems in five 'dimensions': projects, people, settings, data and outputs. The combination of the controls leads to 'safe use'. These are most commonly expressed as questions, for example: These dimensions are scales, not limits. That is, solutions can have a mix of more or fewer controls in each dimension, but the overall aim of 'safe use' independent of the particular mix. For example, a public use file available for open download cannot control who uses it, where or for what purpose, and so all the control (protection) must be in the data itself. In contrast, a file which is only accessed through a secure environment with certified users can contain very sensitive information: the non-statistical controls allow the data to be 'unsafe'. One academic likened the process to a graphic equalizer, where bass and treble can be combined independently to produce a sound the listener likes, which has proven to be a very useful metaphor. This 2023 Data Foundation webinar is an expert discussion of how the elements interact, including an excellent introductory representation. There is no 'order' to the Five Safes, in that one is necessarily more important than the others. However, Ritchie argued that the 'managerial' controls (projects, people, setting) should be addressed before the 'statistical' controls (data, output). The Five Safes concept is associated with other topics which developed from the same programme at ONS, although these are not necessarily implemented. Safe people is associated with 'active researcher management', while safe outputs is linked with principles-based output statistical disclosure control. The Five Safes is a positive framework, describing what is and is not. The EDRU ('evidence-based, default-open, risk-managed, user-centred') attitudinal model is sometimes used to give a normative context == The 'data access spectrum' == From 2003 the Five Safes was also represented in a simpler form as a 'Data Access Spectrum'. The non-data controls (project, people, setting, outputs) tend to work together, in that organisations often see these as a complementary set of restrictions on access. These can then be contrasted with choices about data anonymisation to present a linear representation of data access options. This presentation is consistent with the idea of 'data as a residual', as well as data protection laws of the time which often characterised data simply as anonymous or not anonymous. A similar idea had already been developed independently in 2001 by Chuck Humphrey of the Canadian RDC network, the 'continuum of access'. More recently, The Open Data Institute has developed a 'Data Spectrum toolkit' which includes industry-specific examples. == History and terminology == The Five Safes was devised in the winter of 2002/2003 by Felix Ritchie at the UK Office for National Statistics (ONS) to describe its secure remote-access Virtual Microdata Laboratory (VML). It was described at this time as the 'VML Security Model'. This was adopted by the NORC data enclave, and more widely in the US, as the 'portfolio model' (although this is now also used to refer to a slightly different legal/statistical/educational breakdown). In 2012 the framework as was still being referred to as the 'VML security model', but its increasing use among non-UK organisations led to the adoption of the more general and informative phrase 'Five Safes'. The original framework only had four safes (projects, people, settings and outputs): the framework was used to describe highly detailed data access through a secure environment, and so the 'data' dimension was irrelevant. From 2007 onwards, 'safe data' was included as the framework was used to a describe a wider range of ONS activities. As the US version was based upon the 2005 specification, some US iterations uses have the original four dimensions (eg). Some discussions, such as the OECD, use the term 'secure' instead 'safe'. However, the use of both these terms can cause presentational problems: less control in a particular dimension could be seen to imply 'unsafe users' or 'insecure settings', for example, which distracts from the main message. Hence, the Australian government uses the term "five data sharing principles". The 'Anonymisation Decision-Making Framework' uses a framework based on the Five Safes but relabelling "projects", "people", and "settings" as "governance", "agency" and "infrastructure", respectively; "Output" is omitted, and "safe use" becomes "functional anonymisation". There is no reference to the Five Safes or any associated literature. The Australian version was required to include references to the Five Safes, and presented it as an alternative without comment. == Application == The framework has had three uses: pedagogical, descriptive, and design. Since 2016, it has also been used, directly and indirectly in legislation. See for more detailed examples. === Pedagogy === The first significant use of the framework, other than internal administrative use, was to structure researcher training courses at the UK Office for National Statistics from 2003. UK Data Archive, Administrative Data Research Network, Eurostat, Statistics New Zealand, the Mexican National Institute of Statistics and Geography, NORC, Statistics Canada and the Australian Bureau of Statistics, amongst others, have also used this framework. Most of these courses are for researchers using restricted-access facilities; the Eurostat courses are unusual in that they are designed for all users of sensitive data. === Description === The framework is often used to describe existing data access solutions (e.g. UK HMRC Data Lab, UK Data Service, Statistics New Zealand) or planned/conceptualised ones (e.g. Eurostat in 2011). An early use was to help identify areas where ONS' still had 'irreducible risks' in its provision of secure remote access. The framework is mostly used for confidential social science data. To date it appears to have made little impact on medical research planning, although it is now included in the revised guidelines on implementing HIPAA regulations in the US, and by Cancer Research UK and the Health Foundation in the UK. It has also been used to describe a security model for the Scottish Health Informatics Programme. === Design === In general the Five Safes has been used to describe solutions post-factum, and to explain/justify choices made, but an increasing number of organisations have used the framework to design data access solutions. For example, the Hellenic Statistical Agency developed a data strategy built around the Five Safes in 2016; the UK Health Foundation used the Five Safes to design its data management and training programmes. Use in the private sector is less common but some organisations have incorporated the Five Safes into consulting services. In 2015 the UK Data Service organized a workshop to encourage data users from the academic and private sectors to think about how to manage confidential research data, using the Five Safes to demonstrate alternative options and best practice. Early adopters for strategic design use were in Australia: both the Australian Bureau of Statistics and the Australian Department of Social Service used the Five Safes as an ex ante design tool. In 2017 the Australian Productivity Commission recommended adopting a version of the framework to support cross-government data sharing and re-use. This underwent extensive consultation and culminated in the DAT Act 2022. Since 2020 the Five Safes has been the overriding framework for the design of new secure facilities and data sharing arrangements in the UK for public health and social sciences. This has been promoted by the Office for Statistics Regulation, the UK Statistics Authority, NHS DIgital, and the research funding bodies Administrative Data Research UK and DARE UK. === Regulation and legislation === Three laws have incorporated the Fives Safes. They are explicit in the South Australian Public Sector (Data Sharing) Act 2016, and implicit in the research provisions of the UK Digital Economy Act 2017. The Australian Data Availability and Transparency Act 2022 renames the Five Safes as the Five Data Sharing Principles.A 2025 statutory review of the DAT Act 2022 found "that the DAT Act has not been effective in achieving its objectives.". The review includes specific referen

    Read more →
  • Geospatial metadata

    Geospatial metadata

    Geospatial metadata (also geographic metadata) is a type of metadata applicable to geographic data and information. Such objects may be stored in a geographic information system (GIS) or may simply be documents, data-sets, images or other objects, services, or related items that exist in some other native environment but whose features may be appropriate to describe in a (geographic) metadata catalog (may also be known as a data directory or data inventory). == Definition == ISO 19115:2013 "Geographic Information – Metadata" from ISO/TC 211, the industry standard for geospatial metadata, describes its scope as follows: [This standard] provides information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, distribution, and other properties of digital geographic data and services. ISO 19115:2013 also provides for non-digital mediums: Though this part of ISO 19115 is applicable to digital data and services, its principles can be extended to many other types of resources such as maps, charts, and textual documents as well as non-geographic data. The U.S. Federal Geographic Data Committee (FGDC) describes geospatial metadata as follows: A metadata record is a file of information, usually presented as an XML document, which captures the basic characteristics of a data or information resource. It represents the who, what, when, where, why and how of the resource. Geospatial metadata commonly document geographic digital data such as Geographic Information System (GIS) files, geospatial databases, and earth imagery but can also be used to document geospatial resources including data catalogs, mapping applications, data models and related websites. Metadata records include core library catalog elements such as Title, Abstract, and Publication Data; geographic elements such as Geographic Extent and Projection Information; and database elements such as Attribute Label Definitions and Attribute Domain Values. == History == The growing appreciation of the value of geospatial metadata through the 1980s and 1990s led to the development of a number of initiatives to collect metadata according to a variety of formats either within agencies, communities of practice, or countries/groups of countries. For example, NASA's "DIF" metadata format was developed during an Earth Science and Applications Data Systems Workshop in 1987, and formally approved for adoption in 1988. Similarly, the U.S. FGDC developed its geospatial metadata standard over the period 1992–1994. The Spatial Information Council of Australia and New Zealand (ANZLIC), a combined body representing spatial data interests in Australia and New Zealand, released version 1 of its "metadata guidelines" in 1996. ISO/TC 211 undertook the task of harmonizing the range of formal and de facto standards over the approximate period 1999–2002, resulting in the release of ISO 19115 "Geographic Information – Metadata" in 2003 and a subsequent revision in 2013. As of 2011 individual countries, communities of practice, agencies, etc. have started re-casting their previously used metadata standards as "profiles" or recommended subsets of ISO 19115, occasionally with the inclusion of additional metadata elements as formal extensions to the ISO standard. The growth in popularity of Internet technologies and data formats, such as Extensible Markup Language (XML) during the 1990s led to the development of mechanisms for exchanging geographic metadata on the web. In 2004, the Open Geospatial Consortium released the current version (3.1) of Geography Markup Language (GML), an XML grammar for expressing geospatial features and corresponding metadata. With the growth of the Semantic Web in the 2000s, the geospatial community has begun to develop ontologies for representing semantic geospatial metadata. Some examples include the Hydrology and Administrative ontologies developed by the Ordnance Survey in the United Kingdom. == ISO 19115: Geographic information – Metadata == ISO 19115 is a standard of the International Organization for Standardization (ISO). The standard is part of the ISO geographic information suite of standards (19100 series). ISO 19115 and its parts define how to describe geographical information and associated services, including contents, spatial-temporal purchases, data quality, access and rights to use. The objective of this International Standard is to provide a clear procedure for the description of digital geographic data-sets so that users will be able to determine whether the data in a holding will be of use to them and how to access the data. By establishing a common set of metadata terminology, definitions and extension procedures, this standard promotes the proper use and effective retrieval of geographic data. ISO 19115 was revised in 2013 to accommodate growing use of the internet for metadata management, as well as add many new categories of metadata elements (referred to as codelists) and the ability to limit the extent of metadata use temporally or by user. == ISO 19139 Geographic information Metadata XML schema implementation == ISO 19139:2012 provides the XML implementation schema for ISO 19115 specifying the metadata record format and may be used to describe, validate, and exchange geospatial metadata prepared in XML. The standard is part of the ISO geographic information suite of standards (19100 series), and provides a spatial metadata XML (spatial metadata eXtensible Mark-up Language (smXML)) encoding, an XML schema implementation derived from ISO 19115, Geographic information – Metadata. The metadata includes information about the identification, constraint, extent, quality, spatial and temporal reference, distribution, lineage, and maintenance of the digital geographic data-set. == Metadata directories == Also known as metadata catalogues or data directories. (need discussion of, and subsections on GCMD, FGDC metadata gateway, ASDD, European and Canadian initiatives, etc. etc.) GIS Inventory – National GIS Inventory System which is maintained by the US-based National States Geographic Information Council (NSGIC) as a tool for the entire US GIS Community. Its primary purpose is to track data availability and the status of geographic information system (GIS) implementation in state and local governments to aid the planning and building of statewide spatial data infrastructures (SSDI). The Random Access Metadata for Online Nationwide Assessment (RAMONA) database is a critical component of the GIS Inventory. RAMONA moves its FGDC-compliant metadata (CSDGM Standard) for each data layer to a web folder and a Catalog Service for the Web (CSW) that can be harvested by Federal programs and others. This provides far greater opportunities for discovery of user information. The GIS Inventory website was originally created in 2006 by NSGIC under award NA04NOS4730011 from the Coastal Services Center, National Oceanic and Atmospheric Administration, U.S. Department of Commerce. The Department of Homeland Security has been the principal funding source since 2008 and they supported the development of the Version 5 during 2011/2012 under Order Number HSHQDC-11-P-00177. The Federal Emergency Management Agency and National Oceanic and Atmospheric Administration have provided additional resources to maintain and improve the GIS Inventory. Some US Federal programs require submission of CSDGM-Compliant Metadata for data created under grants and contracts that they issue. The GIS Inventory provides a very simple interface to create the required Metadata. GCMD - Global Change Master Directory's goal is to enable users to locate and obtain access to Earth science data sets and services relevant to global change and Earth science research. The GCMD database holds more than 20,000 descriptions of Earth science data sets and services covering all aspects of Earth and environmental sciences. ECHO - The EOS Clearing House (ECHO) is a spatial and temporal metadata registry, service registry, and order broker. It allows users to more efficiently search and access data and services through the Reverb Client or Application Programmer Interfaces (APIs). ECHO stores metadata from a variety of science disciplines and domains, totalling over 3400 Earth science data sets and over 118 million granule records. GoGeo - GoGeo is a service run by EDINA (University of Edinburgh) and is supported by Jisc. GoGeo allows users to conduct geographically targeted searches to discover geospatial datasets. GoGeo searches many data portals from the HE and FE community and beyond. GoGeo also allows users to create standards compliant metadata through its Geodoc metadata editor. == Geospatial metadata tools == There are many proprietary GIS or geospatial products that support metadata viewing and editing on GIS resources. For example, ESRI's ArcGIS Desktop, SOCET GXP, Autodesk's AutoCAD Map 3D 2008, Arcitecta's Mediaflux and Intergraph's Geo

    Read more →
  • Ubiquitous computing

    Ubiquitous computing

    Ubiquitous computing (or "ubicomp") is a concept in software engineering, hardware engineering and computer science where computing is made to appear seamlessly anytime and everywhere. In contrast to desktop computing, ubiquitous computing implies use on any device, in any location, and in any format. A user interacts with the computer, which can exist in many different forms, including laptop computers, tablets, smart phones and terminals in everyday objects such as a refrigerator or a pair of glasses. The underlying technologies to support ubiquitous computing include the Internet, advanced middleware, kernels, operating systems, mobile codes, sensors, microprocessors, new I/Os and user interfaces, computer networks, mobile protocols, global navigational systems, and new materials. This paradigm is also described as pervasive computing, ambient intelligence, or "everyware". Each term emphasizes slightly different aspects. When primarily concerning the objects involved, it is also known as physical computing, the Internet of Things, haptic computing, and "things that think". Rather than propose a single definition for ubiquitous computing and for these related terms, a taxonomy of properties for ubiquitous computing has been proposed, from which different kinds or flavors of ubiquitous systems and applications can be described. Ubiquitous computing themes include: distributed computing, mobile computing, location computing, mobile networking, sensor networks, human–computer interaction, context-aware smart home technologies, and artificial intelligence. == Core concepts == Ubiquitous computing is the concept of using small internet connected and inexpensive computers to help with everyday functions in an automated fashion. Mark Weiser proposed three basic forms for ubiquitous computing devices: Tabs: a wearable device that is approximately a centimeter in size Pads: a hand-held device that is approximately a decimeter in size Boards: an interactive larger display device that is approximately a meter in size Ubiquitous computing devices proposed by Mark Weiser are all based around flat devices of different sizes with a visual display. These conceptual device categories were later implemented at Xerox PARC in experimental systems including the PARCTab, PARCPad, and LiveBoard, which served as early prototypes of handheld, tablet-style, and large interactive display computing environments. Expanding beyond those concepts there is a large array of other ubiquitous computing devices that could exist. == History == Mark Weiser coined the phrase "ubiquitous computing" around 1988, during his tenure as Chief Technologist of the Xerox Palo Alto Research Center (PARC). Both alone and with PARC Director and Chief Scientist John Seely Brown, Weiser wrote some of the earliest papers on the subject, largely defining it and sketching out its major concerns. == Recognizing the effects of extending processing power == Recognizing that the extension of processing power into everyday scenarios would necessitate understandings of social, cultural and psychological phenomena beyond its proper ambit, Weiser was influenced by many fields outside computer science, including "philosophy, phenomenology, anthropology, psychology, post-Modernism, sociology of science and feminist criticism". He was explicit about "the humanistic origins of the 'invisible ideal in post-modernist thought'", referencing as well the ironically dystopian Philip K. Dick novel Ubik. Andy Hopper from Cambridge University UK proposed and demonstrated the concept of "Teleporting" – where applications follow the user wherever he/she moves. Roy Want (now at Google), while at Olivetti Research Ltd, designed the first "Active Badge System", which is an advanced location computing system where personal mobility is merged with computing. Later at Xerox PARC, he designed and built the "PARCTab" or simply "Tab", widely recognized as the world's first Context-Aware computer, which has great similarity to the modern smartphone. Bill Schilit (now at Google) also did some earlier work in this topic, and participated in the early Mobile Computing workshop held in Santa Cruz in 1996. Ken Sakamura of the University of Tokyo, Japan leads the Ubiquitous Networking Laboratory (UNL), Tokyo as well as the T-Engine Forum. The joint goal of Sakamura's Ubiquitous Networking specification and the T-Engine forum, is to enable any everyday device to broadcast and receive information. MIT has also contributed significant research in this field, notably Things That Think consortium (directed by Hiroshi Ishii, Joseph A. Paradiso and Rosalind Picard) at the Media Lab and the CSAIL effort known as Project Oxygen. Other major contributors include University of Washington (Shwetak Patel, Anind Dey and James Landay), Dartmouth College's HealthX Lab (directed by Andrew Campbell), Georgia Tech's College of Computing (Gregory Abowd and Thad Starner), Cornell Tech's People Aware Computing Lab (directed by Tanzeem Choudhury), NYU's Interactive Telecommunications Program, UC Irvine's Department of Informatics, Microsoft Research, Intel Research and Equator, Ajou University UCRi & CUS. == Examples == One of the earliest ubiquitous systems was artist Natalie Jeremijenko's "Live Wire", also known as "Dangling String", installed at Xerox PARC during Mark Weiser's time there. This was a piece of string attached to a stepper motor and controlled by a LAN connection; network activity caused the string to twitch, yielding a peripherally noticeable indication of traffic. Weiser called this an example of calm technology. A present manifestation of this trend is the widespread diffusion of mobile phones. Many mobile phones support high speed data transmission, video services, and other services with powerful computational ability. Although these mobile devices are not necessarily manifestations of ubiquitous computing, there are examples, such as Japan's Yaoyorozu ("Eight Million Gods") Project in which mobile devices, coupled with radio frequency identification tags demonstrate that ubiquitous computing is already present in some form. Ambient Devices has produced an "orb", a "dashboard", and a "weather beacon": these decorative devices receive data from a wireless network and report current events, such as stock prices and the weather, like the Nabaztag, which was invented by Rafi Haladjian and Olivier Mével, and manufactured by the company Violet. The Australian futurist Mark Pesce has produced a highly configurable 52-LED LAMP enabled lamp which uses Wi-Fi named MooresCloud after Gordon Moore. The Unified Computer Intelligence Corporation launched a device called Ubi – The Ubiquitous Computer designed to allow voice interaction with the home and provide constant access to information. Ubiquitous computing research has focused on building an environment in which computers allow humans to focus attention on select aspects of the environment and operate in supervisory and policy-making roles. Ubiquitous computing emphasizes the creation of a human computer interface that can interpret and support a user's intentions. For example, MIT's Project Oxygen seeks to create a system in which computation is as pervasive as air: In the future, computation will be human centered. It will be freely available everywhere, like batteries and power sockets, or oxygen in the air we breathe...We will not need to carry our own devices around with us. Instead, configurable generic devices, either handheld or embedded in the environment, will bring computation to us, whenever we need it and wherever we might be. As we interact with these "anonymous" devices, they will adopt our information personalities. They will respect our desires for privacy and security. We won't have to type, click, or learn new computer jargon. Instead, we'll communicate naturally, using speech and gestures that describe our intent... This is a fundamental transition that does not seek to escape the physical world and "enter some metallic, gigabyte-infested cyberspace" but rather brings computers and communications to us, making them "synonymous with the useful tasks they perform". Network robots link ubiquitous networks with robots, contributing to the creation of new lifestyles and solutions to address a variety of social problems including the aging of population and nursing care. The "Continuity" set of features, introduced by Apple in OS X Yosemite, can be seen as an example of ubiquitous computing. == Issues == Privacy is easily the most often-cited criticism of ubiquitous computing (ubicomp), and may be the greatest barrier to its long-term success. == Research centres == This is a list of notable institutions who claim to have a focus on Ubiquitous computing sorted by country: Canada Topological Media Lab, Concordia University, Canada Finland Community Imaging Group, University of Oulu, Finland Germany Telecooperation Office (TECO), Karlsruhe Institute of Technology, Ger

    Read more →
  • Anthem medical data breach

    Anthem medical data breach

    The Anthem medical data breach was a medical data breach of information held by Elevance Health, known at that time as Anthem Inc. On February 4, 2015, Anthem, Inc. disclosed that criminal hackers had broken into its servers and had potentially stolen over 37.5 million records that contain personally identifiable information from its servers. On February 24, 2015 Anthem raised the number to 78.8 million people whose personal information had been affected. According to Anthem, Inc., the data breach extended into multiple brands Anthem, Inc. uses to market its healthcare plans, including, Anthem Blue Cross, Anthem Blue Cross and Blue Shield, Blue Cross and Blue Shield of Georgia, Empire Blue Cross and Blue Shield, Amerigroup, Caremore, and UniCare. Healthlink says that it was also a victim. Anthem says users' medical information and financial data were not compromised. Anthem has offered free credit monitoring in the wake of the breach. Michael Daniel, chief adviser on cybersecurity for President Barack Obama, said he would be changing his own password. According to The New York Times, about 80 million company records were hacked, and there is a fear that the stolen data will be used for identity theft. The compromised information contained names, birthdays, medical IDs, social security numbers, street addresses, e-mail addresses and employment information, including income data. == Theft of the data == The data was stolen over a period of weeks the month before the data breach was discovered. Because no medical information was compromised, Anthem was not required by law to encrypt the data. However, Anthem faced several civil class-action lawsuits, which were settled in 2017 at a cost of $115 million. Anthem did not admit any wrongdoing in the settlement. Data from the attack is expected to be sold on the black market. == Impact == Persons whose data was stolen could have resulting problems about identity theft for the rest of their lives. Anthem had a US$100 million insurance policy for cyber problems from American International Group. One report suggested that all of this money could be consumed by the process of notifying customers of the breach. == Responses == Anthem hired Mandiant, a cybersecurity firm, to review their security systems and advised people whose data was stolen to monitor their accounts and remain vigilant. The theft of the data raised fears generally about the theft of medical information. A writer from Harvard Law School suggested that this data breach might spark reform of security practices and government data safety regulation. An investigation conducted by several state insurance commissioners blames the breach on an attacker whose identity was withheld, and claims that the breach was likely ordered by a foreign government whose name was withheld. It also concluded that Anthem had taken reasonable measures to protect its data before the breach and that its remediation plan was effective at shutting down the breach once it was discovered. It also marks the starting date of the breach as February 18, 2014. The lead investigator was the Indiana Department of Insurance (DOI) -- Anthem's principal regulator, because Anthem is headquartered in Indiana. The Indiana DOI hired independent auditors to conduct a security assessment at Anthem, which concluded, "While deficiencies within Anthem’s cybersecurity posture were noted by the Examination Team, these deficiencies were not, in our experience, uncommon to companies comparable to Anthem in size and scope. While the pre-breach deficiencies impacted Anthem’s ability to reduce the likelihood of and quickly detect the Data Breach, the controls implemented subsequent to the Data Breach should improve Anthem’s ability to detect future breaches and enable Anthem to respond more effectively to a future attack than was the case in this instance." Federal regulators also conducted an investigation of the Anthem data breach, resulting in a $16 million settlement between Anthem and the Department of Health and Human Services (HHS) -- by far the largest HHS data breach settlement. An HHS Director overseeing the investigation said, "The largest health data breach in U.S. history fully merits the largest HIPAA settlement in history. Unfortunately, Anthem failed to implement appropriate measures for detecting hackers who had gained access to their system to harvest passwords and steal people's private information." The HHS settlement also required Anthem to perform a risk assessment and correct any identified deficiencies in its cybersecurity, with HHS oversight of Anthem's progress. Approximately 100 private class action lawsuits were filed against Anthem over the data breach and consolidated in California federal court, in front of Judge Koh, a respected authority in data breach litigation. After contested briefing over who should lead the litigation efforts, Judge Koh appoints Eve Cervantez of Altshuler Berzon and Andy Friedman of Cohen Milstein as co-lead counsel, and appointed Eric Gibbs of Gibbs Law Group and Michael Sobel of Lieff Cabraser to head a Plaintiffs' Steering Committee. In 2017, Anthem agreed to settle the litigation for $115 million, the largest ever data breach settlement at the time. The attorneys requested $38 million in fees for their work on the case, but Judge Koh slashed the fee request, finding that only $31 million in fees were merited.

    Read more →
  • Information pollution

    Information pollution

    Information pollution (also referred to as info pollution) is the contamination of an information supply with irrelevant, redundant, unsolicited, hampering, and low-value information. Examples include misinformation, disinformation, junk e-mail, and media violence. The spread of useless and undesirable information can have a detrimental effect on human activities. It is considered to be an adverse effect of the information revolution. == Overview == Information pollution generally applies to digital communication, such as e-mail, instant messaging (IM), and social media. The term acquired particular relevance in 2003 when web usability expert Jakob Nielsen published articles discussing the topic. As early as 1971 researchers were expressing doubts about the negative effects of having to recover "valuable nodules from a slurry of garbage in which it is a randomly dispersed minor component." People use information in order to make decisions and adapt to circumstances. Cognitive studies demonstrated human beings can process only limited information before the quality of their decisions begins to deteriorate. Information overload is a related concept that can also harm decision-making. It refers to an abundance of available information, without respect to its quality. Although technology is thought to have exacerbated the problem, it is not the only cause of information pollution. Anything that distracts attention from the essential facts required to perform a task or make a decision could be considered an information pollutant. Information pollution is seen as the digital equivalent of the environmental pollution generated by industrial processes. Some authors claim that information overload is a crisis of global proportions, on the same scale as threats faced by environmental destruction. Others have expressed the need for the development of an information management paradigm that parallels environmental management practices. == Manifestations == The manifestations of information pollution can be classified into two groups: those that provoke disruption, and those that damage information quality. Typical examples of disrupting information pollutants include unsolicited electronic messages (spam) and instant messages, particularly in the workplace. Mobile phones (ring tones and content) are disruptive in many contexts. Disrupting information pollution is not always technology based. A common example are newspapers, where subscribers read less than half or even none of the articles provided. Superfluous messages, such as unnecessary labels on a map, also distract. Alternatively, information may be polluted when its quality is reduced. This may be due to inaccurate or outdated information, but it also happens when information is badly presented. For example, when content is unfocused or unclear or when they appear in cluttered, wordy, or poorly organised documents it is difficult for the reader to understand. Laws and regulations undergo changes and revisions. Handbooks and other sources used for interpreting these laws can fall years behind the changes, which can cause the public to be misinformed. == Causes == === Cultural factors === Traditionally, information has been seen positively. People are accustomed to statements like "you cannot have too much information", "the more information the better", and "knowledge is power". The publishing and marketing industries have become used to printing many copies of books, magazines, and brochures regardless of customer demand, just in case they are needed. Democratised information sharing is an example of a new technology that has made it easier for information to reach everyone. Such technologies are perceived as a sign of progress and individual empowerment, as well as a positive step to bridge the digital divide. However, they also increase the volume of distracting information, making it more difficult to distinguish valuable information from noise. The continuous use of advertising in websites, technologies, newspapers, and everyday life is known as "cultural pollution". === Information technology === Technological advances of the 20th century and, in particular, the internet play a key role in the increase of information pollution. Blogs, social networks, personal websites, and mobile technology all contribute to increased "noise". The level of pollution may depend on the context. For example, e-mail is likely to cause more information pollution in a corporate setting, whereas mobile phones are likely to be particularly disruptive in a confined space shared by multiple people, such as a train carriage. == Effects == The effects of information pollution can be seen at multiple levels. === Individual === At a personal level, information pollution affects individuals' capacity to evaluate options and find adequate solutions. This can lead to information overload, anxiety, decision paralysis, and stress. It can disrupt the learning process. === Society === Some authors argue that information pollution and information overload can cause loss of perspective and moral values. This argument may explain the indifferent attitude that society shows toward topics such as scientific discoveries, health warnings, or politics. Pollution makes people less sensitive to headlines and more cynical toward new messages. === Business === Information pollution contributes to information overload and stress, which can disrupt the kinds information processing and decision-making needed to complete tasks at work. This leads to delayed or flawed decisions, which can translate into loss of productivity and revenue as well as an increased risk of critical errors. == Solutions == Proposed solutions include management techniques and refined technology. Technology-based alternatives include decision support systems and dashboards that enable prioritisation of information. Technologies that create frequent interruptions can be replaced with less-"polluting" options. Further, technology can improve the presentation quality, aiding understanding. E-mail usage policies and information integrity assurance strategies can help. Time management and stress management can be applied; these solutions would involve setting priorities and minimising interruptions. Improved writing and presentation practices can minimise information pollution effects on others. == Related terms == The term infollution or informatization pollution was coined by Dr. Paek-Jae Cho, former president & CEO of KTC (Korean Telecommunication Corp.), in a 2002 speech at the International Telecommunications Society (ITS) 14th biennial conference to describe any undesirable side effect brought about by information technology and its applications.

    Read more →
  • Conceptions of Library and Information Science

    Conceptions of Library and Information Science

    Conceptions of Library and Information Science (CoLIS) is a series of conferences about historical, empirical and theoretical perspectives in Library and Information Science. == CoLIS conferences == CoLIS 1 1991 in Tampere, Finland CoLIS 2 1996 in Copenhagen, Denmark CoLIS 3 1999 in Dubrovnik, Croatia CoLIS 4 2002 in Seattle, US CoLIS 5 2005 in Glasgow, Scotland CoLIS 6 2007 in Borås, Sweden CoLIS 7 June 2010 in London, at City University London. CoLIS 8 August 19–22, 2013, in Copenhagen, Denmark, at The Royal School of Library and Information Science. CoLIS 9 June 27–29, 2016, in Uppsala, Sweden, at Uppsala University. CoLIS 10 June 16–19, 2019, in Ljubljana, Slovenia, Faculty of Arts CoLIS 11 May 29–June 1, 2022, in Oslo, Norway, Oslo Metropolitan University.

    Read more →
  • Materials informatics

    Materials informatics

    Materials informatics is a field of study that applies the principles of informatics and data science to materials science and engineering to improve the understanding, use, selection, development, and discovery of materials. The term "materials informatics" is frequently used interchangeably with "data science", "machine learning", and "artificial intelligence" by the community. This is an emerging field, with a goal to achieve high-speed and robust acquisition, management, analysis, and dissemination of diverse materials data with the goal of greatly reducing the time and risk required to develop, produce, and deploy new materials, which generally takes longer than 20 years. This field of endeavor is not limited to some traditional understandings of the relationship between materials and information. Some more narrow interpretations include combinatorial chemistry, process modeling, materials databases, materials data management, and product life cycle management. Materials informatics is at the convergence of these concepts, but also transcends them and has the potential to achieve greater insights and deeper understanding by applying lessons learned from data gathered on one type of material to others. By gathering appropriate meta data, the value of each individual data point can be greatly expanded. == Databases == Databases are essential for any informatics research and applications. In material informatics many databases exist containing both empirical data obtained experimentally, and theoretical data obtained computationally. Big data that can be used for machine learning is particularly difficult to obtain for experimental data due to the lack of a standard for reporting data and the variability in the experimental environment. This lack of big data has led to growing effort in developing machine learning techniques that utilize data extremely data sets. On the other hand, large uniform database of theoretical density functional theory (DFT) calculations exists. These databases have proven their utility in high-throughput material screening and discovery. Some common DFT databases and high throughput tools are listed below: Databases: MaterialsProject.org, MaterialsWeb.org (University of Florida) HT software: Pymatgen, MPInterfaces, Matminer == Beyond computational methods? == The concept of materials informatics is addressed by the Materials Research Society. For example, materials informatics was the theme of the December 2006 issue of the MRS Bulletin. The issue was guest-edited by John Rodgers of Innovative Materials, Inc., and David Cebon of Cambridge University, who described the "high payoff for developing methodologies that will accelerate the insertion of materials, thereby saving millions of investment dollars." The editors focused on the limited definition of materials informatics as primarily focused on computational methods to process and interpret data. They stated that "specialized informatics tools for data capture, management, analysis, and dissemination" and "advances in computing power, coupled with computational modeling and simulation and materials properties databases" will enable such accelerated insertion of materials. A broader definition of materials informatics goes beyond the use of computational methods to carry out the same experimentation, viewing materials informatics as a framework in which a measurement or computation is one step in an information-based learning process that uses the power of a collective to achieve greater efficiency in exploration. When properly organized, this framework crosses materials boundaries to uncover fundamental knowledge of the basis of physical, mechanical, and engineering properties. == Challenges == While there are many who believe in the future of informatics in the materials development and scaling process, many challenges remain. Hill, et al., write that "Today, the materials community faces serious challenges to bringing about this data-accelerated research paradigm, including diversity of research areas within materials, lack of data standards, and missing incentives for sharing, among others. Nonetheless, the landscape is rapidly changing in ways that should benefit the entire materials research enterprise." This remaining tension between traditional materials development methodologies and the use of more computationally, machine learning, and analytics approaches will likely exist for some time as the materials industry overcomes some of the cultural barriers necessary to fully embrace such new ways of thinking. == Analogy from Biology == The overarching goals of bioinformatics and systems biology may provide a useful analogy. Andrew Murray of Harvard University expresses the hope that such an approach "will save us from the era of "one graduate student, one gene, one PhD". Similarly, the goal of materials informatics is to save us from one graduate student, one alloy, one PhD. Such goals will require more sophisticated strategies and research paradigms than applying data-science methods to the same tasks set currently undertaken by students.

    Read more →
  • Wargame (hacking)

    Wargame (hacking)

    In hacking, a wargame (or war game) is a cyber-security challenge and mind sport in which the competitors must exploit or defend a vulnerability in a system or application, and/or gain or prevent access to a computer system. A wargame usually involves a capture the flag logic, based on pentesting, semantic URL attacks, knowledge-based authentication, password cracking, reverse engineering of software (often JavaScript, C and assembly language), code injection, SQL injections, cross-site scripting, exploits, IP address spoofing, forensics, and other hacking techniques. == Wargames for preparedness == Wargames are also used as a method of cyberwarfare preparedness. The NATO Cooperative Cyber Defence Centre of Excellence (CCDCOE) organizes an annual event, Locked Shields, which is an international live-fire cyber exercise. The exercise challenges cyber security experts through real-time attacks in fictional scenarios and is used to develop skills in national IT defense strategies. == Additional applications == Wargames can be used to teach the basics of web attacks and web security, giving participants a better understanding of how attackers exploit security vulnerabilities. Wargames are also used as a way to "stress test" an organization's response plan and serve as a drill to identify gaps in cyber disaster preparedness.

    Read more →
  • Hall circles

    Hall circles

    Hall circles (also known as M-circles and N-circles) are a graphical tool in control theory used to obtain values of a closed-loop transfer function from the Nyquist plot (or the Nichols plot) of the associated open-loop transfer function. Hall circles have been introduced in control theory by Albert C. Hall in his thesis. == Construction == Consider a closed-loop linear control system with open-loop transfer function given by transfer function G ( s ) {\displaystyle G(s)} and with a unit gain in the feedback loop. The closed-loop transfer function is given by T ( s ) = G ( s ) 1 + G ( s ) {\textstyle T(s)={\frac {G(s)}{1+G(s)}}} . To check the stability of T(s), it is possible to use the Nyquist stability criterion with the Nyquist plot of the open-loop transfer function G(s). Note, however, that the Nyquist plot of G(s) does not give the actual values of T(s). To get this information from the G(s)-plane, Hall proposed to construct the locus of points in the G(s)-plane such that T(s) has constant magnitude and also the locus of points in the G(s)-plane such that T(s) has constant phase angle. Given a positive real value M representing a fixed magnitude, and denoting G(s) by z, the points satisfying M = | T ( s ) | = | G ( s ) | | 1 + G ( s ) | = | z | | 1 + z | {\displaystyle M=|T(s)|={\frac {|G(s)|}{|1+G(s)|}}={\frac {|z|}{|1+z|}}} are given by the points z in the G(s)-plane such that the ratio of the distance between z and 0 and the distance between z and -1 is equal to M. The points z satisfying this locus condition are circles of Apollonius, and this locus is known in the context of control systems as M-circles. Given a positive real value N representing a phase angle, the points satisfying N = arg ⁡ [ G ( s ) 1 + G ( s ) ] = arg ⁡ [ G ( s ) ] − arg ⁡ [ 1 + G ( s ) ] = arg ⁡ [ z ] − arg ⁡ [ 1 + z ] {\displaystyle N=\arg \left[{\frac {G(s)}{1+G(s)}}\right]=\arg[G(s)]-\arg[1+G(s)]=\arg[z]-\arg[1+z]} are given by the points z in the G(s)-plane such that the angle between -1 and z and the angle between 0 and z is constant. In other words, the angle opposed to the line segment between -1 and 0 must be constant. This implies that the points z satisfying this locus condition are arcs of circles, and this locus is known in the context of control systems as N-circles. == Usage == To use the Hall circles, a plot of M and N circles is done over the Nyquist plot of the open-loop transfer function. The points of the intersection between these graphics give the corresponding value of the closed-loop transfer function. Hall circles are also used with the Nichols plot and in this setting, are also known as Nichols chart. Rather than overlaying directly the Hall circles over the Nichols plot, the points of the circles are transferred to a new coordinate system where the ordinate is given by 20 log 10 ⁡ ( | G ( s ) | ) {\displaystyle 20\log _{10}(|G(s)|)} and the abscissa is given by arg ⁡ ( G ( s ) ) {\displaystyle \arg(G(s))} . The advantage of using Nichols chart is that adjusting the gain of the open loop transfer function directly reflects in up and down translation of the Nichols plot in the chart.

    Read more →
  • Master data

    Master data

    Master data represents "data about the business entities that provide context for business transactions". The most commonly found categories of master data are parties (individuals and organisations, and their roles, such as customers, suppliers, employees), products, financial structures (such as ledgers and cost centres) and locational concepts. Master data should be distinguished from reference data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. Master data is, by its nature, almost always non-transactional in nature. There exist edge cases where an organization may need to treat certain transactional processes and operations as "master data". This arises, for example, where information about master data entities, such as customers or products, is only contained within transactional data such as orders and receipts and is not housed separately. ISO 8000 is the international standard for data quality and data portability in master data. == Alternative definition == An alternative definition of the term master data is that it represents the business objects that contain the most valuable, agreed upon information shared across an organization. In this sense, it gives context to business activities and transactions, answering questions like who, what, when and how as well as expanding the ability to make sense of these activities through categorizations, groupings and hierarchies. It can cover relatively static reference data, transactional, unstructured, analytical, hierarchical and metadata. What constitutes master data under this definition is therefore not about an essential quality of the data (e.g. it is a business entity that provides context for business transactions), but rather about the context in which the organisation has decided to treat the data. == Externally-defined master data == For most organisations, most or all master data is defined and managed within that organisation. Some master data, however, may be externally defined and managed. This represents the single source of basic business data used across a marketplace, regardless of organisation or location. Thus, it can be used by multiple enterprises within a value chain, facilitating "integration of multiple data sources and literally [putting] everyone in the market on the same page." An example of market master data is the Universal Product Code (UPC) found on consumer products. == Master data management == Curating and managing master data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's master data. Master Data is therefore the focus of the information technology (IT) discipline of master data management (MDM). Without this discipline in place, organisations commonly encounter difficulties with having multiple versions of "the truth" about a business entity, both within individual applications, and distributed across applications.

    Read more →
  • Algorithmic transparency

    Algorithmic transparency

    Algorithmic transparency is the principle that the factors that influence the decisions made by algorithms should be visible, or transparent, to the people who use, regulate, and are affected by systems that employ those algorithms. Although the phrase was coined in 2016 by Nicholas Diakopoulos and Michael Koliska about the role of algorithms in deciding the content of digital journalism services, the underlying principle dates back to the 1970s and the rise of automated systems for scoring consumer credit. The phrases "algorithmic transparency" and "algorithmic accountability" are sometimes used interchangeably – especially since they were coined by the same people – but they have subtly different meanings. Specifically, "algorithmic transparency" states that the inputs to the algorithm and the algorithm's use itself must be known, but they need not be fair. "Algorithmic accountability" implies that the organizations that use algorithms must be accountable for the decisions made by those algorithms, even though the decisions are being made by a machine, and not by a human being. Current research around algorithmic transparency interested in both societal effects of accessing remote services running algorithms, as well as mathematical and computer science approaches that can be used to achieve algorithmic transparency. In the United States, the Federal Trade Commission's Bureau of Consumer Protection studies how algorithms are used by consumers by conducting its own research on algorithmic transparency and by funding external research. In the European Union, the data protection laws that came into effect in May 2018 include a "right to explanation" of decisions made by algorithms, though it is unclear what this means. Furthermore, the European Union founded The European Center for Algorithmic Transparency (ECAT).

    Read more →
  • Sample (graphics)

    Sample (graphics)

    In computer graphics, a sample is an intersection of a channel and a pixel. The diagram below depicts a 24-bit pixel, consisting of 3 samples for Red, Green, and Blue. In this particular diagram, the Red sample occupies 9 bits, the Green sample occupies 7 bits and the Blue sample occupies 8 bits, totaling 24 bits per pixel. Note that the samples do not have to be equal size and not all samples are mandatory in a pixel. Also, a pixel can consist of more than 3 samples (e.g. 4 samples of the RGBA color space). A sample is related to a subpixel on a physical display.

    Read more →
  • Authoritative Legal Entity Identifier

    Authoritative Legal Entity Identifier

    An Authoritative Legal Entity Identifier (ALEI) is the identifier assigned by a government jurisdiction authorized by statute or decree to create a legal entity and to maintain the authoritative registries of legal entities. ALEIs are used within supply chain data, ERP applications and master data management systems to support accurate and consistent identification of entities in digital records, supply chains, and government databases. ALEIs are described in the international standard ISO 8000-116, which outlines a structured format that makes the locally unique identifier into a globally unique one and ensures global interoperability and data quality. == Structure == An ALEI is composed of three main components: a prefix that identifies the jurisdiction and register, a subdomain element (optional), and the local registration number of the entity. For example, the identifier "US-DE.BER:3031657" refers to an entity registered in the Delaware Business Entity Register in the United States. The standardization of this structure is governed by ISO 8000-116, which is designed to ensure each ALEI is globally unique and resolvable. == Comparison with other identifiers == ALEIs differ from proxy identifiers such as the DUNS number, NCAGE code, or the Legal Entity Identifier (LEI) managed by GLEIF. While proxy identifiers can be issued by institutions that do not create legal entities, ALEIs are created and maintained by public bodies with the authority to form and register legal entities. This authoritative origin makes ALEIs particularly suitable for applications involving legal traceability, government regulation, and international transparency efforts. == Usage == ALEIs are increasingly utilized to identify legal entities in public and private datasets. The identifiers support supply chain accuracy, regulatory compliance, and the unification of master data. The first practical implementation of an ALEI was the International Business Registration Number (IBRN), developed to provide globally unique identifiers for registered business entities. IBRNs are issued by authorized government jurisdictions and are used to verify entities across borders, particularly in the context of trade facilitation and data exchange systems. For instance, business directories and registration systems in U.S. states like Connecticut provide structured registration documents that can be used to verify the ALEIs they issue. The use of ALEIs has been recommended by international organizations such as the Extractive Industries Transparency Initiative (EITI) and Open ownership to improve beneficial ownership registries. == Policy and regulation == ALEIs have been referenced in policy consultations such as those related to the U.S. Financial Data Transparency Act. Federal institutions including the Federal Reserve and FDIC have examined the potential for ALEIs to unify entity identification across regulatory databases.

    Read more →
  • Chinese speech synthesis

    Chinese speech synthesis

    Chinese speech synthesis is the application of speech synthesis to the Chinese language (usually Standard Chinese). It poses additional difficulties due to Chinese characters frequently having different pronunciations in different contexts and the complex prosody, which is essential to convey the meaning of words, and sometimes the difficulty in obtaining agreement among native speakers concerning what the correct pronunciation is of certain phonemes. == Concatenation (Ekho and KeyTip) == Recordings can be concatenated in any desired combination, but the joins sound forced (as is usual for simple concatenation-based speech synthesis) and this can severely affect prosody; these synthesizers are also inflexible in terms of speed and expression. However, because these synthesizers do not rely on a corpus, there is no noticeable degradation in performance when they are given more unusual or awkward phrases. Ekho is an open source TTS which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and experimentally Korean. Some of the Mandarin syllables have been pitched-normalised in Praat. A modified version of these is used in Gradint's "synthesis from partials". cjkware.com used to ship a product called KeyTip Putonghua Reader which worked similarly; it contained 120 Megabytes of sound recordings (GSM-compressed to 40 Megabytes in the evaluation version), comprising 10,000 multi-syllable dictionary words plus single-syllable recordings in 6 different prosodies (4 tones, neutral tone, and an extra third-tone recording for use at the end of a phrase). == Lightweight synthesizers (eSpeak and Yuet) == The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has experimented with Mandarin and Cantonese. eSpeak was used by Google Translate from May 2010 until December 2010. The commercial product "Yuet" is also lightweight (it is intended to be suitable for resource-constrained environments like embedded systems); it was written from scratch in ANSI C starting from 2013. Yuet claims a built-in NLP model that does not require a separate dictionary; the speech synthesised by the engine claims clear word boundaries and emphasis on appropriate words. Communication with its author is required to obtain a copy. Both eSpeak and Yuet can synthesis speech for Cantonese and Mandarin from the same input text, and can output the corresponding romanisation (for Cantonese, Yuet uses Yale and eSpeak uses Jyutping; both use Pinyin for Mandarin). eSpeak does not concern itself with word boundaries when these don't change the question of which syllable should be spoken. == Corpus-based == A "corpus-based" approach can sound very natural in most cases but can err in dealing with unusual phrases if they can't be matched with the corpus. The synthesiser engine is typically very large (hundreds or even thousands of megabytes) due to the size of the corpus. === iFlyTek === Anhui USTC iFlyTek Co., Ltd (iFlyTek) published a W3C paper in which they adapted Speech Synthesis Markup Language to produce a mark-up language called Chinese Speech Synthesis Markup Language (CSSML) which can include additional markup to clarify the pronunciation of characters and to add some prosody information. The amount of data involved is not disclosed by iFlyTek but can be seen from the commercial products that iFlyTek have licensed their technology to; for example, Bider's SpeechPlus is a 1.3 Gigabyte download, 1.2 Gigabytes of which is used for the highly compressed data for a single Chinese voice. iFlyTek's synthesiser can also synthesise mixed Chinese and English text with the same voice (e.g. Chinese sentences containing some English words); they claim their English synthesis to be "average". The iFlyTek corpus appears to be heavily dependent on Chinese characters, and it is not possible to synthesize from pinyin alone. It is sometimes possible by means of CSSML to add pinyin to the characters to disambiguate between multiple possible pronunciations, but this does not always work. === NeoSpeech === There is an online interactive demonstration for NeoSpeech speech synthesis, which accepts Chinese characters and also pinyin if it's enclosed in their proprietary "VTML" markup. === Mac OS === Mac OS had Chinese speech synthesizers available up to version 9. This was removed in 10.0 and reinstated in 10.7 (Lion). === Historical corpus-based synthesizers (no longer available) === A corpus-based approach was taken by Tsinghua University in SinoSonic, with the Harbin dialect voice data taking 800 Megabytes. This was planned to be offered as a download but the link was never activated. Nowadays, only references to it can be found on Internet Archive. Bell Labs' approach, which was demonstrated online in 1997 but subsequently removed, was described in a monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0-7923-8027-6), and the former employee who was responsible for the project, Chilin Shih (who subsequently worked at the University of Illinois) put some notes about her methods on her website.

    Read more →