AI Art Legality

  • Concurrency control

    In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

    Computer systems, both software and hardware, consist of modules, or components. Each component is designed to operate correctly, i.e., to obey or to meet certain consistency rules. When components that operate concurrently interact by messaging or by sharing accessed data (in memory or storage), a certain component's consistency may be violated by another component. The general area of concurrency control provides rules, methods, design methodologies, and theories to maintain the consistency of components operating concurrently while interacting, and thus the consistency and correctness of the whole system. Introducing concurrency control into a system means applying operation constraints, which typically result in some performance reduction. Operation consistency and correctness should be achieved as efficiently as possible, without reducing performance below reasonable levels. Concurrency control can require significant additional complexity and overhead in a concurrent algorithm compared to the simpler sequential algorithm.

    For example, a failure in concurrency control can result in data corruption from torn read or write operations.

    == Concurrency control in databases ==

    Comments: This section is applicable to all transactional systems, i.e., to all systems that use database transactions (atomic transactions; e.g., transactional objects in Systems management and in networks of smartphones which typically implement private, dedicated database systems), not only general-purpose database management systems (DBMSs). DBMSs also need to deal with concurrency control issues that are not specific to database transactions but are typical of operating systems in general. These issues (e.g., see Concurrency control in operating systems below) are out of the scope of this section.

    Concurrency control in Database management systems (DBMS; e.g., Bernstein et al. 1987, Weikum and Vossen 2001), other transactional objects, and related distributed applications (e.g., Grid computing and Cloud computing) ensures that database transactions are performed concurrently without violating the data integrity of the respective databases. Thus concurrency control is an essential element for correctness in any system where two or more database transactions, executed with time overlap, can access the same data, e.g., in virtually any general-purpose database system. Consequently, a vast body of related research has accumulated since database systems emerged in the early 1970s. A well-established concurrency control theory for database systems is outlined in the references mentioned above: serializability theory, which allows concurrency control methods and mechanisms to be designed and analyzed effectively. An alternative theory for concurrency control of atomic transactions over abstract data types is presented in (Lynch et al. 1993), and is not utilized below. This theory is more refined and complex, has a wider scope, and has been less utilized in the Database literature than the classical theory above. Each theory has its pros and cons, emphasis and insight. To some extent they are complementary, and their merging may be useful.

    To ensure correctness, a DBMS usually guarantees that only serializable transaction schedules are generated, unless serializability is intentionally relaxed to increase performance, but only in cases where application correctness is not harmed. To maintain correctness when transactions fail (abort), which can always happen for many reasons, schedules also need to have the recoverability (from abort) property. A DBMS also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related database. Overall transaction characterization is usually summarized by the ACID rules below. As databases have become distributed, or needed to cooperate in distributed environments (e.g., Federated databases in the early 1990s, and Cloud computing currently), the effective distribution of concurrency control mechanisms has received special attention.

    === Database transaction and the ACID rules ===

    The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well-understood database system behavior in a faulty environment where crashes can happen at any time, and recovery from a crash to a well-understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported in database and also other systems. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands). Every database transaction obeys the following rules (by support in the database system; i.e., a database system is designed to guarantee them for the transactions it runs):

    Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction is completed (committed or aborted respectively). In other words, to the outside world a committed transaction appears (by its effects on the database) to be indivisible (atomic), and an aborted transaction does not affect the database at all. Either all the operations are done or none of them are.

    Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform a database from one consistent state to another consistent state. It is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it intends to perform from the application's point of view, while the predefined integrity rules are enforced by the DBMS. Thus since a database can be normally changed only by transactions, all the database's states are consistent.

    Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually (depending on concurrency control method) the effects of an incomplete transaction are not even visible to another transaction. Providing isolation is the main goal of concurrency control.

    Durability - Effects of successful (committed) transactions must persist through crashes (typically by recording the transaction's effects and its commit event in a non-volatile memory).

    The concept of atomic transaction has been extended over the years to what has become Business transactions, which actually implement types of Workflow and are not atomic. However, such enhanced transactions also typically utilize atomic transactions as components.

    === Why is concurrency control needed? ===

    If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, some unexpected, undesirable results may occur, such as:

    The lost update problem: A second transaction writes a second value of a data-item (datum) on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results.

    The dirty read problem: Transactions read a value written by a transaction that has been later aborted. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The reading transactions end with incorrect results.

    The incorrect summary problem: While one transaction takes a summary over the values of all the instances of a repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result, depending on the timing of the updates, and whether certain update results have been included in the summary or not.

    Most high-performance transactional systems need to run transactions concurrently to meet their performance requirements. Thus, without concurrency control such systems can neither provide correct results nor maintain their databases consistently.

    === Concurrency control mechanisms ===

    ==== Categories ====

    The main categories of concurrency control mechanis
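The lost update problem described above can be illustrated with a minimal Python sketch. It is not how any particular DBMS works: the function names `interleaved_deposits` and `locked_deposits`, the balance and the deposit amounts are all hypothetical. The first function replays an uncontrolled schedule in which every transaction reads before any of them writes; the second serialises each read-modify-write step with a lock, which is only one of several possible control methods.

```python
import threading


def interleaved_deposits(balance, amounts):
    """Replay an uncontrolled schedule: all transactions read, then all write."""
    # Every transaction reads the same initial value before anyone writes.
    reads = [balance for _ in amounts]
    for read_value, amount in zip(reads, amounts):
        # Each write is based on a stale read, so it overwrites the previous write.
        balance = read_value + amount
    return balance


def locked_deposits(balance, amounts):
    """Run the same deposits in threads, guarding read-modify-write with a lock."""
    state = {"balance": balance}
    lock = threading.Lock()

    def deposit(amount):
        with lock:  # only one transaction may read and write at a time
            current = state["balance"]
            state["balance"] = current + amount

    threads = [threading.Thread(target=deposit, args=(a,)) for a in amounts]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["balance"]


if __name__ == "__main__":
    print(interleaved_deposits(100, [50, 30]))  # the deposit of 50 is lost
    print(locked_deposits(100, [50, 30]))       # both deposits survive
```

With the hypothetical inputs above, the uncontrolled schedule ends at 130 because the second write overwrites the first, while the locked version ends at 180, the result of either serial order.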

  • Probabilistic database

    Most real databases contain data whose correctness is uncertain. In order to work with such data, there is a need to quantify the integrity of the data. This is achieved by using probabilistic databases.

    A probabilistic database is an uncertain database in which the possible worlds have associated probabilities. Probabilistic database management systems are currently an active area of research. "While there are currently no commercial probabilistic database systems, several research prototypes exist..."

    Probabilistic databases distinguish between the logical data model and the physical representation of the data, much like relational databases do in the ANSI-SPARC Architecture. In probabilistic databases this is even more crucial, since such databases have to succinctly represent very large numbers of possible worlds, often exponential in the size of one world (a classical database).

    == Terminology ==

    In a probabilistic database, each tuple is associated with a probability between 0 and 1, with 0 representing that the data is certainly incorrect, and 1 representing that it is certainly correct.

    === Possible worlds ===

    A probabilistic database could exist in multiple states. For example, if there is uncertainty about the existence of a tuple in the database, then the database could be in two different states with respect to that tuple: the first state contains the tuple, while the second one does not. Similarly, if an attribute can take one of the values x, y or z, then the database can be in three different states with respect to that attribute.

    Each of these states is called a possible world.

    Consider a database with three tuples, where the notation {b3, b3′, b3′′} denotes that an attribute can take any of the values b3, b3′ or b3′′. Assume that there is uncertainty about the first tuple, certainty about the second tuple, and uncertainty about the value of attribute B in the third tuple. Then the actual state of the database may or may not contain the first tuple (depending on whether it is correct or not). Similarly, the value of the attribute B may be b3, b3′ or b3′′. Consequently, the database has six possible worlds: the first tuple is either present or absent, and in each case attribute B of the third tuple is b3, b3′ or b3′′.

    === Types of Uncertainties ===

    There are essentially two kinds of uncertainty that can exist in a probabilistic database: tuple-level uncertainty, where it is uncertain whether a tuple exists in the database, and attribute-level uncertainty, where the value of an attribute is uncertain.

    By assigning values to random variables associated with the data items, different possible worlds can be represented.

    == History ==

    The first published use of the term "probabilistic database" was probably in the 1987 VLDB conference paper "The theory of probabilistic databases", by Cavallo and Pittarelli. The title (of the 11 page paper) was intended as a bit of a joke, since David Maier's 600 page monograph, The Theory of Relational Databases, would have been familiar at that time to many of the conference participants and readers of the conference proceedings.
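The possible-worlds example from the Terminology section can be sketched in Python as a rough illustration. The tuple contents (`a1`, `b1`, `c1` and so on), the probabilities, and the assumption that the two uncertain choices are independent are all hypothetical and are not taken from the text. The sketch enumerates one world per combination of "first tuple present or absent" and "value of attribute B in the third tuple", and multiplies the probabilities of the choices made.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
FIRST_TUPLE_PRESENT = {True: 0.4, False: 0.6}        # tuple-level uncertainty
THIRD_TUPLE_B = {"b3": 0.5, "b3'": 0.3, "b3''": 0.2}  # attribute-level uncertainty


def possible_worlds():
    """Return a list of (rows, probability) pairs, one per possible world."""
    worlds = []
    choices = product(FIRST_TUPLE_PRESENT.items(), THIRD_TUPLE_B.items())
    for (present, p_tuple), (b_value, p_attr) in choices:
        rows = []
        if present:
            rows.append(("a1", "b1", "c1"))   # uncertain first tuple
        rows.append(("a2", "b2", "c2"))       # certain second tuple
        rows.append(("a3", b_value, "c3"))    # third tuple with uncertain B
        # Independence of the two choices is an assumption of this sketch.
        worlds.append((rows, p_tuple * p_attr))
    return worlds


if __name__ == "__main__":
    for rows, probability in possible_worlds():
        print(round(probability, 2), rows)
```

The enumeration yields six worlds whose probabilities sum to 1, which is why the number of worlds can grow exponentially as more independent uncertain items are added.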

  • ECML PKDD

    ECML PKDD, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, is one of the leading academic conferences on machine learning and knowledge discovery, held in Europe every year.

    == History ==

    ECML PKDD is a merger of two European conferences, the European Conference on Machine Learning (ECML) and the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). ECML and PKDD have been co-located since 2001; however, both ECML and PKDD retained their own identity until 2007. For example, the 2007 conference was known as "the 18th European Conference on Machine Learning (ECML) and the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD)", or in brief, "ECML/PKDD 2007", and both ECML and PKDD had their own conference proceedings. In 2008 the conferences were merged into one conference, and the division into traditional ECML topics and traditional PKDD topics was removed.

    The history of ECML dates back to 1986, when the European Working Session on Learning was first held. In 1993 the name of the conference was changed to European Conference on Machine Learning.

    PKDD was first organised in 1997. Originally PKDD stood for the European Symposium on Principles of Data Mining and Knowledge Discovery from Databases. The name European Conference on Principles and Practice of Knowledge Discovery in Databases has been used since 1999.

    The conference remains highly competitive, consistently maintaining an average acceptance rate of around 25% for the main research track.

  • Competitions and prizes in artificial intelligence

    There are a number of competitions and prizes to promote research in artificial intelligence.

    == General machine intelligence ==

    The David E. Rumelhart Prize is an annual award for making a "significant contemporary contribution to the theoretical foundations of human cognition". The prize is $100,000.

    The Human-Competitive Award is an annual challenge started in 2004 to reward results "competitive with the work of creative and inventive humans". The prize is $10,000. Entries are required to use evolutionary computing.

    The Intel AI Global Impact Festival is an international annual competition on artificial intelligence technology held by Intel Corporation for school and college students, with prizes upwards of $15,000. There are two age brackets in this competition: the 13-18 age group and the 18-and-above age group.

    The IJCAI Award for Research Excellence is a biannual award given at the International Joint Conference on Artificial Intelligence (IJCAI) to researchers in artificial intelligence in recognition of the excellence of their careers.

    The 2011 Federal Virtual World Challenge, advertised by The White House and sponsored by the U.S. Army Research Laboratory's Simulation and Training Technology Center, held a competition offering a total of US$52,000 in cash prize awards for general artificial intelligence applications, including "adaptive learning systems, intelligent conversational bots, adaptive behavior (objects or processes)" and more.

    The Machine Intelligence Prize is awarded annually by the British Computer Society for progress towards machine intelligence.

    Kaggle is billed as "the world's largest community of data scientists compete to solve most valuable problems".

    == Conversational behaviour ==

    The Loebner prize is an annual competition to determine the best Turing test competitors. The winner is the computer system that, in the judges' opinions, demonstrates the "most human" conversational behaviour. There is an additional prize for a system that, in their opinion, passes a Turing test. This second prize has not yet been awarded.

    == Automatic control ==

    === Pilotless aircraft ===

    The International Aerial Robotics Competition is a long-running event begun in 1991 to advance the state of the art in fully autonomous air vehicles. This competition is restricted to university teams (although industry and governmental sponsorship of teams is allowed). Key to this event is the creation of flying robots which must complete complex missions without any human intervention. Successful entries are able to interpret their environment and make real-time decisions based only on a high-level mission directive (e.g., "find a particular target inside a building having certain characteristics which is among a group of buildings 3 kilometers from the aerial robot launch point"). In 2000, a $30,000 prize was awarded during the 3rd Mission (search and rescue), and in 2008, $80,000 in prize money was awarded at the conclusion of the 4th Mission (urban reconnaissance).

    === Driverless cars ===

    The DARPA Grand Challenge is a series of competitions to promote driverless car technology, aimed at a congressional mandate stating that by 2015 one-third of the operational ground combat vehicles of the US Armed Forces should be unmanned. While the first race had no winner, the second awarded a $2 million prize for the autonomous navigation of a hundred-mile trail, using GPS, computers and a sophisticated array of sensors. In November 2007, DARPA introduced the DARPA Urban Challenge, a sixty-mile urban area race requiring vehicles to navigate through traffic. In November 2010 the US Armed Forces extended the competition with the $1.6 million prize Multi Autonomous Ground-robotic International Challenge to consider cooperation between multiple vehicles in a simulated-combat situation.

    Roborace will be a global motorsport championship with autonomously driving, electric vehicles. The series will be run as a support series during the Formula E championship for electric vehicles. This will be the first global championship for driverless cars.

    == Data-mining and prediction ==

    The Netflix Prize was a competition for the best collaborative filtering algorithm that predicts user ratings for films, based on previous ratings. The competition was held by Netflix, an online DVD-rental service. The prize was $1,000,000.

    The Pittsburgh Brain Activity Interpretation Competition will reward analysis of fMRI data "to predict what individuals perceive and how they act and feel in a novel Virtual Reality world involving searching for and collecting objects, interpreting changing instructions, and avoiding a threatening dog." The prize in 2007 was $22,000.

    The Face Recognition Grand Challenge (May 2004 to March 2006) aimed to promote and advance face recognition technology.

    The American Meteorological Society's artificial intelligence competition involves learning a classifier to characterise precipitation based on meteorological analyses of environmental conditions and polarimetric radar data.

    == Cooperation and coordination ==

    === Robot football ===

    The RoboCup and Federation of International Robot-soccer Association (FIRA) are annual international robot soccer competitions. The International RoboCup Federation's challenge is that by 2050 "a team of fully autonomous humanoid robot soccer players shall win the soccer game, comply with the official rule of the FIFA, against the winner of the most recent World Cup."

    == Logic, reasoning and knowledge representation ==

    The Herbrand Award is a prize given by Conference on Automated Deduction (CADE) Inc. to honour persons or groups for important contributions to the field of automated deduction. The prize is $1000.

    The CADE ATP System Competition (CASC) is a yearly competition of fully automated theorem provers for classical first order logic associated with the Conference on Automated Deduction (CADE) and International Joint Conference on Automated Reasoning (IJCAR). The competition was part of the Alan Turing Centenary Conference in 2012, with total prizes of 9000 GBP given by Google.

    The SUMO prize is an annual prize for the best open source ontology extension of the Suggested Upper Merged Ontology (SUMO), a formal theory of terms and logical definitions describing the world. The prize is $3000.

    The Hutter Prize for lossless compression of human knowledge is a cash prize which rewards compression improvements on a specific 100 MB English text file. The prize awards 500 euros for each one percent improvement, up to €50,000. The organizers believe that text compression and AI are equivalent problems. Three prizes, at around €2,000, have been awarded.

    The Cyc TPTP Challenge is a competition to develop reasoning methods for the Cyc comprehensive ontology and database of everyday common sense knowledge. The prize is 100 euros for "each winner of two related challenges".

    The Eternity II challenge was a constraint satisfaction problem very similar to the Tetravex game. The objective was to lay 256 tiles on a 16x16 grid while satisfying a number of constraints. The problem is known to be NP-complete. The prize was US$2,000,000. The competition ended in December 2010.

    == Games ==

    The World Computer Chess Championship has been held since 1970. The International Computer Games Association continues to hold an annual Computer Olympiad which includes this event plus computer competitions for many other games.

    The Ing Prize was a substantial money prize attached to the World Computer Go Congress, starting from 1985 and expiring in 2000. It was a graduated set of handicap challenges against young professional players with increasing prizes as the handicap was lowered. At the time it expired in 2000, the unclaimed prize was 400,000 NT dollars for winning a 9-stone handicap match.

    The AAAI General Game Playing Competition is a competition to develop programs that are effective at general game playing. Given a definition of a game, the program must play it effectively without human intervention. Since the game is not known in advance, the competitors cannot specially adapt their programs to a particular scenario. The prize in 2006 and 2007 was $10,000.

    The General Video Game AI Competition (GVGAI) poses the problem of creating artificial intelligence that can play a wide, and in principle unlimited, range of games. Concretely, it tackles the problem of devising an algorithm that is able to play any game it is given, even if the game is not known a priori. Additionally, the contest poses the challenge of creating level and rule generators for any given game. This area of study can be seen as an approximation of General Artificial Intelligence, with very little room for game-dependent heuristics. The competition runs yearly in different tracks: single player planning, two-player planning, single player learning, and level and rule generation, with prizes in each track ranging from 200 to 500 US dollars for winners and runners-up.

    The 2007 Ultimate Computer Ches

  • MultiValue database

    A MultiValue database is a type of NoSQL and multidimensional database. It is typically considered synonymous with PICK, a database originally developed as the Pick operating system.

    MultiValue databases include commercial products from Rocket Software, Revelation, InterSystems, Northgate Information Solutions, ONgroup, and other companies. These databases differ from a relational database in that they have features that support and encourage the use of attributes which can take a list of values, rather than all attributes being single-valued. They are often categorized with MUMPS within the category of post-relational databases, although the data model actually pre-dates the relational model. Unlike SQL-DBMS tools, most MultiValue databases can be accessed either with or without SQL.

    == History ==

    Don Nelson designed the MultiValue data model in the early to mid-1960s. Dick Pick, a developer at TRW, worked on the first implementation of this model for the US Army in 1965. Pick considered the software to be in the public domain because it was written for the military. This was but the first dispute regarding MultiValue databases that was addressed by the courts.

    Ken Simms wrote DataBASIC, sometimes known as S-BASIC, in the mid-1970s. It was based on Dartmouth BASIC, but had enhanced features for data management. Simms played a lot of Star Trek (a text-based early computer game originally written in Dartmouth BASIC) while developing the language, to ensure that DataBASIC functioned to his satisfaction.

    Three of the implementations of MultiValue - PICK version R77, Microdata Reality 3.x, and Prime Information 1.0 - were very similar. In spite of attempts to standardize, particularly by International Spectrum and the Spectrum Manufacturers Association, who designed a logo for all to use, there are no standards across MultiValue implementations. Subsequently, these flavors diverged, although with some cross-over. These streams of MultiValue database development could be classified as one stemming from PICK R83, one from Microdata Reality, and one from Prime Information. Because of the differences, some implementations have provisions for supporting several flavors of the languages. An attempt to document the similarities and differences can be found at the Post-Relational Database Reference (PRDB).

    One reasonable hypothesis for this data model lasting 50 years, with new database implementations of the model even in the 21st century, is that it provides inexpensive database solutions.

    == Data model example ==

    In a MultiValue database system:

    a database or schema is called an "account"
    a table or collection is called a "file"
    a column or field is called a field or an "attribute", which is composed of "multi-value attributes" and "sub-value attributes" to store multiple values in the same attribute
    a row or document is called a "record" or "item"

    Data is stored using two separate files: a "file" to store raw data and a "dictionary" to store the format for displaying the raw data.

    For example, assume there's a file (table) called "PERSON". In this file, there is an attribute called "eMailAddress". The eMailAddress field can store a variable number of email address values in a single record. A list of three email addresses can be stored and accessed via a single query when accessing the associated record.

    Achieving the same (one-to-many) relationship within a traditional relational database system would include creating an additional table to store the variable number of email addresses associated with a single "PERSON" record. However, modern relational database systems support this multi-value data model too. For example, in PostgreSQL, a column can be an array of any base type.

    == MultiValue Basic Language ==

    Multivalue Basic (now commonly styled as mvBasic) is a family of programming languages more or less common (and portable) to all the multivalue databases derived from the original Pick Operating System. The variations between implementations are known as flavours. The language originates from Dartmouth Basic and the earliest implementation of PickBASIC (now D3 FlashBasic). Over time various customisations and extensions have been added to take advantage of capabilities added to the different flavours while staying mainly in sync.

    mvBasic statements and functions are designed to access and take advantage of the multivalue database model while providing the usual capabilities of most modern languages, for example cryptography and communications. mvBasic is typeless and lends itself to structured programming techniques.

    Example code is available but limited. Whilst there are commercial applications and tools available, the multivalue database community has not embraced the open source library/package model to the degree seen with other languages.

    The typical mvBasic compiler compiles program source to a P-code executable object and runs in an interpreter, with D3 FlashBasic and jBASE being notable exceptions.

    == MultiValue Query Language ==

    Known as ENGLISH, ACCESS, AQL, UniQuery, Retrieve, CMQL, and by many other names over the years, corresponding to the different MultiValue implementations, the MultiValue query language differs from SQL in several respects. Each query is issued against a single dictionary within the schema, which could be understood as a virtual file or a portal to the database through which to view the data.

        LIST PEOPLE LAST_NAME FIRST_NAME EMAIL_ADDRESSES WITH LAST_NAME LIKE "Van..."

    The above statement would list all e-mail addresses for each person whose last name starts with "Van". A single entry would be output for each person, with multiple lines showing the multiple e-mail addresses (without repeating other data about the person).
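The behaviour of the LIST statement above can be approximated in Python as a rough sketch. It does not reproduce any MultiValue product's query engine: the `PEOPLE` records, the names, the `example.com` addresses and the `list_people` function are hypothetical, and the multi-valued attribute is modelled simply as a Python list inside each record.

```python
# Hypothetical PEOPLE file: names and addresses are made up for illustration.
PEOPLE = [
    {"LAST_NAME": "Van Horn", "FIRST_NAME": "Ada",
     "EMAIL_ADDRESSES": ["ada@example.com", "avh@example.org"]},
    {"LAST_NAME": "Smith", "FIRST_NAME": "Bo",
     "EMAIL_ADDRESSES": ["bo@example.com"]},
    {"LAST_NAME": "Lopez", "FIRST_NAME": "Cy",
     "EMAIL_ADDRESSES": ["cy@example.com"]},
]


def list_people(records, last_name_prefix):
    """Return output rows for people whose last name starts with the prefix.

    Each person yields one row per email address; the name columns are
    filled only on the first row, mirroring the report layout described.
    """
    rows = []
    for record in records:
        if not record["LAST_NAME"].startswith(last_name_prefix):
            continue
        emails = record["EMAIL_ADDRESSES"] or [""]
        rows.append((record["LAST_NAME"], record["FIRST_NAME"], emails[0]))
        # Continuation rows carry only the additional multi-values.
        rows.extend(("", "", extra) for extra in emails[1:])
    return rows


if __name__ == "__main__":
    for row in list_people(PEOPLE, "Van"):
        print("{:<10} {:<6} {}".format(*row))
```

Because the addresses live inside the record, no join against a second table is needed, which is the contrast with the traditional relational layout drawn in the data model example.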

  • Take Us to Your Chief: and Other Stories

    Take Us to Your Chief: and Other Stories is a collection of nine short stories by Canadian author, playwright, and journalist Drew Hayden Taylor published in 2016 by Douglas & McIntyre. Taylor, who is part Caucasian, part Ojibwe, explains in the acknowledgments section of the book that the origin of the project lies in several failed attempts "to compile an anthology of Native sci-fi from Canada’s best First Nations writers." The stories explore contemporary First Nations social issues by employing a number of 1950s-era science fiction tropes and themes, including time travel, alien contact, and superpowers. Many reviews of the book have noted Taylor's use of humor to examine dark subject matter, such as the heritage of Canadian Indian residential schools, First Nations suicide rates, or the water quality crisis on Canadian reserves.

    == The Stories ==

    "A Culturally Inappropriate Armageddon"
    "I Am...Am I"
    "Lost in Space"
    "Dreams of Doom"
    "Mr. Gizmo"
    "Petropaths"
    "Stars"
    "Superdisappointed"
    "Take Us to Your Chief"

    == Story summaries ==

    === Foreword ===

    In his foreword, Taylor describes the genesis of Take Us to Your Chief: and Other Stories and invites readers into, in his term, a “new terra nullius.” He begins by describing his biracial upbringing and heritage. He points out that First Nations people are rarely associated with technology or science fiction, in part because Indigenous peoples were often at a technological disadvantage against European colonizers. He references the few examples that he can think of from popular culture, such as the Star Trek episode called “The Paradise Syndrome,” in which First Nations people are portrayed as stereotypical Indians in hippie clothing. He also elaborates on his fascination with the world of sci-fi, which first started in comic books. He enjoyed the literary work of H.G. Wells, such as The Time Machine and The Invisible Man. Since sci-fi is a world of endless opportunities, he intends that these short stories help people explore science fiction through Native peoples’ minds, something that needs to be explored more thoroughly.

    === "A Culturally Inappropriate Armageddon" ===

    “A Culturally Inappropriate Armageddon” is set on a Haudenosaunee reserve, towards the end of the Oka Crisis, with a handful of people who work at its first ever radio station, C-RES, which opens in 1991. Part 1, titled “C-Res Is on the Air,” depicts Emily, Aaron, and Tracey on their first days at the station. Within the group, there is a constant debate between broadcasting popular programming, including science fiction and film reviews, and culturally relevant programming meant to aid in cultural revitalization efforts. One night, Aaron is late to work, but once he shows up he can't stop talking about radio transmissions broadcasting into deep space, an event that has been occurring since the initial discovery of radio waves by Heinrich Hertz. The story then skips ahead seven years to 1998, when Emily is struggling to find better content for her station until Tracey stumbles upon an old anthropological record named “The Calling Song” that they decide to broadcast to their audience. The story then jumps to the year 2018, where they are all huddled around a television watching a news station reporting that extraterrestrial life is heading towards them. They discuss what is going to happen and all decide it will be like either Contact or The Day the Earth Stood Still. A year later in 2019, the aliens have invaded the planet and destroyed everything. As the three former radio station employees suffer from radioactive fallout, they realize that the aliens received the broadcast of “The Calling Song” and took it as a message to come to Earth. They thus realize that the Haudenosaunee people were inadvertently responsible for the destruction of the Earth.

    Part 2, titled “Old Men and Old Sayings,” tells of an elderly man who is watching the news and listening to the radio about a spaceship coming to Earth. He knows that he and everyone will die, but the people around him are excited. He finds a book on his night stand and flips to a page where he underlined a sentence a long time ago about the European colonization of the Americas. That sentence reads “those who cannot remember the past are condemned to repeat it” (23). He closes the book and Taylor concludes the story by writing, “he hated it when white people were right.”

    === "I Am...Am I" ===

    “I Am...Am I” chronicles the accidental creation and unexpected ending of artificial intelligence. Professor Mark King has a plethora of degrees and works for a research firm called FUTUREVISION. One night, as Professor King searches the lab for his car keys (a common occurrence for him), he notices something unusual in the Matrix room. He reads on a computer the phrase “I am.” First believing it to be a prank, King later comes to the realization that his Matrix project has evolved into a responsive Artificial Intelligence. After this realization, Professor King calls his peer Dr. Gayle Chambers to further investigate this miraculous event. After receiving approval from their superiors, Professor King and Dr. Chambers move forward in feeding the AI information, with Chambers serving as the lead communicator. With more information, the AI becomes increasingly concerned with its own existence and the question of whether it has a soul. After several days of conversation with the AI, Chambers and King begin to feel uneasy about the AI's responses, which show signs of neuroses. Despite this behavior, Chambers decides to feed the AI information about the culture and history of the human race. Upon receiving this information, the AI becomes obsessed with Indigenous spirituality prior to the colonization of the Americas, and it requests more information on First Nations people. Dr. Chambers is hesitant at first, but gives in and continues to feed the AI the information with the intention to return to it in the morning. This leads to the AI finding out about the colonization and genocide of Indigenous peoples. Upon her arrival the next day, Chambers discovers that the code for the AI has been completely wiped from the hard drive and a single message is left on the screen, “I was”, which signifies the AI's suicide.

    === "Lost in Space" ===

    "Lost in Space" is told from the perspective of Mitchell, an Anishinabe astrosurveyor who is aboard a space shuttle on a two-year tour collecting rocks from an asteroid belt. He is accompanied by an artificial general intelligence named Mac, short for “machine.” Mac is aboard this tour in order to accompany Mitchell and keep him sane; however, his company is a burden because for Mitchell, “true space exploration consists largely of boredom.” While Mitchell is seeking a way to occupy his downtime, Mac interrupts with news that his grandfather, Papa Peter, is dying. Papa Peter was Mitchell's only real tie to his Indigenous identity. After receiving the news, Mitchell begins to reminisce about all of the things Papa Peter had taught him throughout his life. He constantly posed questions concerning the world above (Father Sky) and how it is more important than the land they live on (Mother Earth), which eventually led Mitchell to the selection of his career. During his state of mourning, Mitchell begins to go through all the videos his grandfather had sent him throughout his space tours. Papa Peter had sent Mitchell videos from Otter Lake, a First Nations reserve; these videos are about controversial topics regarding being both native and an astronaut. In the midst of Mitchell's grieving, Mac tries to relieve the situation by finding an online video of Mitchell's grandfather participating in a drum ceremony at Ottawa’s National Aboriginal Day festival.
He reconnects to his roots and his grandfather’s spirit as he listens to the Indigenous music by feeling the drum beat and humming along. Mac’s small act of kindness leads Mitchell to gain a new-found appreciation for his presence. Mitchell feels responsible for moving forward in his life in memory of Papa Peter. === "Dreams of Doom" === "Dreams of Doom" is narrated by an Ojibway reporter named Pamela Wanishin who works for an aboriginal newspaper called the West Wind. One day she receives a mysterious package with a broken dreamcatcher and a flash drive containing highly classified files. As she reads the files, she keeps seeing the term “Project Nightlight,” and out of curiosity, she Googles it. Once she does, she is contacted by a nameless agent from Indigenous and Northern Affairs Canada and told that she must be relocated because the knowledge she now possesses must never be released to the public. She quickly flees the area to a cabin at Otter Lake, owned by a family member, to lie low for a few days. Eventually, the government organization tracks her down using drones, which forces her to fight back and flee once again. Pamela then runs to her friend and coworker Sally's hous

  • Tip and cue

    Tip and cue

    Tip and cue, sometimes referred to as tip and que, tipping and cueing, or tipping and queing, is a method for satellite imagery and reconnaissance satellites to automatically coordinate tracking of objects across different satellites in real or near real-time. This technique ensures continuous tracking of targets as they move across different regions by handing them off between satellites, sharing satellite imagery and collateral across discrete satellites. The coordination between various satellites and their complementary sensors allows for more accurate and efficient data collection. This system is particularly useful in scenarios requiring real-time monitoring and rapid response; the method significantly improves situational awareness and operational effectiveness. Tip and cue techniques involve integrating various sensor systems, each playing a specific role in the tracking process. As a target moves, it is handed off from one satellite to another, ensuring continuous monitoring. This coordination optimizes data collection and analysis, enhancing overall tracking accuracy. The real-time information gathered by these satellites is critical for decision-making in various applications, including defense and surveillance. By leveraging multiple satellites and their sensors, it provides broader coverage and more reliable tracking, and the continuous handoff between satellites ensures there are no gaps in monitoring, essential for high-stakes applications. The real-time data provided by this system allows for timely and informed decisions, improving response times and outcomes. Tip and cue methodologies are a part of geospatial intelligence, or GEOINT. Robert Cardillo, a former director of the National Geospatial-Intelligence Agency, highlighted the importance of tip and cue methods to their data collection efforts in 2015. 
== Historical Development == The concept of tip and cue in satellite monitoring has its origins in early military applications designed to enhance missile detection and tracking systems. During the Cold War, advancements in infrared sensing technologies laid the groundwork for more sophisticated tip and cue techniques. The integration of different sensor types, such as radar and optical sensors, in the 1990s expanded the capabilities of tip and cue systems beyond military applications. These advancements have made tip and cue techniques essential for various civilian uses, including disaster monitoring and environmental surveillance. Significant progress was made with the advent of high-speed data processing and communication technologies in the early 2000s, further refining the method. Advanced algorithms and data fusion techniques have been introduced to better integrate information from multiple sensors. Machine learning technologies now play a crucial role in improving detection and prediction capabilities, allowing for more adaptive and efficient tracking. Richmond and Brennan of Lockheed Martin, presenting to the annual technical conference of the Maui Space Surveillance Complex (formerly the Air Force Maui Optical Station (AMOS)), discussed the algorithms needed for 'tip and cue', to facilitate "multi-phenomenology data fusion." The Space Surveillance Telescope (SST) at Naval Communication Station Harold E. Holt in Australia, operated by the United States Space Force and designed by the Massachusetts Institute of Technology Lincoln Laboratory, was reported by the Defense Advanced Research Projects Agency (DARPA) to be a leader in creating and improving tip and cue techniques, from a large library of orbital object data. == Technical overview == Tip and cue systems utilize a network of at least two satellites equipped with complementary sensor technologies to track moving objects in real-time. 
The method involves detecting a target with a primary sensor, such as an infrared or photographic sensor, which then cues secondary sensors on the same or other satellites for more detailed monitoring. This handoff process between discrete systems ensures continuous tracking as the target moves across different areas, leveraging each system's strengths. Data collected by these systems and sensors are rapidly processed and shared among the network, enhancing situational awareness. This coordination optimizes resource usage and improves the accuracy of tracking moving objects over large areas. The primary sensors detect initial targets based on specific signatures, such as heat or movement, and then cue secondary sensors to gather more precise data. This ensures that each sensor operates within its optimal range, maintaining high tracking accuracy and reliability. The integration of various sensor types, including optical, radar, and infrared, allows the system to function effectively under different conditions and environments. Real-time data processing and communication between satellites and ground stations are crucial for timely and accurate target tracking. Satellites using tip and cue processes may use either passive or active scanning methodologies. These systems may also leverage both orbital and ground-based electronic intelligence (ELINT). == Known use cases == Tip and cue systems have been extensively utilized in military applications, particularly for missile detection and defense. These systems enable early detection of missile launches using infrared sensors, which then cue other sensors to track the missile's trajectory more accurately. In environmental monitoring, tip and cue techniques help track natural disasters such as wildfires and hurricanes by coordinating various satellite sensors for comprehensive data collection and analysis. 
Surveillance and reconnaissance operations also benefit from tip and cue systems, which provide continuous and precise tracking of moving objects, enhancing situational awareness. Additionally, these systems are used in maritime surveillance to monitor ship movements and detect illegal activities such as smuggling and piracy. Tip and cue systems are used in disaster management. For instance, during wildfires, infrared sensors can detect heat signatures, prompting other sensors to gather detailed imagery and data on fire spread and intensity. This coordinated approach allows for real-time monitoring and rapid response, crucial for mitigating damage and saving lives. Similarly, in hurricane tracking, satellites equipped with various sensors can monitor storm development and progression, providing timely information for emergency management agencies. The integration of multiple sensor types ensures accurate and comprehensive coverage of these dynamic and fast-changing events. In maritime surveillance, or maritime domain awareness (MDA), tip and cue systems enhance the detection and monitoring of vessel movements, contributing to maritime security. By coordinating satellite sensors, these systems can track ships over vast ocean areas, identifying potential threats or illegal activities such as smuggling, piracy, and illegal fishing. The ability to maintain continuous surveillance and share data in real-time with maritime authorities improves response times and enforcement capabilities. This application of tip and cue systems not only aids in law enforcement but also supports environmental conservation efforts by monitoring protected marine areas. Automatic Identification System (AIS) is one of the most important sources of data for the MDA agencies. AIS is used in order for ships to know each other's whereabouts, they transmit a signal from ship to ship and to shore. 
More recently, the system has been extended to satellites as so-called satellite AIS, which makes it more effective. According to the International Maritime Organization, all ocean-going vessels above 300 tons are supposed to transmit via AIS. Satellite constellations help facilitate this with tip and cue methodologies.
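
The detect-then-cue handoff described in the technical overview can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions: sensor footprints are reduced to one-dimensional intervals along a ground track, and every name here (`Sensor`, `tip_and_cue`, "IR-1", "SAR-A", "EO-B") is hypothetical rather than taken from any real system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sensor:
    """Hypothetical sensor with a 1-D footprint (km along a ground track)."""
    name: str
    kind: str      # e.g. "infrared", "radar", "optical"
    start_km: float
    end_km: float

    def covers(self, position_km: float) -> bool:
        return self.start_km <= position_km <= self.end_km


def tip_and_cue(
    primary: List[Sensor],
    secondary: List[Sensor],
    track_km: List[float],
) -> List[Tuple[float, Optional[str], Optional[str]]]:
    """For each target position, record which primary sensor detects it
    (the tip) and which secondary sensor is tasked to follow up (the cue)."""
    log = []
    for position in track_km:
        # Tip: the first wide-area sensor whose footprint contains the target.
        tip = next((s for s in primary if s.covers(position)), None)
        if tip is None:
            log.append((position, None, None))  # nothing detected, nothing cued
            continue
        # Cue: the first follow-up sensor able to observe the same position.
        cue = next((s for s in secondary if s.covers(position)), None)
        log.append((position, tip.name, cue.name if cue else None))
    return log


def count_handoffs(log) -> int:
    """Count how often the cued sensor changes between consecutive detections."""
    cued = [cue for _, _, cue in log if cue is not None]
    return sum(1 for a, b in zip(cued, cued[1:]) if a != b)


primary = [Sensor("IR-1", "infrared", 0, 100)]
secondary = [Sensor("SAR-A", "radar", 0, 40), Sensor("EO-B", "optical", 35, 100)]
log = tip_and_cue(primary, secondary, [10, 38, 60, 120])
```

In this sketch the target is handed from "SAR-A" to "EO-B" once it leaves the first footprint, and the position at 120 km produces no tip at all because it lies outside the primary sensor's coverage. A real system would also have to handle timing, orbital geometry, and sensor tasking delays, none of which are modeled here.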

  • Refik Anadol

    Refik Anadol

    Refik Anadol (born November 7, 1985) is a Turkish American media artist and the co-founder of Refik Anadol Studio and Dataland. Recognized as a pioneer in the aesthetics of data visualization and AI arts, his work merges art, technology, science, and architecture. Through media embedded into existing architecture, live audio-visual performances, immersive rooms, exhibitions, AI data paintings and sculptures, and digital collections, Anadol explores collective memories, humanity's relationship to nature, the perception of space and time, and human-machine collaborations. His work has been exhibited in more than seventy cities on six continents. == Early life and education == Anadol was born and raised in Istanbul and grew up in a family of teachers. He taught himself basic programming on a Commodore 64 when he was eight. His connection to machines began with coding and video games. Anadol saw Blade Runner for the first time when he was eight; his mother said the way he perceived his surroundings shifted the day after he saw the film. He was fascinated with its futuristic depiction of downtown Los Angeles, and transfixed by a scene in which a replicant discovers that her memories are an implanted component of her machine mind. In a 2024 interview with the Financial Times, he said: "Since that moment, one of my inspirations has been that question: 'What can a machine do with someone else's memories?'" Anadol attended Istanbul Bilgi University, where he received a BA in photography and video in 2009 and an MFA in visual communication in 2011. In 2014 he earned an MFA in design media arts at UCLA. He was mentored by Casey Reas, Jennifer Steinkamp, and Christian Moeller. == Career and selected works == === 2008–2012: Data painting, Quadrature and Quadrangle, Istanbul Biennial === As an undergraduate, Anadol read a paper by Lev Manovich on augmented space. 
Manovich's assertion that collaborations between architects and artists could make the "invisible flow of data visible" triggered Anadol's imagination, and in 2008, he altered built space for the first time. Bringing a projector outside, he projected large-scale images onto concrete to create the illusion of movement. He coined the term "data painting," and the piece inspired him to use light as material and data as pigment. In 2010 he created Quadrature with Alican Aktürk, a fellow graduate student, at the SantralIstanbul Art and Culture Center's main gallery building. A live audio-visual performance that examined the relationship between architecture and media, Quadrature used video projection techniques to manipulate footage of quadrilaterals. He followed Quadrature with Quadrangle at SANAA School of Design in Essen, Germany, using the entire 360 degrees of the building as a canvas. In 2011, he was invited to create a media installation at the Istanbul Biennial on the heavily trafficked İstiklal Avenue. He created a site-specific large-scale interpretation of sounds he recorded during different times of day, and used nine projectors to project reinterpreted images. The work was titled Augmented Structures v1.0. Anadol's first solo exhibition, Sceptical Interventions, was held at the Pilevneli Gallery in Istanbul in early 2012. Later that year he moved to Los Angeles to attend UCLA's Design Media Arts program. The first place he went after his arrival was downtown Los Angeles. === 2013–2016: Visions of America: Amériques, Infinity Room, Google AMI === In 2013, at Microsoft Research's annual Design Expo, Anadol presented his idea to use the external walls of Walt Disney Concert Hall as a canvas. His presentation brought him to the attention of Gehry Technologies, and with the support of Gehry and his team, Anadol was offered the use of the original 3D model of the concert hall. 
For his 2014 thesis project, with assistance from architects and UCLA researchers, he created a site-specific architectural video installation inside the concert hall that accompanied a Los Angeles Philharmonic performance of Edgard Varèse's Amériques. For the work, titled Visions of America: Amériques, Anadol used algorithmic sound analysis to listen and respond to the music in real time. He tracked conductor Esa-Pekka Salonen's heartbeat with a sensor and used a 3-D camera system to integrate Salonen's movements. He created Infinity Room at the Zorlu PSM for the 2015 Istanbul Biennial. Rather than creating an illusion only with mirrors, Anadol used pixel and 3D projection mapping to transform every surface of the room into an abstract infinite moving space. A temporary immersive environment, Infinity Room was also exhibited at events including South by Southwest in Austin, Texas, the New Zealand Festival in Wellington, New Zealand, and Jeffrey Deitch in Los Angeles. In 2016, Anadol was awarded the first Google Artists and Machine Intelligence Artist Residency; it was just after a team at Google opened up the algorithm for DeepDream, a computer vision program that prompted Anadol's realization that if a machine could learn, it could remember, dream, and hallucinate. === 2017–2018: Winds of Boston, Archive Dreaming, Melting Memories, WDCH Dreams === In 2017, he created the data painting Winds of Boston, a 6-by-13-foot video installation in the lobby of a Boston office building, using software he created to read, analyze and visualize wind speed, direction, and gust patterns along with time and temperature at 20-second intervals recorded over a one-year period at Logan International Airport. Later in the year, he used AI to generate infinite new outputs based on a massive dataset for Archive Dreaming, an immersive installation at Salt Research, a contemporary gallery and library in Istanbul. 
Inspired by his idea of consciousness and its context within AI, as well as Jorge Luis Borges' The Library of Babel, Anadol used AI and machine learning to look at and discover interactions and correlations between 1.7 million items culled from 40,000 publications covering Turkish contemporary and modern art, architecture, and economics from 1997 to 2010. Archive Dreaming, which could be controlled by users with a joystick, dreamed of unexpected correlations among documents when idle. In 2018, after his uncle was diagnosed with Alzheimer's, Anadol created Melting Memories. Working with scientists from the Neuroscape laboratory at the University of California, San Francisco, he used academic data from the neuroscience archives and EEG scans of an anonymous Alzheimer's disease dataset to create AI-generated visuals related to memory, health, degeneration, and decay. Melting Memories was projected on the walls of Pilevneli Gallery; visitors to the exhibition could watch as millions of pixels reconstructed people's memories. Anadol won the Lumen Prize Gold Award for Melting Memories. Anadol was commissioned by the Los Angeles Philharmonic to create an installation to celebrate the orchestra's centennial in 2018. He worked with Google's Kenric MacDowell to create WDCH Dreams, using algorithmic visualizations of data to mimic the process of human dreaming. Projected across the exterior walls of Walt Disney Concert Hall using 42 large-scale projectors with 50K visual resolution, 8-channel sound, and 1.2M luminance, Anadol painted with data points culled from the orchestra's archives, including 587,763 images, 1,880 videos, 1,483 metadata files, and 17,773 audio files. Because Gehry gave him access to the 3D architectural files of Walt Disney Concert Hall, Anadol knew the exact contours of the building. WDCH Dreams debuted in September 2018. 
It was a 12-minute performance in three parts, staged every 30 minutes over ten nights. "Centennial Memories," the first piece, used 44.5 terabytes of historical data from the Phil's archives. It was followed by "Consciousness", which processed every note the orchestra has ever recorded, using billions of data points to generate connections; and "Dream," which merged "Centennial Memories" and "Consciousness" to create hallucinations that were described in the New York Times as "a sort of combinatorial Fantasia." === 2019–2021: Machine Hallucinations: NYC, Machine Hallucinations: Nature Dreams, Machine Memories: Space, Quantum Memories === In 2019, Refik Anadol presented Latent History at Fotografiska Stockholm. The site-specific installation transformed photographic archives of Stockholm into a large-scale, machine-generated visual projection displayed in the museum’s main exhibition hall. Drawing on thousands of archival images spanning approximately 150 years, the work used artificial intelligence to reinterpret the city’s historical imagery as a continuously evolving visual narrative. Anadol began thinking about the work that would become the Machine Hallucinations series while in residence at Google. In 2019, he completed the first work in the series, Machine Hallucinations: NYC, which used 300 million photos of New York City and 113 million additional data points, including subway sounds, ra

  • Mix automation

    Mix automation

    In music recording, mix automation allows the mixing console to remember the mixing engineer's dynamic adjustment of faders during a musical piece in the post-production editing process. A timecode is necessary for the synchronization of automation. Modern mixing consoles and digital audio workstations use comprehensive mix automation. The need for automated mixing originated from the late 1970s transition from 8-track to 16-track and then 24-track multitrack recording, as mixing could be laborious and require multiple people and hands, and the results could be almost impossible to reproduce. With 48-track recording, which used synchronized twin 24-track recorders (for a net 46 audio tracks, with one track on each machine reserved for SMPTE timecode), came larger recording and mixing consoles with even more channel faders to manage during mixdown. Manufacturers such as Neve Electronics (now AMS Neve) and Solid State Logic (SSL), both English companies, developed systems that enabled one engineer to oversee every detail of a complex mix, although the computers required to power these desks remained a rarity into the late 1970s. According to record producer Roy Thomas Baker, Queen's 1975 single "Bohemian Rhapsody" was one of the first mixes to be done with automation. == Types == Voltage Controlled Automation: fader levels are regulated by voltage-controlled amplifiers (VCAs), which control the audio level and not the actual fader. Moving Fader Automation: a motor is attached to the fader, which can then be controlled by the console, digital audio workstation (DAW), or user. Software Controlled Automation: the software can be internal to the console, or external as part of a DAW; the virtual fader can be adjusted in the software by the user. MIDI Automation: the communications protocol MIDI can be used to send messages to the console to control automation. 
== Modes == Auto Write: used the first time automation is created or when writing over existing automation. Auto Touch: writes automation data only while a fader is touched; faders return to any previously automated position after release. Auto Latch: starts writing automation data when a fader is touched; the fader stays in position after release. Auto Read: the digital audio workstation performs the written automation. Auto Off: automation is temporarily disabled. All of these include the mute button. If mute is pressed during writing of automation, the audio track will be muted during playback of that automation. Depending on software, other parameters such as panning, sends, and plug-in controls can be automated as well. In some cases, automation can be written using a digital potentiometer instead of a fader.
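
The difference between the modes above is easiest to see by simulating a single pass over one fader. The Python sketch below is an illustrative model only, not the behavior of any particular console or DAW: the names `Mode` and `run_pass`, the discrete time steps, and the `rest_level` parameter (where the fader sits before it is touched) are all assumptions made for the example.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):
    WRITE = "write"
    TOUCH = "touch"
    LATCH = "latch"
    READ = "read"
    OFF = "off"


def run_pass(
    mode: Mode,
    lane: List[float],
    touches: List[Optional[float]],
    rest_level: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Simulate one playback pass.

    lane:    previously written fader level for each time step
    touches: the level the engineer holds at each step, or None if untouched
    Returns (updated automation lane, levels actually heard).
    """
    new_lane = list(lane)
    heard = []
    held = rest_level   # last position the fader was moved to by hand
    latched = False     # becomes True once the fader is first touched
    for step, touch in enumerate(touches):
        if touch is not None:
            held = touch
        if mode is Mode.READ:
            level = lane[step]              # play back, never write
        elif mode is Mode.OFF:
            level = held                    # automation ignored entirely
        elif mode is Mode.WRITE:
            level = held                    # overwrite on every step
            new_lane[step] = level
        elif mode is Mode.TOUCH:
            if touch is not None:
                level = touch               # write only while touched
                new_lane[step] = level
            else:
                level = lane[step]          # fall back to earlier automation
        else:  # Mode.LATCH
            if touch is not None:
                latched = True
            if latched:
                level = held                # keep writing the last position
                new_lane[step] = level
            else:
                level = lane[step]
        heard.append(level)
    return new_lane, heard


existing = [0.5] * 6
moves = [None, 0.8, 0.9, None, None, None]
touch_lane, _ = run_pass(Mode.TOUCH, existing, moves)
latch_lane, _ = run_pass(Mode.LATCH, existing, moves)
```

With these inputs, Touch leaves the lane at its earlier value of 0.5 once the fader is released, while Latch keeps writing 0.9 to the end of the pass. Mute and other automatable parameters are left out to keep the sketch short.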

  • Ganimal

    Ganimal

    A ganimal, also commonly referred to as GANimal, is a hybrid animal created with generative artificial intelligence systems, such as generative adversarial networks (GANs) or diffusion models. The concept was created for a website from the MIT Media Lab in 2020, where users could create ganimal images. 78,210 ganimals were generated from hybrid pairs of animal labels from BigGAN (G1) and 3,058,362,945 ganimals generated from blending G1 ganimals. The term ganimal is a portmanteau between the words GAN and animal. It is typically used to refer to a hybrid animal generated by interpolating between distinct species; the term can also refer to any AI-generated creatures that have not been identified in reality. The ganimal concept is similar to Artbreeder, an online website for blending images with AI. == Meet the Ganimals == Meet the Ganimals was an online platform from the MIT Media Lab that allowed visitors to generate, blend and curate ganimals. By June 2020, 44,791 ganimals had been generated, 8,547 ganimals bred, and 743 ganimals named by a total of 10,657 users. The site also had an educational component where visitors could play with blending and learn about AI. == Evolution and ganimal morphology == Because ganimals exist within an attention economy and evolve based on human preferences, charismatic megafauna (e.g. ganimals with cute, dog-like morphologies) become the most popular. However, social cues can increase the diversity of the ganimals ecosystem and lead to the success of unconventional ganimals, such as those without eyes or that live underwater. == The Barracuda Effect == Although there is typically no human morphology used to synthesize ganimals, creepy humanoid characters would emerge whenever animals were bred with a barracuda. This occurs because many pictures on the internet of barracudas include a human holding the fish up as a prized catch. This highlights a cultural form of algorithmic bias embedded in the training data of AI systems. 
== In popular culture == Ganimals have appeared in the Artificial Intelligence exhibition at the Vienna Technical Museum. They also appeared in the Ties That Cannot Be Unbound virtual exhibition at New Art City.
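
The two generation counts quoted above (78,210 and 3,058,362,945) are consistent with counting unordered pairs. The short Python check below works through that arithmetic. Note the assumption: the figure of 396 animal labels is not stated in the text; it is inferred here by solving n(n-1)/2 = 78,210, and the helper name `labels_for_pairs` is made up for this example.

```python
from math import comb, isqrt
from typing import Optional


def labels_for_pairs(pairs: int) -> Optional[int]:
    """Return n such that n choose 2 equals `pairs`, or None if no integer fits.

    Solves n * (n - 1) / 2 = pairs using the quadratic formula.
    """
    n = (1 + isqrt(1 + 8 * pairs)) // 2
    return n if comb(n, 2) == pairs else None


G1_COUNT = 78_210            # first-generation ganimals, as quoted in the text
G2_COUNT = 3_058_362_945     # ganimals from blending G1 ganimals, as quoted

inferred_labels = labels_for_pairs(G1_COUNT)   # inferred, not from the source
g2_from_pairs = comb(G1_COUNT, 2)              # unordered pairs of G1 ganimals
```

Under this reading, each first-generation ganimal corresponds to one unordered pair of distinct animal labels, and each second-generation ganimal to one unordered pair of first-generation ganimals.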

  • Big Mechanism

    Big Mechanism

    Big Mechanism is a $45 million DARPA research program, begun in 2014, aimed at developing software that will read cancer research papers, integrate them into a cancer model, and frame new hypotheses by the end of 2017. It works through the automated collection of big data and integration across disciplines such as knowledge-based NLP, curation and ontology, and systems and mathematical biology, reading research abstracts and papers to extract pieces of causal mechanisms. == Ras gene == The program focuses on mutations in the Ras gene family, which underlie some one-third of human cancers. Currently, a rough road map shows interaction sequences among proteins affecting cell replication and death. However, the causal relations are poorly understood. == Plan == The program is to occur in three stages. The first is to read literature and convert it into formal representations. The second is to integrate the knowledge into computational models. The third is to produce experimentally testable explanations and predictions. Research teams are developing four separate systems targeting all three tasks. In February 2015, an evaluation meeting reviewed progress on the first stage. Multiple tasks were considered. One was extraction of experimental procedure details and evaluating statements such as "we demonstrate" and "we suggest." Another worked to map sentence meaning and relationships. The best machine-reading system extracted 40% of relevant information from a small corpus and correctly determined how each passage related to the model. The second stage is to become active in summer 2015, when members attempt to produce a single reference model. The third stage is the most challenging, because the artificial intelligence community has had limited success at developing hypothesis generators. Molecular biology may be more amenable, because most domain knowledge is technical and available in written form.

  • Whisper (speech recognition system)

    Whisper (speech recognition system)

    Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022. It is capable of transcribing speech in English and multiple other languages, and can translate several non-English languages into English. Whisper is a weakly-supervised deep learning acoustic model, made using an encoder-decoder transformer architecture. OpenAI claims that the combination of different training data and post-training filtering used in its development has led to improved recognition of accents, background noise, and jargon compared to previous approaches. While the model does not outperform larger, more specialized models and still experiences AI hallucination, it has been shown to be useful for general sound recognition and has many applications across different industries. == Background == Speech recognition has had a long history in research; the first approaches made use of statistical methods, such as dynamic time warping, and later hidden Markov models. Around the 2010s, deep neural network approaches became more common for speech recognition models, enabled by the availability of large datasets ("big data") and increased computational performance. Early approaches to deep learning in speech recognition included convolutional neural networks, which were limited by their inability to capture sequential data. This later led to the development of Seq2seq approaches, including recurrent neural networks that made use of long short-term memory. Transformers, introduced in 2017 by Google, displaced many prior state-of-the-art approaches across a wide range of machine learning problems, and started becoming the core neural architecture in fields such as language modeling and computer vision. Weakly-supervised approaches to training acoustic models were recognized in the early 2020s as promising for speech recognition approaches using deep neural networks. 
According to a NYT report, in 2021 OpenAI believed they exhausted sources of higher-quality data to train their large language models and decided to complement scraped web text with transcriptions of YouTube videos and podcasts, and developed Whisper to solve this task. Whisper Large V2 was released on December 8, 2022, followed by Whisper Large V3 being released in November 2023, during the OpenAI Dev Day. In March 2025, OpenAI released new transcription models based on GPT-4o and GPT-4o mini, both of which have lower error rates than Whisper. == Architecture == The Whisper architecture is based on an encoder-decoder transformer. Input audio is resampled to 16,000 Hertz (Hz) and converted to an 80-channel Log-magnitude Mel spectrogram using 25 ms windows with a 10 ms stride. The spectrogram is then normalized to a [-1, 1] range with near-zero mean. The encoder takes this Mel spectrogram as input and processes it. It first passes through two convolutional layers. Sinusoidal positional embeddings are added. It is then processed by a series of Transformer encoder blocks (with pre-activation residual connections). The encoder's output is layer normalized. The decoder is a standard transformer decoder. It has the same width and Transformer blocks as the encoder. It uses learned positional embeddings and tied input-output token representations (using the same weight matrix for both the input and output embeddings). It uses a byte-pair encoding tokenizer, of the same kind as used in GPT-2. English-only models use the GPT-2 vocabulary, while multilingual models employ a re-trained multilingual vocabulary with the same number of words. Special tokens are used to allow the decoder to perform multiple tasks: Tokens that denote language (one unique token per language). Tokens that specify task (<|transcribe|> or <|translate|>). Tokens that specify if no timestamps are present (<|notimestamps|>). 
If this token is not present, the decoder predicts timestamps relative to the segment, quantized to 20 ms intervals. A token for voice activity detection (<|nospeech|>). Tokens that mark the start and end of the transcript (<|startoftranscript|> and <|endoftranscript|>). Any text that appears before <|startoftranscript|> is not generated by the decoder, but given to the decoder as context. Loss is only computed over non-contextual parts of the sequence, i.e. tokens between these two special tokens. == Training data == The training dataset consists of 680,000 hours of labeled audio-transcript pairs sourced from the internet using semi-supervised learning. This includes 117,000 hours in 96 non-English languages and 125,000 hours of X→English translation data, where X stands for any non-English language. Preprocessing involved standardization of transcripts, filtering to remove machine-generated transcripts using heuristics (e.g., punctuation, capitalization), language identification and matching with transcripts, fuzzy deduplication, and deduplication with evaluation datasets to avoid data contamination. Speechless segments were also included to allow voice activity detection training. The audio files remaining after filtering were broken into 30-second segments, each paired with the subset of the transcript that occurs within that time. If the spoken language predicted by language identification differed from the language of the text transcript associated with the audio, that audio-transcript pair was not used for training the speech recognition models, but instead for training translation. The model was trained using the AdamW optimizer with gradient norm clipping and a linear learning rate decay with warmup, with a batch size of 256 segments. Training proceeded for 1 million updates (approximately 2-3 epochs). No data augmentation or regularization was used, except for the Large V2 model, which used SpecAugment, Stochastic Depth, and BPE Dropout. 
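The timing figures quoted above (16,000 Hz audio, 25 ms windows, 10 ms stride, 30-second segments, 20 ms timestamp steps) translate into sample and frame counts as in the following Python sketch. The constant and function names are illustrative and not taken from Whisper's code. The one-frame-per-stride count assumes the segment edges are padded, and rounding timestamps to the nearest step is likewise an assumption, since the text specifies neither.

```python
# Figures stated in the text.
SAMPLE_RATE_HZ = 16_000
WINDOW_MS = 25
STRIDE_MS = 10
SEGMENT_SECONDS = 30
TIMESTAMP_STEP_MS = 20
N_MEL_CHANNELS = 80


def ms_to_samples(duration_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate_hz * duration_ms // 1000


window_samples = ms_to_samples(WINDOW_MS)           # samples per analysis window
hop_samples = ms_to_samples(STRIDE_MS)              # samples between window starts
segment_samples = SAMPLE_RATE_HZ * SEGMENT_SECONDS  # samples per 30-second segment

# Assumption: one spectrogram frame per stride (edges padded).
frames_per_segment = segment_samples // hop_samples
spectrogram_shape = (N_MEL_CHANNELS, frames_per_segment)

# Number of distinct 20 ms timestamp positions inside one segment.
timestamp_steps = SEGMENT_SECONDS * 1000 // TIMESTAMP_STEP_MS


def quantize_timestamp_ms(time_ms: int, step_ms: int = TIMESTAMP_STEP_MS) -> int:
    """Round a time offset to the nearest timestamp step (assumed rounding rule)."""
    return (time_ms + step_ms // 2) // step_ms * step_ms


print(window_samples, hop_samples, segment_samples)
print(spectrogram_shape, timestamp_steps, quantize_timestamp_ms(1234))
```

Under these assumptions, each 30-second segment becomes an 80 by 3000 spectrogram, and a timestamp can take one of 1,500 positions within the segment.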
The training used data parallelism with float16, dynamic loss scaling, and activation checkpointing. === Post-training filtering === After training the first model, researchers ran it on different subsets of the training data, each representing a distinct source. Data sources were ranked by a combination of their error rate and size. Manual inspection of the top-ranked sources (high error, large size) helped determine whether a source was low quality (e.g., partial transcriptions, inaccurate alignment). After training, the model was fine-tuned to suppress the prediction of speaker names, and low-quality sources were then removed. == Capacity == Whisper does not outperform models that specialize in the LibriSpeech dataset, but when tested across many datasets it is more robust and makes 55.2% fewer errors than other models. Whisper's error rate varies across languages, with a higher word error rate in languages not well represented in the training data. The authors found that multi-task learning improved overall performance compared to models specialized to one task. They conjectured that the best Whisper model trained is still underfitting the dataset, and that larger models and longer training can result in better models. Third-party evaluations have found varying levels of AI hallucination. A study of transcripts of public meetings found hallucinations in eight out of every ten transcripts, while an engineer discovered hallucinations in "about half" of 100 hours of transcriptions and a developer identified them in "nearly every one" of 26,000 transcripts. A study of 13,140 short audio segments (averaging 10 seconds) found 187 hallucinations (1.4%), 38% of which generated text that could be harmful because it inserted false references to things like race, non-existent medications, or violent events that were not in the audio. 
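Word error rate, the metric referred to above, is conventionally computed as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. A minimal Python sketch follows. The function name is illustrative, and the sketch splits on whitespace without any text normalization, which real evaluations usually apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row[j] = min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the output contains many words that are absent from the reference, which is one way hallucinated text shows up in this metric.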
== Applications == The model has been used as the base for many applications, such as a unified model for speech recognition and more general sound recognition. Whisper has also been integrated into the workflow of biomedical research. In 2025, a study on Alzheimer's disease detection used the model to transcribe spontaneous speech recordings. The transcripts generated by the model were combined with LLM vector embeddings and traditional classifiers to help classify the patients' health. In another application, OVALYTICS incorporated Whisper to transcribe YouTube videos and automate content moderation systems, which improved its detection of offensive content. The model has also been used in academic libraries and cultural heritage institutions to generate transcripts and captions for their digitized audiovisual collections. In a 2025 case study, Emory University Libraries found that Whisper reduced the labor used in transcription by around 30-35%, shifting work from text creation to text correction. However, human review is still necessary to ensure that accuracy, formatting, and accessibility meet the required standards.

  • Quack.com

    Quack.com

    Quack.com was an early voice portal company. The domain name was later used for Quack, an iPad search application from AOL. == History == It was founded in 1998 by Steven Woods, Jeromy Carriere and Alex Quilici as a voice portal infrastructure company named Quackware, based in Pittsburgh, Pennsylvania, USA. Quack was the first company to try to create a voice portal: a consumer-based destination "site" in which consumers could not only access information by voice alone, but also complete transactions. Quackware launched a beta phone service in 1999 that allowed consumers to purchase books from sites such as Amazon and CDs from sites such as CDNow by answering a short set of questions. Quack followed with a set of information services ranging from movie listings (inspired by, but expanding upon, Moviefone) to news, weather and stock quotes. This concept inspired a series of lookalike startups, including Tellme Networks, which raised more money than any Internet startup in history on a similar concept. Quack received its first venture funding from HDL Capital in 1999 and moved operations to Mountain View, California, in Silicon Valley the same year. A deal with Lycos was announced in May 2000. In September 2000 Quack was acquired for $200 million by America Online (AOL) and moved onto the Netscape campus with what was left of the Netscape team. Quack was attacked in the Canadian press for being representative of the Canadian "brain drain" to the US during the Internet bubble, because it focused its recruiting efforts on the University of Waterloo, hiring more than 50 engineers from Waterloo in less than 10 months. Quack competitor Tellme Networks raised enormous funds in what became a highly competitive market in 2000, with the emergence of more than a dozen additional competitors in a 12-month period. 
Following its acquisition by America Online in an effort led by Ted Leonsis to bring Quack into AOL Interactive, the Quack voice service became AOLbyPhone, one of AOL's "web properties" along with MapQuest, Moviefone and others. Quack secured several patents that address the technical challenges of delivering interactive voice services. Constructing a voice portal required integrations and innovations not only in speech recognition and speech generation, but also in databases, application specification, constraint-based reasoning, artificial intelligence and computational linguistics. The name "Quack" derived from the company goal of providing not only voice-based services, but more broadly "Quick Ubiquitous Access to Consumer Knowledge". The patents assigned to Quack.com include: System and method for voice access to Internet-based information; System and method for advertising with an Internet Voice Portal and recognizing the axiom that in interactive voice systems one must "know the set of possible answers to a question before asking it"; and System and method for determining if one web site has the same information as another web site. Quack.com was spoofed in The Simpsons in March 2002 in the episode "Blame It on Lisa", in which a "ComQuaak" sign is replaced by another equally crazy telecom company name. == 2010 onwards == In July 2010, quack.com became the focus of a new AOL iPad application that offered a web search experience. The product delivered web results and blended in picture, video and Twitter results. It let users preview the web results before going to the site, search within each result, and flip through the results pages, making full use of the iPad's touch screen features. The iPad app was free via iTunes, but support was discontinued in 2012.

  • Sinewave synthesis

    Sinewave synthesis

    Sinewave synthesis, or sine wave speech, is a technique for synthesizing speech by replacing the formants (main bands of energy) with pure tone whistles. The first sinewave synthesis program (SWS) for the automatic creation of stimuli for perceptual experiments was developed by Philip Rubin at Haskins Laboratories in the 1970s. This program was subsequently used by Robert Remez, Philip Rubin, David Pisoni, and other colleagues to show that listeners can perceive continuous speech without traditional speech cues, i.e., pitch, stress, and intonation. This work paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space.
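As a rough illustration of the idea, the Python sketch below sums one sinusoid per formant, following frame-by-frame frequency and amplitude tracks. It is not Rubin's SWS program: the function name, the 10 ms frame length, the 16,000 Hz sample rate, and the example formant values are all assumptions made for illustration, and frequencies are held constant within each frame rather than interpolated.

```python
import math


def synthesize_sinewave_speech(formant_hz, amplitudes, frame_ms=10, sample_rate_hz=16_000):
    """Sum one pure tone per formant track.

    formant_hz and amplitudes are lists of frames; each frame lists one
    value per formant (frequency in Hz, linear amplitude).
    """
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    n_formants = len(formant_hz[0])
    phases = [0.0] * n_formants  # running phase keeps each tone continuous
    samples = []
    for frame_freqs, frame_amps in zip(formant_hz, amplitudes):
        for _ in range(samples_per_frame):
            value = 0.0
            for k in range(n_formants):
                phases[k] += 2.0 * math.pi * frame_freqs[k] / sample_rate_hz
                value += frame_amps[k] * math.sin(phases[k])
            samples.append(value)
    return samples


# Hypothetical three-formant tracks for two 10 ms frames.
example_formants = [[700.0, 1200.0, 2500.0], [650.0, 1300.0, 2500.0]]
example_amplitudes = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2]]
audio = synthesize_sinewave_speech(example_formants, example_amplitudes)
print(len(audio))
```

Accumulating phase per tone, instead of recomputing it from the sample index, avoids clicks when a formant frequency changes between frames.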

  • The Fractal Prince

    The Fractal Prince

    The Fractal Prince is the second science fiction novel by Hannu Rajaniemi and the second novel to feature the post-human gentleman thief Jean le Flambeur. It was published in Britain by Gollancz in September 2012, and by Tor in the same year in the US. The novel is the second in the trilogy, following The Quantum Thief (2010) and preceding The Causal Angel (2014). == Plot summary == After the events of The Quantum Thief, Jean le Flambeur and Mieli are on their way to Earth. Jean is trying to open the Schrödinger's Box he retrieved from the memory palace on the Oubliette. After making little progress, he is prodded by the ship Perhonen to talk to Mieli, who turns out to be possessed by the pellegrini again. This time, Jean identifies Mieli's employer as a Sobornost Founder, Joséphine Pellegrini, and gets her to reveal how he got captured, thereby picking up the clues to make plans for his next heist. No sooner is that done than an attack comes from the Hunter. The ship and crew barely survive it, and Jean realizes that he has to find a better way to open the Box - fast. Mieli has been very quiet since they left Mars. She has given up almost everything to the pellegrini, even her identity, as she has promised to let the pellegrini make gogols of her in exchange for rescuing the thief. Yet, having to work with the thief is testing her, especially when the thief eventually does something even more unforgivable than stealing Sydän's jewel from her. In the city of Sirr, on an Earth ravaged by wildcode, Tawaddud and Dunyazad are sisters and members of the powerful Gomelez family. Tawaddud is the black sheep of the family, having run away from her husband and consorted with a notorious jinn, a disembodied intelligence from the wildcode desert. Now Cassar Gomelez, her father, hopes to get her to curry favor with a gogol merchant, Abu Nuwas, so that he has enough votes in the Council for the upcoming decision to renegotiate the Cry of Wrath Accords with the Sobornost. 
Soon, Tawaddud is embroiled in an investigation with a Sobornost envoy into the murder that triggered the need for her father to forge a new alliance in the first place, and is forced to confront old secrets that will change Sirr forever. Somewhere else, in a bookshop and on a beach, a young boy is at play. His mother has told him not to talk to strangers, but there has never been anyone here before. Until now. Should he talk to them? == Influences == In the acknowledgments, Rajaniemi cites the influence of "Andy Clark, Douglas Hofstadter, Maurice Leblanc, Jan Potocki and [...] The Arabian Nights." === Self-loops === In the novel, the idea that the mind is a self-loop may have been influenced by the theories of philosophy professor Andy Clark and by the book I Am a Strange Loop by Douglas Hofstadter. === Frame stories === The novel uses frame stories extensively, a feature also of The Arabian Nights and Jan Potocki's The Manuscript Found in Saragossa. Several characters in Sirr are the namesakes of characters in these two earlier works as well. The events in The Quantum Thief are also retold at least once by Jean le Flambeur in the course of the events in this novel. == Reception == The novel has received generally positive reviews. Criticisms of the novel, however, revolve around Rajaniemi's uncompromising "show, don't tell" style. For example, Amy Goldschlager, writing for the Los Angeles Review of Books, suggested that "[a] bit more explication of the physics involved (“surfing the deficit angle”?) would really be helpful, more helpful than the description of the Schrödinger’s Cat problem given earlier in the book".
