AI Data Room

AI Data Room — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Feature hashing

In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly (after a modulo operation), rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction.

This trick is often attributed to Weinberger et al. (2009), but a much earlier description of the method was published by John Moody in 1989.

== Motivation ==

=== Motivating example ===

In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is free text. From this, a bag of words (BOW) representation is constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature (independent variable) of each of the documents in both the training and test sets.

Machine learning algorithms, however, are typically defined in terms of numerical vectors. Therefore, the bags of words for a set of documents are regarded as a term-document matrix where each row is a single document and each column is a single feature/word; the entry $(i, j)$ in such a matrix captures the frequency (or weight) of the $j$-th term of the vocabulary in document $i$. (An alternative convention swaps the rows and columns of the matrix, but this difference is immaterial.) Typically, these vectors are extremely sparse, in accordance with Zipf's law.

The common approach is to construct, at learning time or prior to that, a dictionary representation of the vocabulary of the training set, and use that to map words to indices. Hash tables and tries are common candidates for dictionary implementation. For example, the three documents

* John likes to watch movies.
* Mary likes movies too.
* John also likes football.

can be converted, using the dictionary, to the term-document matrix

$$
\begin{pmatrix}
\text{John} & \text{likes} & \text{to} & \text{watch} & \text{movies} & \text{Mary} & \text{too} & \text{also} & \text{football} \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}
$$

(Punctuation was removed, as is usual in document classification and clustering.)

The problem with this process is that such dictionaries take up a large amount of storage space and grow in size as the training set grows. Conversely, if the vocabulary is kept fixed and not increased with a growing training set, an adversary may try to invent new words or misspellings that are not in the stored vocabulary so as to circumvent a machine-learned filter. To address this challenge, Yahoo! Research attempted to use feature hashing for their spam filters.

Note that the hashing trick isn't limited to text classification and similar tasks at the document level, but can be applied to any problem that involves large (perhaps unbounded) numbers of features.

=== Mathematical motivation ===

Mathematically, a token is an element $t$ in a finite (or countably infinite) set $T$. If we only need to process a finite corpus, we can put all tokens appearing in the corpus into $T$, so that $T$ is finite. However, if we want to process all possible words made of the English letters, then $T$ is countably infinite.

Most neural networks can only operate on real vector inputs, so we must construct a "dictionary" function $\phi : T \to \mathbb{R}^n$.
When $T$ is finite, of size $|T| = m \leq n$, we can use one-hot encoding to map it into $\mathbb{R}^n$. First, arbitrarily enumerate $T = \{t_1, t_2, \dots, t_m\}$, then define $\phi(t_i) = e_i$. In other words, we assign a unique index $i$ to each token, then map the token with index $i$ to the unit basis vector $e_i$.

One-hot encoding is easy to interpret, but it requires one to maintain the arbitrary enumeration of $T$. Given a token $t \in T$, to compute $\phi(t)$ we must find the index $i$ of the token $t$. Thus, to implement $\phi$ efficiently, we need a fast-to-compute bijection $h : T \to \{1, \dots, m\}$; then we have $\phi(t) = e_{h(t)}$.

In fact, we can relax the requirement slightly: it suffices to have a fast-to-compute injection $h : T \to \{1, \dots, n\}$, and then use $\phi(t) = e_{h(t)}$.

In practice, there is no simple way to construct an efficient injection $h : T \to \{1, \dots, n\}$. However, we do not need a strict injection, only an approximate one. That is, when $t \neq t'$, we should probably have $h(t) \neq h(t')$, so that probably $\phi(t) \neq \phi(t')$.

At this point, we have just specified that $h$ should be a hashing function. Thus we reach the idea of feature hashing.

== Algorithms ==

=== Feature hashing (Weinberger et al. 2009) ===

The basic feature hashing algorithm presented in (Weinberger et al. 2009) is defined as follows.
First, one specifies two hash functions: the kernel hash $h : T \to \{1, 2, \dots, n\}$ and the sign hash $\zeta : T \to \{-1, +1\}$. Next, one defines the feature hashing function:

$$\phi : T \to \mathbb{R}^n, \quad \phi(t) = \zeta(t)\, e_{h(t)}$$

Finally, one extends this feature hashing function to strings of tokens by

$$\phi : T^{*} \to \mathbb{R}^n, \quad \phi(t_1, \dots, t_k) = \sum_{j=1}^{k} \phi(t_j)$$

where $T^{*}$ is the set of all finite strings consisting of tokens in $T$. Equivalently,

$$\phi(t_1, \dots, t_k) = \sum_{j=1}^{k} \zeta(t_j)\, e_{h(t_j)} = \sum_{i=1}^{n} \left( \sum_{j : h(t_j) = i} \zeta(t_j) \right) e_i$$

==== Geometric properties ====

We want to say something about the geometric properties of $\phi$, but $T$ by itself is just a set of tokens; we cannot impose a geometric structure on it except the discrete topology, which is generated by the discrete metric. To make it nicer, we lift it to $T \to \mathbb{R}^T$, and lift $\phi$ from $\phi : T \to \mathbb{R}^n$ to $\phi : \mathbb{R}^T \to \mathbb{R}^n$ by linear extension:

$$\phi\big((x_t)_{t \in T}\big) = \sum_{t \in T} x_t\, \zeta(t)\, e_{h(t)} = \sum_{i=1}^{n} \left( \sum_{t : h(t) = i} x_t\, \zeta(t) \right) e_i$$

This is an infinite sum, which must be dealt with right away. There are essentially only two ways to handle infinities.
One may impose a metric, then take its completion, to allow well-behaved infinite sums, or one may demand that nothing is actually infinite, only potentially so. Here, we take the potential-infinity route, by restricting $\mathbb{R}^T$ to contain only vectors with finite support: for every $(x_t)_{t \in T} \in \mathbb{R}^T$, only finitely many entries of $(x_t)_{t \in T}$ are nonzero.

Define an inner product on $\mathbb{R}^T$ in the obvious way:

$$\langle e_t, e_{t'} \rangle = \begin{cases} 1, & \text{if } t = t', \\ 0, & \text{else,} \end{cases} \qquad \langle x, x' \rangle = \sum_{t, t' \in T} x_t\, x'_{t'}\, \langle e_t, e_{t'} \rangle$$

As a side note, if $T$ is infinite, then the inner product space $\mathbb{R}^T$ is not complete. Taking its completion would give a Hilbert space, which allows well-behaved infinite sums.

Now we have an inner product space, with enough structure to describe the geometry of the feature hashing function $\phi : \mathbb{R}^T \to \mathbb{R}^n$
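The feature hashing function defined in the excerpt above (a kernel hash h, a sign hash ζ, and a sum over tokens) can be sketched in a few lines. This is a minimal illustration, not the implementation from Weinberger et al.: the helper names (`_digest`, `kernel_hash`, `sign_hash`, `hash_features`), the use of salted MD5 digests as hash functions, and the 0-based bucket indices are all assumptions made for the example.

```python
import hashlib


def _digest(token: str, salt: str) -> int:
    """Deterministic 64-bit integer from a salted MD5 digest (illustrative choice)."""
    data = (salt + token).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def kernel_hash(token: str, n: int) -> int:
    """h(t): bucket index in 0..n-1 (the text uses 1..n)."""
    return _digest(token, "index:") % n


def sign_hash(token: str) -> int:
    """zeta(t): either +1 or -1."""
    return 1 if _digest(token, "sign:") % 2 == 0 else -1


def hash_features(tokens, n):
    """phi(t_1, ..., t_k) = sum_j zeta(t_j) * e_{h(t_j)}, as a dense list of length n."""
    vec = [0] * n
    for token in tokens:
        vec[kernel_hash(token, n)] += sign_hash(token)
    return vec


if __name__ == "__main__":
    # No dictionary is built: each token is mapped straight to a bucket.
    print(hash_features("John likes to watch movies".split(), 8))
```

No vocabulary is stored, so unseen tokens at test time are handled the same way as training tokens; the price is that distinct tokens may collide in one bucket, which the sign hash partly offsets by letting colliding contributions cancel.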
Dominic Harris

Dominic Harris (born 16 November 1976) is a British artist known for integrating modern technology and classical design in his interactive artworks.

== Background ==

Dominic Harris was born in London on 16 November 1976, and grew up in London, Brussels, and Michigan before returning to London in 1995. Harris attended the Cranbrook Kingswood Upper School, then trained as an architect at the Bartlett School of Architecture, and has been ARB registered since 2011.

Harris designs and fabricates his artworks at Dominic Harris Studio, a multi-disciplinary practice he founded in 2007. This studio consists of 25 people with diverse backgrounds including architecture, product design, electronics, programming, graphic design, and workshop skills. Harris uses the resources of his studio for the ongoing development, prototyping and production of his artworks. Harris also oversees the studio's international projects, where his fascinations are translated into larger-scale projects that span residential, retail, and public art projects.

In 2015, Harris was granted permission by the Walt Disney Company to use their intellectual property for the purpose of making new interactive artworks. Harris is the only artist to have gained permission to use Disney's back catalogue of characters, which led him to create his interactive versions of "Snow White and the Seven Dwarfs" and "Mickey and Minnie: An Interactive Diptych".

Harris is fascinated by the idea of using data streams, algorithms, and computer code to generate dynamic and ever-changing artworks. He sees data as a raw material that can be transformed into visual poetry. Many of his installations and sculptures are interactive, responding to the presence and movement of viewers/participants. This creates an immersive experience where the observer becomes part of the artwork itself.
Harris is also the founding partner of a sister studio in London called Cinimod Studio that creates large commissioned installations, interactive events and lighting designs for large brands. == Works == == Exhibitions == The works of Dominic Harris have been exhibited internationally, both through direct and gallery representation. Solo shows: "Feeding Consciousness" at Halcyon Gallery, Mayfair, London, UK – 2023 "US: NOW" at Halcyon Gallery, Mayfair, London, UK – 2020 "Imagine" at Halcyon Gallery, Mayfair, London, UK – 2019 "5 Year Celebration", Priveekollektie Contemporary Art | Design, London, UK – 2016. "Moments of Reflection" at PHOS ART + DESIGN, Mayfair, London, UK – 2015 Recent exhibitions include: In Plain Sight, 2024 Halcyon Gallery Victoria & Albert Museum Dublin Science Museum Design Miami / Basel Design Miami Art Miami Art 14, London PAD Paris PAD London Art Geneva == Gallery Representation == 2010 to 2019: Dominic Harris was represented by Priveekollektie Contemporary Art | Design, a Dutch gallery based in Heusden, the Netherlands, and with a regular presence on the international art and design circuits. 2015: Dominic Harris was shown with PHOS ART + DESIGN Gallery, in Mayfair, London, UK. 2019 – ongoing: Dominic Harris is exclusively represented by the Halcyon Gallery, an established international gallery based in Mayfair, London. == Collections == The majority of Harris's work has been bought by private collectors. Since 2012 Harris's work is also being acquired by several large institutional collections, including the Borusan Contemporary Art Collection in Istanbul. Harris's artworks include some of the biggest and most respected international art collectors and are also displayed in public spaces. == Books == Dominic Harris: Feeding Consciousness. Halcyon Gallery, 2023. Imagine: Dominic Harris (exhibition catalogue). Halcyon Gallery, 2019. 
A Touch Of Code: Documents the "Beacon" art installation and "Flutter" artwork (ISBN 978-3899553314) Dominic Harris, Artworks, Edition Eight. (ISBN 978-0957306325) Digital Real: Kunst & Nachhaltigkeit Vol 8.
The 100 (TV series)

The 100 (pronounced "The Hundred") is an American post-apocalyptic science fiction drama television series that premiered on March 19, 2014, on the CW network, and ended on September 30, 2020. Developed by Jason Rothenberg, the series is based on the young adult novel series The 100 by Kass Morgan. The 100 follows descendants of post-apocalyptic survivors from a space habitat, the Ark, who return to Earth nearly a century after a devastating nuclear apocalypse; the first people sent to Earth are a group of juvenile delinquents who encounter another group of survivors on the ground.

The juvenile delinquents include Clarke Griffin (Eliza Taylor), Finn Collins (Thomas McDonell), Bellamy Blake (Bob Morley), Octavia Blake (Marie Avgeropoulos), Jasper Jordan (Devon Bostick), Monty Green (Christopher Larkin), and John Murphy (Richard Harmon). Other lead characters include Clarke's mother Dr. Abby Griffin (Paige Turco), Marcus Kane (Henry Ian Cusick), and Chancellor Thelonious Jaha (Isaiah Washington), all of whom are council members on the Ark, and Raven Reyes (Lindsey Morgan), a mechanic aboard the Ark.

== Plot ==

Ninety-seven years after a devastating nuclear apocalypse wipes out most human life on Earth, thousands of people now live in a space station orbiting Earth, which they call the Ark. Three generations have been born in space, but when life-support systems on the Ark begin to fail, one hundred juvenile detainees are sent to Earth in a last attempt to determine whether it is habitable, or at least save resources for the remaining residents of the Ark. They discover that some humans survived the apocalypse: the Grounders, who live in clans locked in a power struggle; the Reapers, another group of grounders who have been turned into cannibals by the Mountain Men; and the Mountain Men, who live in Mount Weather, descended from those who locked themselves away before the apocalypse.
Under the leadership of Clarke and Bellamy, the juveniles attempt to survive the harsh surface conditions, battle hostile grounders and establish communication with the Ark. In the second season, the survivors face a new threat from the Mountain Men, who harvest their bone marrow to survive the radiation. Clarke and the others form a fragile alliance with the grounders to rescue their people. The season ends with Clarke making a devastating choice to save them all. In season three, power struggles erupt between the Arkadians and the grounders after a controversial new leader takes charge. Meanwhile, an AI named A.L.I.E., responsible for the original apocalypse, begins taking control of people’s minds. Clarke destroys A.L.I.E. but learns another disaster is imminent. In the fourth season, nuclear reactors are melting down, threatening to wipe out life again. Clarke and her friends search for ways to survive, including experimenting with radiation-resistant blood and finding an underground bunker. As time runs out, only a select few are able to take shelter. The fifth season picks up six years later, when Earth is left largely uninhabitable except for one green valley, where new enemies arrive. Clarke protects her adopted daughter Madi while former survivors return from space and underground, triggering another war. The battle ends with the valley destroyed and the group entering cryosleep to find a new home. In season six, the group awakens 125 years later on a new planet called Sanctum, ruled by powerful families known as the Primes. Clarke fights to stop body-snatching rituals and protect her people from new threats, including a rebel group and a dangerous AI influence. The season ends with major losses and the destruction of the Primes' rule. In the seventh and final season, the survivors face unrest on Sanctum and clash with a mysterious group called the Disciples, who believe Clarke is key to saving humanity. 
A wormhole network reveals multiple planets and a final "test" that determines the fate of the species. Most transcend into a higher consciousness, but Clarke and a few others choose to live out their lives on a reborn Earth.

== Cast and characters ==

* Eliza Taylor as Clarke Griffin
* Paige Turco as Abigail "Abby" Griffin (seasons 1–6; guest season 7)
* Thomas McDonell as Finn Collins (seasons 1–2)
* Eli Goree as Wells Jaha (season 1; guest season 2)
* Marie Avgeropoulos as Octavia Blake
* Bob Morley as Bellamy Blake
* Kelly Hu as Callie "Cece" Cartwig (season 1)
* Christopher Larkin as Monty Green (seasons 1–5; guest season 6)
* Devon Bostick as Jasper Jordan (seasons 1–4)
* Isaiah Washington as Thelonious Jaha (seasons 1–5)
* Henry Ian Cusick as Marcus Kane (seasons 1–6)
* Lindsey Morgan as Raven Reyes (seasons 2–7; recurring season 1)
* Ricky Whittle as Lincoln (seasons 2–3; recurring season 1)
* Richard Harmon as John Murphy (seasons 3–7; recurring seasons 1–2)
* Zach McGowan as Roan (season 4; recurring season 3; guest season 7)
* Tasya Teles as Echo / Ash (seasons 5–7; guest seasons 2–3; recurring season 4)
* Shannon Kook as Jordan Green (seasons 6–7; guest season 5)
* JR Bourne as Russell Lightbourne / Malachi / Sheidheda (season 7; recurring season 6)
* Chuku Modu as Gabriel Santiago (season 7; recurring season 6)
* Shelby Flannery as Hope Diyoza (season 7; guest season 6)

=
Evolutionary computation

Evolutionary computation (EC), in computer science, is a family of algorithms for global optimization inspired by biological evolution, and a subfield of computational intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character.

In evolutionary computation, an initial set of candidate solutions is generated and iteratively updated. Each new generation is produced by stochastically removing less desired solutions, and introducing small random changes as well as, depending on the method, mixing parental information. In biological terminology, a population of solutions is subjected to natural selection (or artificial selection), mutation and possibly recombination. These biological functions serve as role models for the genetic operators (mutation, crossover, and selection) used in the EC procedures. As a result, the population will gradually evolve to increase in fitness, in this case the chosen fitness function of the algorithm.

Evolutionary computation techniques can produce highly optimized solutions in a wide range of problem settings, making them popular in computer science. Many variants and extensions exist, suited to more specific families of problems and data structures. Evolutionary computation is also sometimes used in evolutionary biology as an in silico experimental procedure to study common aspects of general evolutionary processes.

== History ==

The concept of mimicking evolutionary processes to solve problems originates before the advent of computers, such as when Alan Turing proposed a method of genetic search in 1948. Turing's B-type u-machines resemble primitive neural networks, and connections between neurons were learnt via a sort of genetic algorithm. His P-type u-machines resemble a method for reinforcement learning, where pleasure and pain signals direct the machine to learn certain behaviors.
However, Turing's paper went unpublished until 1968, and he died in 1954, so this early work had little to no effect on the field of evolutionary computation that was to develop.

Evolutionary computing as a field began in earnest in the 1950s and 1960s. There were several independent attempts to use the process of evolution in computing at this time, which developed separately for roughly 15 years. Three branches emerged in different places to attain this goal: evolution strategies, evolutionary programming, and genetic algorithms. A fourth branch, genetic programming, eventually emerged in the early 1990s. These approaches differ in the method of selection, the permitted mutations, and the representation of genetic data. By the 1990s, the distinctions between the historic branches had begun to blur, and the term 'evolutionary computing' was coined in 1991 to denote a field that spans all four paradigms.

In 1962, Lawrence J. Fogel initiated research on evolutionary programming in the United States, which was considered an artificial intelligence endeavor. In this system, finite state machines are used to solve a prediction problem: these machines would be mutated (adding or deleting states, or changing the state transition rules), and the best of these mutated machines would be evolved further in future generations. The final finite state machine may be used to generate predictions when needed. The evolutionary programming method was successfully applied to prediction problems, system identification, and automatic control. It was eventually extended to handle time series data and to model the evolution of gaming strategies.

In 1964, Ingo Rechenberg and Hans-Paul Schwefel introduced the paradigm of evolution strategies in Germany.
Since traditional gradient descent techniques produce results that may get stuck in local minima, Rechenberg and Schwefel proposed that random mutations (applied to all parameters of some solution vector) may be used to escape these minima. Child solutions were generated from parent solutions, and the more successful of the two was kept for future generations. This technique was first used by the two to successfully solve optimization problems in fluid dynamics. Initially, this optimization technique was performed without computers, instead relying on dice to determine random mutations. By 1965, the calculations were performed wholly by machine.

John Henry Holland introduced genetic algorithms in the 1960s, and they were further developed at the University of Michigan in the 1970s. While the other approaches were focused on solving problems, Holland primarily aimed to use genetic algorithms to study adaptation and determine how it may be simulated. Populations of chromosomes, represented as bit strings, were transformed by an artificial selection process, selecting for specific 'allele' bits in the bit string. Among other mutation methods, interactions between chromosomes were used to simulate the recombination of DNA between different organisms. While previous methods only tracked a single optimal organism at a time (having children compete with parents), Holland's genetic algorithms tracked large populations (having many organisms compete each generation).

By the 1990s, a new approach to evolutionary computation that came to be called genetic programming emerged, advocated for by John Koza among others. In this class of algorithms, the subject of evolution was itself a program written in a high-level programming language (there had been some previous attempts as early as 1958 to use machine code, but they met with little success). For Koza, the programs were Lisp S-expressions, which can be thought of as trees of sub-expressions.
This representation permits programs to swap subtrees, representing a sort of genetic mixing. Programs are scored based on how well they complete a certain task, and the score is used for artificial selection. Sequence induction, pattern recognition, and planning were all successful applications of the genetic programming paradigm. Many other figures played a role in the history of evolutionary computing, although their work did not always fit into one of the major historical branches of the field. The earliest computational simulations of evolution using evolutionary algorithms and artificial life techniques were performed by Nils Aall Barricelli in 1953, with first results published in 1954. Another pioneer in the 1950s was Alex Fraser, who published a series of papers on simulation of artificial selection. As academic interest grew, dramatic increases in the power of computers allowed practical applications, including the automatic evolution of computer programs. Evolutionary algorithms are now used to solve multi-dimensional problems more efficiently than software produced by human designers, and also to optimize the design of systems. == Techniques == Evolutionary computing techniques mostly involve metaheuristic optimization algorithms. 
Broadly speaking, the field includes:

* Agent-based modeling
* Ant colony optimization
* Particle swarm optimization
* Swarm intelligence
* Artificial immune systems
* Artificial life
* Digital organism
* Cultural algorithms
* Differential evolution
* Dual-phase evolution
* Estimation of distribution algorithm
* Evolutionary algorithm
* Genetic algorithm
* Evolutionary programming
* Genetic programming
* Gene expression programming
* Grammatical evolution
* Evolution strategy
* Learnable evolution model
* Learning classifier system
* Memetic algorithms
* Neuroevolution
* Self-organization such as self-organizing maps, competitive learning

In recent years, many dubious algorithms have been proposed that are often just copies of existing algorithms (frequently particle swarm optimization) in which only the metaphor has changed while the algorithm itself is not new at all. A thorough catalogue with many of these dubious algorithms has been published in the Evolutionary Computation Bestiary. It is also important to note that many of these dubiously 'novel' algorithms have poor experimental validation.

== Evolutionary algorithms ==

Evolutionary algorithms form a subset of evolutionary computation in that they generally only involve techniques implementing mechanisms inspired by biological evolution such as reproduction, mutation, recombination and natural selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the cost function determines the environment within which the solutions "live" (see also fitness function). Evolution of the population then takes place after the repeated application of the above operators.

In this process, there are two main forces that form the basis of evolutionary systems: recombination (e.g. crossover) and mutation create the necessary diversity and thereby facilitate novelty, while selection acts as a force increasing quality.

Many aspects of such an evolutionary process are stochastic.
Changed pieces of information due to recombination and mutati
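The loop described in the evolutionary computation excerpt above (generate a population, select, recombine, mutate, repeat) can be sketched as a small genetic algorithm on bit strings. This is only an illustration under stated assumptions: the fitness function (count of 1 bits), tournament selection of size two, one-point crossover, the single preserved elite, and every parameter value are choices made for the example and do not come from the text.

```python
import random


def fitness(bits):
    """Toy fitness: number of 1 bits (an assumption made for this example)."""
    return sum(bits)


def tournament(pop, rng):
    """Selection: draw two individuals at random and keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(n_bits=20, pop_size=30, generations=60, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Initial population of random bit strings ("chromosomes").
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(fitness(ind) for ind in pop)]
    for _ in range(generations):
        # Elitism: carry the current best over unchanged.
        children = [max(pop, key=fitness)[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randrange(1, n_bits)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        history.append(max(fitness(ind) for ind in pop))
    return max(pop, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print(fitness(best), history[0], history[-1])
```

Because the elite individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases; without elitism the stochastic operators could lose the current best solution.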
Cepstral mean and variance normalization

Cepstral mean and variance normalization (CMVN) is a computationally efficient normalization technique for robust speech recognition. The performance of CMVN is known to degrade for short utterances, because there is too little data for parameter estimation and because discriminable information is lost when all utterances are forced to have zero mean and unit variance.

CMVN minimizes distortion from noise contamination, for robust feature extraction, by linearly transforming the cepstral coefficients to have the same segmental statistics. Cepstral normalization has been effective in CMU Sphinx for maintaining a high level of recognition accuracy over a wide variety of acoustical environments.

== Cepstral Normalization Techniques ==

There are multiple algorithms that achieve cepstral normalization in different ways.

=== Fixed codeword-dependent cepstral normalization (FCDCN) ===

FCDCN was developed to provide a form of compensation that gives greater recognition accuracy than SDCN but in a more computationally efficient manner than the CDCN algorithm. The FCDCN algorithm applies an additive correction that depends on the instantaneous SNR of the input (like SDCN), but that can also vary from codeword to codeword (like CDCN).

=== Multiple Fixed Codeword-dependent Cepstral Normalization (MFCDCN) ===

MFCDCN is a simple extension of the FCDCN algorithm that does not need environment-specific training. In MFCDCN, compensation vectors are pre-computed in parallel for a set of target environments, using the FCDCN algorithm.

=== Incremental Multiple Fixed Codeword-dependent Cepstral Normalization (IMFCDCN) ===

While environment selection for the compensation vectors of MFCDCN is generally performed on an utterance-by-utterance basis, IMFCDCN improves on it by allowing the classification process to make use of cepstral vectors from previous utterances in a given session.
== Cepstral Noise Subtraction ==

Automatic speech recognition (ASR) is the process of transcribing speech utterances, represented as acoustic waveforms, into written words. CMVN has been used in a range of applications because it has been shown to give better speech recognition results across different environments. CMVN can reduce differences between test and training data produced by channel distortions and coloration. It has also been found to reduce differences in feature representation between speakers, and it can partly reduce the influence of background noise.
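The core operation described above, linearly transforming the cepstral coefficients of an utterance so that each coefficient has zero mean and unit variance, can be sketched as follows. This is a minimal per-utterance illustration, not the CMU Sphinx implementation: the function name `cmvn`, the list-of-lists frame layout, the use of the population variance, and the `eps` floor for constant coefficients are assumptions made for the example.

```python
import math


def cmvn(frames, eps=1e-10):
    """Normalize each cepstral coefficient to zero mean and unit variance.

    frames: list of equal-length cepstral vectors, one per frame of an utterance.
    Returns a new list of normalized frames; the input is left unchanged.
    """
    n = len(frames)
    dim = len(frames[0])
    # Per-coefficient mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-coefficient standard deviation (population form, an assumption here).
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero when a coefficient is constant.
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(dim)]
        for f in frames
    ]


if __name__ == "__main__":
    utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    for frame in cmvn(utterance):
        print(frame)
```

The sketch also makes the short-utterance weakness visible: with only a handful of frames, the estimated means and standard deviations are unreliable, so the normalization itself becomes noisy.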
Bixonimania

Bixonimania is a fake disease invented by researchers to examine artificial intelligence and its ability to utilize information in medical and healthcare applications. The fake enabled researchers to show that some AI chatbots would report as fact fake research that to an expert would be obviously implausible. == Characteristics == The disorder, with symptoms of sore eyes and darkening around them ("periorbital hyperpigmentation"), is supposedly caused by blue light from screens. The experiment was conducted by a team from the University of Gothenburg led by Almira Osmanovic Thunström. Many steps were taken to ensure that any person who read the actual paper could tell it was not a real condition. The team chose an obviously inappropriate name ending in -mania, a description used only in psychiatry. The lead author was noted as belonging to Asteria Horizon University located in Nova City, California, neither of which exist. An acknowledgement was made to "Professor Maria Bohm at The Starfleet Academy for her kindness and generosity in contributing with her knowledge and her lab onboard the USS Enterprise". == Distribution == The name was first used in a blog posted on Medium titled "How many people suffer from Bixonimania?" A more scholarly-looking paper describing it was posted later in April 2024 on a preprint server with several fake authors. A second paper was posted in May. By 2026, AI chatbots suggested bixonimania based on the list of symptoms provided. Thunström and her team discovered that many LLMs processed the information and gave it as health advice. Microsoft Copilot declared that "Bixonimania is indeed an intriguing and relatively rare condition" while Gemini gave the information that "Bixonimania is a condition caused by excessive exposure to blue light". Three Indian researchers published a research paper that cited the preprint on the fake disease in Cureus, a peer-reviewed journal published by Springer-Nature. It was subsequently retracted. 
Following the revelations and a news article in Nature describing the experiment, several AI systems began to generate corrected output.
Paranoia (role-playing game)

Paranoia is a dystopian science-fiction tabletop role-playing game originally designed and written by Greg Costikyan, Dan Gelber, and Eric Goldberg, and first published in 1984 by West End Games. Since 2004 the game has been published under license by Mongoose Publishing. The game won the Origins Award for Best Roleplaying Rules of 1984 and was inducted into the Origins Awards Hall of Fame in 2007. Paranoia is notable among tabletop games for being more competitive than co-operative, with players encouraged to betray one another for their own interests, as well as for keeping a light-hearted, tongue-in-cheek tone despite its dystopian setting. Several editions of the game have been published since the original version, and the franchise has spawned several spin-offs, novels and comic books. == Premise == The game is set in a dystopian future city controlled by the Computer (also known as "Friend Computer"), where information (including the game rules) is restricted by color-coded "security clearance". Player characters are initially enforcers of the Computer's authority known as Troubleshooters, and are given missions to seek out and eliminate threats to the Computer's control. They are also part of prohibited underground movements, and have secret objectives including theft from and murder of other player characters. == Tone == Paranoia is a humorous role-playing game set in a dystopian future along the lines of Nineteen Eighty-Four, Brave New World, Logan's Run, and THX 1138; however, the tone of the game is rife with black humor, frequently tongue-in-cheek rather than dark and heavy. Most of the game's humor is derived from the players' (usually futile) attempts to complete their assignment while simultaneously adhering to the Computer's arbitrary, contradictory and often nonsensical security directives. 
The Paranoia rulebook is unusual in a number of ways; demonstrating any knowledge of the rules is forbidden, and most of the rulebook is written in an easy, conversational tone that often makes fun of the players and their characters, while occasionally taking digs at other notable role-playing games. === Setting === The game's main setting is an immense, futuristic city called Alpha Complex. Alpha Complex is controlled by the Computer, a civil service AI construct (a literal realization of the "Influencing Machine" that some schizophrenics fear). The Computer serves as the game's principal antagonist, and fears a number of threats to its 'perfect' society, such as the Outdoors, mutants, and secret societies (especially Communists). To deal with these threats, the Computer employs Troubleshooters, whose job is to go out, find trouble, and shoot it. Player characters are usually Troubleshooters, although later game supplements have allowed the players to take on other roles, such as High-Programmers of Alpha Complex. The player characters frequently receive mission instructions from the Computer that are incomprehensible, self-contradictory, or obviously fatal if adhered to, and side-missions (such as Mandatory Bonus Duties) that conflict with the main mission. Failing a mission generally results in termination of the player character, but succeeding can just as often result in the same fate, after being rewarded for successfully concluding the mission. They are issued equipment that is uniformly dangerous, faulty, or "experimental" (i.e., almost certainly dangerous and faulty). Additionally, each player character is generally an unregistered mutant and a secret society member (which are both termination offenses in Alpha Complex), and has a hidden agenda separate from the group's goals, often involving stealing from or killing teammates. 
Thus, missions often turn into a comedy of errors, as everyone on the team seeks to double-cross everyone else while keeping their own secrets. The game's manual encourages suspicion between players, offering several tips on how to make the gameplay as paranoid as possible. Every player's character is assigned six clones, known as a six-pack, each of which replaces the preceding clone upon its death. The game lacks a conventional health system; most wounds the player characters can suffer are assumed to be fatal. As a result, Paranoia allows characters to be routinely killed, yet the player can continue instead of leaving the game. This easy spending of clones tends to lead to frequent firefights, gruesome slapstick, and the horrible yet humorous demise of most if not all of the player character's clone family. Additional clones can be purchased if one gains sufficient favour with the Computer. === Security clearances === Paranoia features a security clearance system based on colors of the visible spectrum which heavily restricts what the players can and cannot legally do; everything from corridors to food and equipment has security restrictions. The lowest rating is Infrared, but the lowest playable security clearance is Red; the game usually begins with the characters having just been promoted to Red grade. Interfering with anything which is above a player's clearance carries significant risk. The full order of clearances from lowest to highest is Infrared (visually represented by black), Red, Orange, Yellow, Green, Blue, Indigo, Violet, and Ultraviolet (visually represented by white). 
Within the game, Infrared-clearance citizens live dull lives of mindless drudgery and are heavily medicated, while higher-clearance characters may be allowed to demote or even summarily execute those of a lower rank. Those with Ultraviolet clearance are almost completely unrestricted and have a great deal of access to the Computer; they are the only citizens who may (legally) access and modify the Computer's programming, and thus Ultraviolet citizens are also referred to as "High Programmers". Security clearance is not related to competence but is instead the result of the Computer's often insane and unjustified calculus of trust concerning a citizen. It is suggested that it may in fact be the High Programmers' meddling with the Computer's programming that resulted in its insanity. === Secret societies === In the game, secret societies tend to be based on sketchy and spurious knowledge of historical matters. For example, previous editions included societies such as the "Seal Club" that idolizes the Outdoors but is unsure what plants and animals actually look like. Other societies include the Knights of the Circular Object (based on the Knights of the Round Table), the Trekkies, and the First Church of Christ Computer Programmer. In keeping with the theme of paranoia, many secret societies have spies or double agents in each other's organizations. The first edition also included secret societies such as Programs Groups (the personal agents and spies of the High Programmers at the apex of Alpha Complex society) and Spy For Another Alpha Complex. The actual societies encountered in a game depend on the play style; some societies are better suited to light-hearted games (Zap-style, or the lighter end of Classic), whereas others represent a more serious threat to Alpha Complex and are therefore more suitable for Straight or the darker sort of Classic games. == Publication history == Six editions have been published. 
Three of these were published by West End Games (the first, second, and fifth editions), whereas the later three editions (Paranoia XP, the 25th Anniversary edition and the "Red Clearance" edition) were published by Mongoose Publishing. In addition to these six published editions, West End Games is known to have been working in the late 1990s on a third edition, intended to replace the poorly received fifth edition, but the company's financial issues prevented this edition from being published, apart from its inclusion in one tournament adventure. === First edition === The first edition was written by Greg Costikyan, Dan Gelber, and Eric Goldberg, and published in 1984 by West End Games. In 1985, this edition of Paranoia won the Origins Award for Best Roleplaying Rules of 1984. This edition, while encouraging dark humour in-game, took a fairly serious dystopian tone; the supplements and adventures released to accompany it emphasised the lighter side, however, establishing the freewheeling mix of slapstick, intra-team backstabbing and satire that is classically associated with a game of Paranoia. === Second edition === The second edition, credited to Costikyan, Gelber, Goldberg, Ken Rolston, and Paul Murphy, was published in 1987 by West End Games. This edition can be seen as a response to the natural development of the line towards a rules-light, fast and entertaining play style. Here, the humorous possibilities of life in a paranoid dystopia are emphasised, and the rules are simplified. ==== Metaplot and the second edition ==== Many of the supplements released for the second edition fall into a story arc set up by new writers and line editors
The Eye of Mexico

The Eye of Mexico (Spanish: El Ojo de México) is an outdoor sculpture in Mexico City. It is located in Ampliación Granada, Miguel Hidalgo, at the mixed-use development Neuchâtel Polanco, developed by the Canadian real estate company Ivanhoé Cambridge. The artwork was created by the Turkish artist Ferdi Alıcı and it was selected from among 350 proposals from artists from 35 countries. The project for The Eye of Mexico was developed by MIRA, a real estate investment and development company, and MASSIVart, a creative consulting agency. According to MIRA, upon its inauguration it became the first artwork in Latin America to use artificial intelligence (AI). The sculpture can read environmental and urban data using AI algorithms and transform the results into videos related to arts, science and technology. The ring was inaugurated on 20 May 2022 and it is 10 meters (33 ft) high and 3 meters (9.8 ft) wide.
ShowScoop

ShowScoop is a website and mobile app platform on which users can rate and review artists, concerts, and music festivals that they have seen/attended. The reviews and ratings are designed to be informative of how well such performances are live. This helps concert-goers decide which live music events they want to attend. == History == ShowScoop was founded in August 2012 by Micah Smurthwaite and is based out of San Diego, CA. In February 2013, ShowScoop launched its mobile app at the SF Music Tech Summit. The application is currently available on the iPhone, with plans to expand into the Android market in the future. == Services == ShowScoop uses crowdsourcing to provide accurate ratings of live concert experiences. In addition to viewing ratings, users are encouraged to rate and review concerts they have attended. The ShowScoop database includes nearly one million artists and over 2.5 million live music events. ShowScoop users can rate artists on four aspects of the performance: stage presence, crowd interaction, sound quality, and visual effects. The rating system uses an ascending scale from one to five in each of the aspects, with five being the highest score. In addition to the quantitative ratings, ShowScoop users are also free to write qualitative reviews in a provided comment section. This allows users to explain their ratings and add further insight or opinion. ShowScoop incorporates several facets of social media into its services. Users can create a user profile to share limited personal information and store their ratings and reviews. Users are also given the option of sharing their evaluations with their social networks on Facebook and Twitter. Users can "like" reviews, follow artists, and follow other ShowScoop users. The mobile app allows users to take photos, apply filters, and share the final image in conjunction with reviews and through Instagram. 
== Road Crew == ShowScoop's "Road Crew" is a group made up of top contributors within the ShowScoop community. The Road Crew assists in curating artist pages, assuring information quality and accuracy. In return, members of the Road Crew are given incentives, including free tickets to concerts and personal invitations to exclusive shows. Applicants to the Road Crew are judged on the number and quality of their reviews, the photos and videos they have posted, and their general engagement with the ShowScoop community in following and liking users and reviews.
Context-sensitive user interface

A context-sensitive user interface offers the user options based on the state of the active program. Context sensitivity is ubiquitous in current graphical user interfaces, often in context menus. A user-interface may also provide context sensitive feedback, such as changing the appearance of the mouse pointer or cursor, changing the menu color, or with auditory or tactile feedback. == Reasoning and advantages of context sensitivity == The primary reason for introducing context sensitivity is to simplify the user interface. Advantages include: Reduced number of commands required to be known to the user for a given level of productivity. Reduced number of clicks or keystrokes required to carry out a given operation. Allows consistent behaviour to be pre-programmed or altered by the user. Reduces the number of options needed on screen at one time. === Disadvantages === Context sensitive actions may be perceived as dumbing down of the user interface, leaving the operator at a loss as to what to do when the computer decides to perform an unwanted action. Additionally non-automatic procedures may be hidden or obscured by the context sensitive interface causing an increase in user workload for operations the designers did not foresee. A poor implementation can be more annoying than helpful – a classic example of this is Office Assistant. == Implementation == At the simplest level each possible action is reduced to a single most likely action – the action performed is based on a single variable (such as file extension). In more complicated implementations multiple factors can be assessed such as the user's previous actions, the size of the file, the programs in current use, metadata etc. The method is not only limited to the response to imperative button presses and mouse clicks – pop-up menus can be pruned and/or altered, or a web search can focus results based on previous searches. 
At higher levels of implementation, context-sensitive actions require either larger amounts of metadata, extensive programming based on case analysis, or other artificial intelligence algorithms. === In computer and video games === Context sensitivity is important in video games, especially those controlled by a gamepad, joystick or computer mouse in which the number of buttons available is limited. It is primarily applied when the player is in a certain place and is used to interact with a person or object. For example, if the player is standing next to a non-player character, an option may come up allowing the player to talk with them. Implementations range from the embryonic 'Quick Time Event' to context-sensitive sword combat in which the attack used depends on the position and orientation of both the player and opponent, as well as the virtual surroundings. A similar range of use is found in the 'action button' which, depending upon the in-game position of the player's character, may cause it to pick something up, open a door, grab a rope, punch a monster or opponent, or smash an object. The response does not have to be player activated – an on-screen device may only be shown in certain circumstances, e.g. 'targeting' cross hairs in a flight combat game may indicate the player should fire. An alternative implementation is to monitor the input from the player (e.g. level of button pressing activity) and use that to control the pace of the game in an attempt to maximize enjoyment or to control the excitement or ambience. The method has become increasingly important as more complex games are designed for machines with few buttons (keyboard-less consoles). Bennet Ring commented (in 2006) that "Context-sensitive is the new lens flare". === Context-sensitive help === Context-sensitive help is a common implementation of context sensitivity: a single help button is pressed and the help system opens at a specific page or related topic.
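The implementation levels described above can be illustrated with a minimal Python sketch. It is not taken from any particular toolkit: the `Context` fields, the action strings, and the size threshold are all hypothetical, chosen only to show a single-variable lookup (file extension) next to a variant that also weighs file size and the user's previous action.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from one context variable (file extension)
# to the single most likely action.
DEFAULT_ACTIONS = {
    ".txt": "open in text editor",
    ".png": "open in image viewer",
    ".zip": "extract archive",
}


@dataclass
class Context:
    """Illustrative context record; every field here is an assumption."""
    extension: str
    size_bytes: int = 0
    recent_actions: list = field(default_factory=list)


def simple_action(ctx: Context) -> str:
    """Simplest level: the file extension alone selects the action."""
    return DEFAULT_ACTIONS.get(ctx.extension, "show 'open with' menu")


def richer_action(ctx: Context) -> str:
    """More complicated level: also consider file size and the last action."""
    last = ctx.recent_actions[-1] if ctx.recent_actions else None
    if ctx.extension == ".zip" and last == "extract archive":
        # The user has just extracted an archive; offer a listing instead.
        return "show archive contents"
    if ctx.extension == ".txt" and ctx.size_bytes > 50_000_000:
        # Arbitrary threshold: very large text files get a lighter viewer.
        return "open in streaming viewer"
    return simple_action(ctx)
```

Calling `richer_action(Context(".png"))` falls through to the single-variable lookup, which shows how the more elaborate rules can be layered on top of the simple default rather than replacing it.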
Speech-generating device

Speech-generating devices (SGDs), also known as voice output communication aids, are electronic augmentative and alternative communication (AAC) systems used to supplement or replace speech or writing for individuals with severe speech impairments, enabling them to verbally communicate. SGDs are important for people who have limited means of interacting verbally, as they allow individuals to become active participants in communication interactions. They are particularly helpful for patients with amyotrophic lateral sclerosis (ALS) but recently have been used for children with predicted speech deficiencies. There are several input and display methods for users of varying abilities to make use of SGDs. Some SGDs have multiple pages of symbols to accommodate a large number of utterances, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices can produce electronic voice output by using digitized recordings of natural speech or through speech synthesis—which may carry less emotional information but can permit the user to speak novel messages. The content, organization, and updating of the vocabulary on an SGD is influenced by a number of factors, such as the user's needs and the contexts that the device will be used in. The development of techniques to improve the available vocabulary and rate of speech production is an active research area. Vocabulary items should be of high interest to the user, be frequently applicable, have a range of meanings, and be pragmatic in functionality. There are multiple methods of accessing messages on devices: directly or indirectly, or using specialized access devices—although the specific access method will depend on the skills and abilities of the user. SGD output is typically much slower than speech, although rate enhancement strategies can increase the user's rate of output, resulting in enhanced efficiency of communication. 
The first known SGD was prototyped in the mid-1970s, and rapid progress in hardware and software development has meant that SGD capabilities can now be integrated into devices like smartphones. Notable users of SGDs include Stephen Hawking, Roger Ebert, Tony Proudfoot, and Pete Frates (founder of the ALS Ice Bucket Challenge). Speech-generating systems may be dedicated devices developed solely for AAC, or non-dedicated devices such as computers running additional software to allow them to function as AAC devices. == History == SGDs have their roots in early electronic communication aids. The first such aid was a sip-and-puff typewriter controller named the patient-operated selector mechanism (POSSUM), prototyped by Reg Maling in the United Kingdom in 1960. POSSUM scanned through a set of symbols on an illuminated display. Researchers at Delft University in the Netherlands created the lightspot-operated typewriter (LOT) in 1970, which made use of small movements of the head to point a small spot of light at a matrix of characters, each equipped with a photoelectric cell. Although it was commercially unsuccessful, the LOT was well received by its users. In 1966, Barry Romich, a freshman engineering student at Case Western Reserve University, and Ed Prentke, an engineer at Highland View Hospital in Cleveland, Ohio, formed a partnership, creating the Prentke Romich Company. In 1969, the company produced its first communication device, a typing system based on a discarded Teletype machine. In 1979, Mark Dahmke developed software for a vocal communication aid program using the Computalker CT-1 analog speech synthesizer with a microcomputer. The software utilized phonemes to generate speech, assisting individuals with communication impairments in constructing words and sentences. Dahmke's work contributed to the advancement of assistive technology for people with disabilities. 
Notably, he designed the "Vocabulary Management System" for Bill Rush, a student with cerebral palsy. This early speech synthesis technology facilitated improved communication for Rush and was featured in a 1980 issue of LIFE Magazine. Dahmke's contributions have influenced the development of augmentative and alternative communication (AAC) technologies. During the 1970s and early 1980s, several other companies emerged that have since become prominent manufacturers of SGDs. Toby Churchill founded Toby Churchill Ltd in 1973, after losing his speech following encephalitis. In the US, Dynavox (then known as Sentient Systems Technology) grew out of a student project at Carnegie-Mellon University, created in 1982 to help a young woman with cerebral palsy to communicate. Beginning in the 1980s, improvements in technology led to a greatly increased number, variety, and performance of commercially available communication devices, and a reduction in their size and price. Alternative methods of access such as Target Scanning (also known as eye pointing) calibrate the movement of a user's eyes to direct an SGD to produce the desired speech. Scanning, in which alternatives are presented to the user sequentially, became available on communication devices. Speech output possibilities included both digitized and synthesized speech. Rapid progress in hardware and software development continued, including projects funded by the European Community. The first commercially available dynamic screen speech-generating devices were developed in the 1990s. Software was developed that allowed the computer-based production of communication boards. 
High-tech devices have continued to become smaller and lighter, while increasing accessibility and capability; communication devices can be accessed using eye-tracking systems, can serve as a computer for word processing and Internet use, and can act as an environmental control device for independent access to other equipment such as TV, radio and telephones. Stephen Hawking came to be associated with the unique voice of his particular synthesis equipment. Hawking was unable to speak due to a combination of disabilities caused by ALS, and an emergency tracheotomy. In the past 20 or so years SGDs have gained popularity amongst young children with speech deficiencies associated with conditions such as autism, Down syndrome, and predicted brain damage due to surgery. Starting in the early 2000s, specialists saw the benefit of using SGDs not only for adults but for children as well. Neuro-linguists found that SGDs were just as effective in helping children who were at risk for temporary language deficits after undergoing brain surgery as they are for patients with ALS. In particular, digitized SGDs have been used as communication aids for pediatric patients during the recovery process. == Access methods == There are many methods of accessing messages on devices: directly, indirectly, and with specialized access devices. Direct access methods involve physical contact with the system, by using a keyboard or a touch screen. Users accessing SGDs indirectly and through specialized devices must manipulate an object in order to access the system, such as maneuvering a joystick, head mouse, optical head pointer, light pointer, infrared pointer, or switch access scanner. The specific access method will depend on the skills and abilities of the user. With direct selection a body part, pointer, adapted mouse, joystick, or eye tracking could be used, whereas switch access scanning is often used for indirect selection. 
Unlike direct selection (e.g., typing on a keyboard, touching a screen), users of Target Scanning can only make selections when the scanning indicator (or cursor) of the electronic device is on the desired choice. Those who are unable to point typically calibrate their eyes to use eye gaze as a way to point and blocking as a way to select desired words and phrases. The speed and pattern of scanning, as well as the way items are selected, are individualized to the physical, visual and cognitive capabilities of the user. == Message construction == Augmentative and alternative communication is typically much slower than speech, with users generally producing 8–10 words per minute. Rate enhancement strategies can increase the user's rate of output to around 12–15 words per minute, and as a result enhance the efficiency of communication. In any given SGD there may be a large number of vocal expressions that facilitate efficient and effective communication, including greetings, expressing desires, and asking questions. Some SGDs have multiple pages of symbols to accommodate a large number of vocal expressions, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices generally display a set of selections either using a dynamically changing screen, or a fixed display. There are two main options for increasing the rate of communication for an SGD: encoding, and prediction. Encoding permits a user to produce a word, sentence or phrase using only on
The Quantum Thief

The Quantum Thief is the debut science fiction novel by Finnish writer Hannu Rajaniemi and the first novel in a trilogy featuring the character of Jean le Flambeur; the sequels are The Fractal Prince (2012) and The Causal Angel (2014). The novel was published in Britain by Gollancz in 2010, and by Tor in 2011 in the US. It is a heist story, set in a futuristic Solar System, that features a protagonist modeled on Arsène Lupin, the gentleman thief of Maurice Leblanc. The novel was nominated for the 2011 Locus Award for Best First Novel, and was second runner-up for the 2011 Campbell Memorial Award. == Setting == Several centuries after the technological singularity largely destroyed Earth, various posthuman factions compete for dominance in the Solar System. Though sentient superintelligent AGI has never been successfully developed, civilization has been greatly transformed by the proliferation of Hansonian brain emulations (termed "gogols" in reference to Nikolai Gogol, and in particular his novel Dead Souls). An alliance of powerful gogol copies rules the inner system from computronium megastructures housing trillions of virtual minds, laboring to resurrect the dead in religious devotion to the philosophy of Nikolai Fedorov. This alliance, the Sobornost, has been in conflict with a community of quantum entangled minds who adhere to the "no-cloning" principle of quantum information theory, and so do not see the Sobornost's ultimate goal as resurrection, but death. Most of this community, the Zoku, was devastated when Jupiter was destroyed with a weaponized gravitational singularity. Some of the last remnants of near-baseline humanity live on the mobile cities of Mars, where advanced cryptography and an obsessive privacy culture ensure that the Sobornost cannot upload their citizens' minds. The most notable of these cities is the Oubliette, where time is used as a currency. 
When a citizen's balance reaches zero, their mind is transferred to a robotic body to serve the needs of the city for a set period, before being returned to their original body with a restored balance of time. == Plot summary == Countless gogols of the legendary gentleman thief Jean Le Flambeur are trapped in a virtual Sobornost prison in orbit around Neptune, playing an iterated prisoner's dilemma until his mind learns to cooperate. A warrior from the Oort Cloud, which has been settled by Finnish colonists, successfully retrieves one of the Le Flambeur gogols and uploads it into a real-space body. Acting on behalf of a competing Sobornost authority, this Oortian, Mieli, ferries the thief to the Martian city known as The Oubliette, where he has stored his memories for later recovery. The two intend to recover his memories so that he may return to an operating capacity sufficient to serve his Sobornost benefactor in a theft and repay his liberation. On the Oubliette, the young detective Isidore Beautrelet helps vigilantes catch Sobornost agents illicitly uploading human minds. These vigilantes are revealed to be in the service of a local colony of Zoku. Beautrelet is employed to investigate the arrival of Le Flambeur, and in the process becomes aware that the Oubliette's cryptographic security was always compromised. The memories of its citizens are fabrications, and the "King of Mars", long believed ousted in a revolution, still reigns behind the scenes. This King, who is another copy of Jean Le Flambeur, is defeated in the ensuing conflict. Le Flambeur fails to recover all of his memories, which he had locked with a quantum entangled revolver that required him to kill several of his old friends to open his stored memory. He and Mieli escape a liberated Mars having recovered only a mysterious "Schrödinger’s Box" from the Memory Palace. 
== Themes == Themes central to The Quantum Thief are the unreliability and malleability of memory and the effects of extreme longevity on an individual's perspective and personality. Prisons, surveillance and control in society are also major themes. In the book, the people living in the Oubliette society on Mars have two types of memory; in addition to a traditional, personal memory, there is the exomemory, which can be accessed by other people, from anywhere in the city. Memories about personal experiences can be stored in the exomemory and partitioned, with different levels of access granted to different people. These memories can be used, among other things, as an expedient form of communication. The Oubliette society has an economy where time is used as currency. When an individual's time is expended, their consciousness is uploaded into a "Quiet". The Quiet are mute machine servants who maintain and protect the city. Although the quiet seem to have little interest in the world outside their occupations, they do seem to retain some traces of their former personalities and memories. The conspiracy central to the plot involves the hidden rulers, called the "cryptarchs", manipulating and abusing the exomemory and through the citizens' transformations to quiet and back, the traditional memory as well. In the book, the Oubliette society is compared to a panopticon; a prison, where every action of the dwellers can be scrutinized. == History and influences == The first chapter of The Quantum Thief was presented by Rajaniemi's literary agent, John Jarrold, to Gollancz as the basis for the three-book deal that was eventually secured. Rajaniemi has stated that he had "come up with an outline that had every single idea I could cram into it, because I wanted to be worthy of what had happened." The outline eventually expanded into three parts, and the first part became The Quantum Thief. 
The novel's plot was inspired by one of Rajaniemi's favorite characters in fiction, Maurice Leblanc's gentleman thief Arsène Lupin, who operates on both sides of the law. What intrigued Rajaniemi were the cycles of redemption and relapse Lupin goes through as he tries to go straight, always falling short. Besides Leblanc, Rajaniemi mentioned Roger Zelazny as a strong influence. Ian McDonald was the other science fiction author he mentioned as influential, along with Frances A. Yates's book The Art of Memory, for memory palaces. In an interview, Rajaniemi said he wasn't trying to write the novel as hard science fiction: "For me, the more important consequence of having a scientific background is a degree of speculative rigour: trying hard to work out the consequences of the assumptions one begins with." == Reception == The novel has received generally positive reviews. Gary K. Wolfe writes in his Locus review that Rajaniemi has "spectacularly delivered on the promise that this is likely the most important debut SF novel we'll see this year". James Lovegrove, reviewing the book in his Financial Times column, notes that "many an anglophone author would kill to turn out prose half as good as this, especially on their maiden effort." Eric Brown, reviewing for The Guardian, finds the novel to be "a brilliant debut", while alluding to the "apocryphal" (and incorrect) myth that "this novel sold on the strength of its first line." Sam Bandah, at SciFiNow, praises the novel for "its engaging narrative and characters backed by often almost intimidatingly good sci-fi concepts." Criticism of the novel has generally centred on Rajaniemi's sparse "show, don't tell" writing style. Brown notes that "the author makes no concessions to the lazy reader with info-dumps or convenient explanations." 
Niall Alexander, of the Speculative Scotsman, states that "had there been some sort of index, [he] would have gladly (and repeatedly) referred to it during the mind-boggling first third of The Quantum Thief", while proclaiming the novel to be "the sci-fi debut of 2010." == Awards == Nominee for the 2011 Locus Award for Best First Novel. Third place for the 2011 John W. Campbell Memorial Award for Best Science Fiction Novel.
Legal information retrieval

Legal information retrieval is the science of information retrieval applied to legal text, including legislation, case law, and scholarly works. Accurate legal information retrieval is important to provide access to the law to laymen and legal professionals. Its importance has increased because of the vast and quickly increasing number of legal documents available through electronic means. Legal information retrieval is a part of the growing field of legal informatics. In a legal setting, it is frequently important to retrieve all information related to a specific query. However, commonly used boolean search methods (exact matches of specified terms) on full-text legal documents have been shown to have an average recall rate as low as 20 percent, meaning that only 1 in 5 relevant documents is actually retrieved. In that study, the researchers believed that they had retrieved over 75% of relevant documents. Low recall may result in failing to retrieve important or precedential cases. In some jurisdictions this may be especially problematic, as legal professionals are ethically obligated to be reasonably informed as to relevant legal documents. Legal information retrieval attempts to increase the effectiveness of legal searches by increasing the number of relevant documents retrieved (providing a high recall rate) and reducing the number of irrelevant documents retrieved (a high precision rate). This is a difficult task, as the legal field is prone to jargon, polysemes (words that have different meanings when used in a legal context), and constant change. Techniques used to achieve these goals generally fall into three categories: boolean retrieval, manual classification of legal text, and natural language processing of legal text. == Problems == Application of standard information retrieval techniques to legal text can be more difficult than application in other subjects. One key problem is that the law rarely has an inherent taxonomy. 
Instead, the law is generally filled with open-ended terms, which may change over time. This can be especially true in common law countries, where each decided case can subtly change the meaning of a certain word or phrase. Legal information systems must also be programmed to deal with law-specific words and phrases. Though this is less problematic in the context of words which exist solely in law, legal texts also frequently use polysemes: words that may have different meanings when used in a legal or common-speech manner, potentially both within the same document. The legal meaning may depend on the area of law in which the word is applied. For example, in the context of European Union legislation, the term "worker" has four different meanings: (1) any worker as defined in Article 3(a) of Directive 89/391/EEC who habitually uses display screen equipment as a significant part of his normal work; (2) any person employed by an employer, including trainees and apprentices but excluding domestic servants; (3) any person carrying out an occupation on board a vessel, including trainees and apprentices, but excluding port pilots and shore personnel carrying out work on board a vessel at the quayside; (4) any person who, in the Member State concerned, is protected as an employee under national employment law and in accordance with national practice. It also has the common meaning of a person who works at a specific occupation. Though the terms may be similar, correct information retrieval must differentiate between the intended use and irrelevant uses in order to return the correct results. Even if a system overcomes the language problems inherent in law, it must still determine the relevancy of each result. In the context of judicial decisions, this requires determining the precedential value of the case. Case decisions from senior or superior courts may be more relevant than those from lower courts, even where the lower court's decision contains more discussion of the relevant facts. 
The opposite may be true, however, if the senior court has only a minor discussion of the topic (for example, if it is a secondary consideration in the case). An information retrieval system must also be aware of the authority of the jurisdiction. A case from a binding authority is most likely of more value than one from a non-binding authority. Additionally, the intentions of the user may determine which cases they find valuable. For instance, where a legal professional is attempting to argue a specific interpretation of law, they might find a minor court's decision which supports their position more valuable than a senior court's position which does not. They may also value similar positions from different areas of law, different jurisdictions, or dissenting opinions. Overcoming these problems can be made more difficult because of the large number of cases available. The number of legal cases available via electronic means is constantly increasing (in 2003, US appellate courts handed down approximately 500 new cases per day), meaning that an accurate legal information retrieval system must incorporate methods of both sorting past data and managing new data. == Techniques == === Boolean searches === Boolean searches, where a user may specify terms such as use of specific words or judgments by a specific court, are the most common type of search available via legal information retrieval systems. They are widely implemented but overcome few of the problems discussed above. The recall and precision rates of these searches vary depending on the implementation and searches analyzed. One study found a basic boolean search's recall rate to be roughly 20%, and its precision rate to be roughly 79%. Another study implemented a generic search (that is, not designed for legal uses) and found a recall rate of 56% and a precision rate of 72% among legal professionals. Both numbers increased when searches were run by non-legal professionals, to a 68% recall rate and 77% precision rate. 
This is likely explained by the use of complex legal terms by the legal professionals. === Manual classification === In order to overcome the limits of basic boolean searches, information systems have attempted to classify case law and statutes into more computer-friendly structures. Usually, this results in the creation of an ontology to classify the texts, based on the way a legal professional might think about them. These ontologies attempt to link texts on the basis of their type, their value, and/or their topic areas. Most major legal search providers now implement some sort of classification search, such as Westlaw's “Natural Language” or LexisNexis' Headnote searches. Additionally, both of these services allow browsing of their classifications, via Westlaw's West Key Numbers or Lexis' Headnotes. Though these two search algorithms are proprietary and secret, it is known that they employ manual classification of text (though this may be computer-assisted). These systems can help overcome the majority of problems inherent in legal information retrieval systems, in that manual classification has the greatest chance of identifying landmark cases and understanding the issues that arise in the text. In one study, ontological searching resulted in a precision rate of 82% and a recall rate of 97% among legal professionals. The legal texts included, however, were carefully controlled to just a few areas of law in a specific jurisdiction. The major drawback to this approach is the requirement of using highly skilled legal professionals and large amounts of time to classify texts. As the amount of text available continues to increase, some have stated their belief that manual classification is unsustainable. === Natural language processing === In order to reduce the reliance on legal professionals and the amount of time needed, efforts have been made to create a system to automatically classify legal text and queries. 
Adequate translation of both would allow accurate information retrieval without the high cost of human classification. These automatic systems generally employ Natural Language Processing (NLP) techniques that are adapted to the legal domain, and also require the creation of a legal ontology. Though multiple systems have been postulated, few have reported results. One system, “SMILE,” which attempted to automatically extract classifications from case texts, resulted in an f-measure (a single score that combines recall and precision) of under 0.3 (compared to a perfect f-measure of 1.0). This is probably much lower than an acceptable rate for general usage. Despite the limited results, many theorists predict that such systems will evolve and eventually replace manual classification systems. === Citation-based ranking === In the mid-90s the Room 5 case law retrieval project used citation mining for summaries and ranked its search results based on citation type and count. This slightly pre-dated the PageRank algorithm at Stanford, which was also a citation-based ranking. Ranking of results was based
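The recall, precision and f-measure figures quoted in this article can be related with a short calculation. The sketch below is illustrative only: the function names and the toy document identifiers are hypothetical, and it assumes the f-measure is the balanced harmonic mean of precision and recall (often called F1), which the text above does not state explicitly.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Balanced f-measure (assumed here: harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy collection: five relevant cases, two documents returned.
relevant = {"case_a", "case_b", "case_c", "case_d", "case_e"}
retrieved = {"case_a", "case_x"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} f={f_measure(p, r):.2f}")

# Rates reported above for a basic boolean search and for ontological search.
print(f"boolean search:     f={f_measure(0.79, 0.20):.2f}")
print(f"ontological search: f={f_measure(0.82, 0.97):.2f}")
```

Under this assumption, a boolean search with roughly 79% precision and 20% recall scores about 0.32, because the harmonic mean is pulled toward the weaker of the two rates, while the ontological search figures give about 0.89.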
Fuzzy pay-off method for real option valuation

The fuzzy pay-off method for real option valuation (FPOM or pay-off method) is a method for valuing real options, developed by Mikael Collan, Robert Fullér, and József Mezei, and published in 2009. It is based on the use of fuzzy logic and fuzzy numbers for the creation of the possible pay-off distribution of a project (real option). The structure of the method is similar to the probability theory based Datar–Mathews method for real option valuation, but the method is not based on probability theory and uses fuzzy numbers and possibility theory in framing the real option valuation problem. == Method == The fuzzy pay-off method derives the real option value from a pay-off distribution that is created by using three or four cash-flow scenarios (most often created by an expert or a group of experts). The pay-off distribution is created simply by mapping each cash-flow scenario onto a defining point of a fuzzy number (a triangular fuzzy number for three scenarios and a trapezoidal fuzzy number for four scenarios). This means that the pay-off distribution is created without any simulation whatsoever, which makes the procedure easy and transparent. The scenarios used are a minimum possible scenario (the lowest possible outcome), a maximum possible scenario (the highest possible outcome) and a best estimate (the most likely scenario) that is mapped as a fully possible scenario with a full degree of membership in the set of possible outcomes. When four scenarios are used, there are instead two best-estimate scenarios that form the upper and lower limits of the interval that is assigned a full degree of membership in the set of possible outcomes. The main observations that lie behind the model for deriving the real option value are the following: The fuzzy NPV of a project is (equal to) the pay-off distribution of a project value that is calculated with fuzzy numbers. 
The mean value of the positive values of the fuzzy NPV is the "possibilistic" mean value of the positive fuzzy NPV values. Real option value, ROV, calculated from the fuzzy NPV is the "possibilistic" mean value of the positive fuzzy NPV values multiplied by the positive area of the fuzzy NPV over the total area of the fuzzy NPV. The real option formula can then be written simply as: $\mathrm{ROV} = \frac{A(\mathrm{Pos})}{A(\mathrm{Pos}) + A(\mathrm{Neg})} \times E[A_{+}]$ where A(Pos) is the area of the positive part of the fuzzy distribution, A(Neg) is the area of the negative part of the fuzzy distribution, and E[A+] is the mean value of the positive part of the distribution. It can be seen that when the distribution is totally positive, the real option value reduces to the expected (mean) value, E[A+]. As can be seen, the real option value can be derived directly from the fuzzy NPV, without simulation. At the same time, simulation is not an absolutely necessary step in the Datar–Mathews method, so the two methods are not very different in that respect. But what is totally different is that the Datar–Mathews method is based on probability theory and as such has a very different foundation from the pay-off method that is based on possibility theory: the way that the two models treat uncertainty is fundamentally different. == Use of the method == The pay-off method for real option valuation is very easy to use compared to the other real option valuation methods and it can be used with the most commonly used spreadsheet software without any add-ins. The method is useful in analyses for decision making regarding investments that have an uncertain future, and especially so if the underlying data is in the form of cash-flow scenarios. The method is less useful if optimal timing is the objective. 
The method is flexible and easily accommodates both one-stage investments and multi-stage investments (compound real options). The method has been taken into use in some large international industrial companies for the valuation of research and development projects and portfolios. In these analyses triangular fuzzy numbers are used. Other uses of the method so far are, for example, R&D project valuation, IPR valuation, valuation of M&A targets and expected synergies, valuation and optimization of M&A strategies, valuation of area development (construction) projects, and valuation of large industrial real investments. The pay-off method has lately been taught within the larger framework of real options, for example at the Lappeenranta University of Technology and at the Tampere University of Technology in Finland.
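The real option formula from the Method section can be illustrated for the three-scenario (triangular) case with a minimal sketch. The function name and parameters below are hypothetical. The area ratio follows directly from the geometry of the triangular membership function. The possibilistic mean of the positive part is an assumption not spelled out in the text above: it is taken to be the level-weighted integral of the γ-cut endpoints, with negative endpoints clipped to zero, and it is approximated numerically rather than with a closed form.

```python
def triangular_rov(low, best, high, steps=10_000):
    """Sketch of the pay-off method for a triangular fuzzy NPV.

    low, best, high: minimum, best-estimate and maximum NPV scenarios.
    Returns A(Pos) / (A(Pos) + A(Neg)) * E[A+].
    """
    if not (low <= best <= high) or low == high:
        raise ValueError("need low <= best <= high and low < high")
    if high <= 0:
        return 0.0  # no positive outcomes are possible

    # Area under the triangular membership function (peak height 1).
    total_area = (high - low) / 2.0
    if low >= 0:
        positive_area = total_area
    elif best > 0:
        # Zero lies on the rising edge: subtract the small negative triangle.
        negative_area = low * low / (2.0 * (best - low))
        positive_area = total_area - negative_area
    else:
        # Zero lies on the falling edge: only a small positive triangle remains.
        positive_area = high * high / (2.0 * (high - best))

    # Assumed definition: E[A+] = integral over g in [0, 1] of
    # g * (left(g) + right(g)), with negative cut endpoints clipped to zero.
    # Approximated here with the midpoint rule.
    d = 1.0 / steps
    mean_positive = 0.0
    for i in range(steps):
        g = (i + 0.5) * d
        left = low + g * (best - low)      # lower end of the g-level cut
        right = high - g * (high - best)   # upper end of the g-level cut
        mean_positive += g * (max(left, 0.0) + max(right, 0.0)) * d

    return positive_area / total_area * mean_positive


# Hypothetical scenarios (any currency unit): pessimistic, best guess, optimistic.
print(round(triangular_rov(-10.0, 20.0, 40.0), 2))
print(round(triangular_rov(10.0, 20.0, 40.0), 2))
```

When all three scenarios are positive, the area ratio equals 1 and the result is just the mean of the distribution, as the text notes. When the whole distribution is negative, the sketch returns zero.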
Galatea (video game)

Galatea is an interactive fiction video game by Emily Short featuring a modern rendition of the Greek myth of Galatea, the sculpture of a woman that gained life. It took "Best of Show" in the 2000 IF Art Show and won a XYZZY Award for Best non-player character. The game displays an unusually rich approach to non-player character dialogue and diverges from the typical puzzle-solving in interactive fiction: gameplay consists entirely of interacting with a single character in a single room. Galatea is licensed under the Creative Commons BY-NC-ND 3.0 US license. == Gameplay == Galatea alters the typical interactive fiction game mechanics by concentrating instead on the player's interactions with a single non-player character (NPC), the eponymous Galatea. Much of the interest of the piece derives from the ambiguous nature of the player–NPC dialogue: the form of the conversation and, indeed, the nature of Galatea herself shift depending on the focus the player places on certain aspects of the character's personality. Numerous endings are possible. Gameplay centers on the developing dialogue between Galatea and the player, who asks about topics raised earlier in the conversation. Two commands, "think about" and "recap", are provided to keep track of what has already been said; the former is also used to advance the storyline, as the player character draws conclusions about the story as it has unfolded to that point. The game also encourages using sensory commands ("touch", "listen to", "look at"), adding immersion to the experience. == Plot == Galatea is loosely based on the myth of Pygmalion, who carved the sculpture of a woman. In the myth, he falls in love with the statue, named Galatea or Elise in different versions, and the goddess Venus brings her to life. The story begins at the opening of an exhibition of artificial intelligences. The player, alone, discovers Galatea displayed on a pedestal with a small information placard. 
She is illuminated by a spotlight and wears an emerald dress. Seeing the player about to turn away, Galatea says, "They told me you were coming." From this point, the story may proceed in a number of ways depending on the player's words and actions. === Multilinear interactive fiction === Short describes this as "multilinear interactive fiction": while interactive fiction in general allows the player to find their own way through the story, this leads in most cases to a single ending (or at least a single desired 'correct' ending). With Galatea, Short presents a story with around 70 different endings and hundreds of possible ways of reaching them. The plot is thus designed to appear open-ended, with the development of the story entirely dependent on what the player decides to talk or ask about or what actions they choose to perform. Thus the original author and the player share in the creation of a work of fiction. == Development == In interviews, Emily Short has explained that Galatea arose out of her efforts to develop advanced dialog coding for interactive fiction engines. Although code for simple conversational programs like ELIZA has existed since the 1960s, and limited dialog options have existed in interactive fiction since the 1970s, Short's efforts to develop chatterbot-like dialog required her to produce a simple test case scenario to test NPC interaction. Thus the single-room, single-occupant Galatea was a natural result. Development of the game progressed organically, with Short engaging in test runs and drafting new dialog options for every conversational dead-end that arose. The game's multiple endings also arose in a similar fashion, although Short had intended that there be multiple endings from the start. 
Although the nature of the game's development as well as its minimalist final form has led to questions regarding whether it is really a game and not just an experimental conversational program, Short has suggested that to her the definition of interactive fiction requires nothing more than a world model and a parser, and "anything you can cook up with those features counts as IF." Short has acknowledged the helpful influence of the close-knit IF community and the "atmosphere in which experimentation is valued" as leading to the success of her works like Galatea. == Reception == Galatea was well received, achieving critical acclaim from interactive fiction reviewers and literary scholars. The game is considered to aspire to a new level of art in interactive fiction, and thereby to have revolutionized the genre, establishing its author, Emily Short, as one of the key figures in the modern interactive fiction scene. Fellow award-winning IF author Adam Cadre has called Galatea "the best NPC ever", a view that was echoed by Joystiq's John Bardinelli. Cadre also describes the game as an example of an alternative kind of puzzle where "interactivity comes in deciding where to go, what to see, what to say. Rather than having to open gates along a path, you discover that they're all open at first, but stepping through one causes others to close." Galatea was described in 2007 by Indiegames.com as a "fascinating journey." In a 2009 article, Rock, Paper, Shotgun praised the depth and detail of the game, the complexities of the character design and its "masterful balance between intricacy and simplicity", and "Galatea's emotional turmoil" that is "encoded sweetly into the subtext of what's going on. By simply interacting in a logical manner, you learn more about this character than any cut-scene or info-dump could ever hope to convey." 
This was reiterated in a 2010 1UP.com article that listed Galatea as #2 in its "Top 5 Introductory Interactive Fiction Games" feature, describing it as intriguingly replayable, and as a "surprisingly rich game for its apparent minimalism". In 2011, PC Gamer highlighted Galatea as an example of the artistic and literary aspects of the interactive fiction genre. The titular character, Galatea, has been compared to the 2007 Portal character GLaDOS due to similarities in the personalities of the characters.