AI Headshot Generator For Linkedin

AI Headshot Generator For Linkedin — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Text-to-image personalization

    Text-to-image personalization

    Text-to-Image personalization is a task in deep learning for computer graphics that augments pre-trained text-to-image generative models. In this task, a generative model that was trained on large-scale data (usually a foundation model), is adapted such that it can generate images of novel, user-provided concepts. These concepts are typically unseen during training, and may represent specific objects (such as the user's pet) or more abstract categories (new artistic style or object relations). Text-to-Image personalization methods typically bind the novel (personal) concept to new words in the vocabulary of the model. These words can then be used in future prompts to invoke the concept for subject-driven generation, inpainting, style transfer and even to correct biases in the model. To do so, models either optimize word-embeddings, fine-tune the generative model itself, or employ a mixture of both approaches. == Technology == Text-to-Image personalization was first proposed during August 2022 by two concurrent works, Textual Inversion and DreamBooth. In both cases, a user provides a few images (typically 3–5) of a concept, like their own dog, together with a coarse descriptor of the concept class (like the word "dog"). The model then learns to represent the subject through a reconstruction based objective, where prompts referring to the subject are expected to reconstruct images from the training set. In Textual Inversion, the personalized concepts are introduced into the text-to-image model by adding new words to the vocabulary of the model. Typical text-to-image models represent words (and sometimes parts-of-words) as tokens, or indices in a predefined dictionary. During generation, an input prompt is converted into such tokens, each of which is converted into a ‘word-embedding’: a continuous vector representation which is learned for each token as part of the model's training. Textual Inversion proposes to optimize a new word-embedding vector for representing the novel concept. This new embedding vector can then be assigned to a user-chosen string, and invoked whenever the user's prompt contains this string. In DreamBooth, rather than optimizing a new word vector, the full generative model itself is fine-tuned. The user first selects an existing token, typically one which rarely appears in prompts. The subject itself is then represented by a string containing this token, followed by a coarse descriptor of the subject's class. A prompt describing the subject will then take the form: "A photo of " (e.g. "a photo of sks cat" when learning to represent a specific cat). The text-to-image model is then tuned so that prompts of this form will generate images of the subject. == Textual Inversion == The key idea in Textual Inversion is to add a new term to the vocabulary of the diffusion model that corresponds to the new (personalized) concept. Textual Inversion operates by inverting the concepts into new pseudo-words within the textual embedding space of a pre-trained text-to-image model. These pseudo-words can be injected into new scenes using simple natural language descriptions, allowing for simple and intuitive modifications. The method allows a user to leverage multi-modal information — using a text-driven interface for ease of editing, but providing visual cues when approaching the limits of natural language. The resulting model is extremely light-weight per concept: only 1K long, but succeeds to encode detailed visual properties of the concept. == Extensions == Several approaches were proposed to refine and improve over the original methods. These include the following. Low-rank Adaptation (LoRA) - an adapter-based technique for efficient finetuning of models. In the case of text-to-image models, LoRA is typically used to modify the cross-attention layers of a diffusion model. Perfusion - a low rank update method that also locks the activations of the key matrix in the diffusion model's cross attention layers to the concept's coarse class. Extended Textual Inversion - a technique that learns an individual word embedding for each layer in the diffusion model's denoising network. Encoder-based methods that use another neural network to quickly personalize a model == Challenges and limitations == Text-to-image personalization methods must contend with several challenges. At their core is the goal of achieving high-fidelity to the personal concept while maintaining high alignment between novel prompts containing the subject, and the generated images (typically referred to as ‘editability’). Another challenge that personalization methods must contend with is memory requirements. Initial implementations of personalization methods required more than 20 Gigabytes of GPU memory, and more recent approaches have reported requirements of more than 40 Gigabytes. However, optimizations such as Flash Attention have since reduced this requirement considerably. Approaches that tune the entire generative model may also create checkpoints that are several gigabytes in size, making it difficult to share or store many models. Embedding based approaches require only a few kilobytes, but typically struggle to preserve identity while maintaining editability. More recent approaches have proposed hybrid tuning goals which optimize both an embedding and a subset of network weights. These can reduce storage requirements to as little as 100 Kilobytes while achieving quality comparable to full tuning methods. Finally, optimization processes can be lengthy, requiring several minutes of tuning for each novel concept. Encoder and quick-tuning methods aim to reduce this to seconds or less.

    Read more →
  • Sora (text-to-video model)

    Sora (text-to-video model)

    Sora was a text-to-video model and social media app developed by OpenAI. Using artificial intelligence, the model generated short video clips based on prompts, and could also extend existing short videos. In February 2024, OpenAI previewed examples of its output to the public, with the first generation of Sora released publicly for ChatGPT Plus and ChatGPT Pro users in the United States and Canada in December 2024. The second generation of Sora was released to select users in the US and Canada at the end of September 2025. Sora 2 integrated social media features into the app. The app was shut down on April 26, 2026 and the application programming interface (API) is planned to be discontinued on September 24, 2026, marking the end of the Sora AI brand as a whole. By default, the generator used copyrighted material in its videos, unless copyright holders actively opt out of having their content included. Videos contained a visible, moving digital watermark to prevent misuse, but a week after Sora 2's release, third-party programs became available which could remove the watermark. == Background == Several other models capable of generating video from text had been created prior to Sora, including Meta's Make‑A‑Video, Runway's Gen‑2 and Google Veo. OpenAI, the company behind Sora, had released DALL·E 3, the third of its DALL-E text-to-image models, in September 2023. == History == === Initial release === The team that developed Sora named it after the Japanese word for 'sky' to signify its "limitless creative potential". On February 15, 2024, OpenAI first previewed Sora by releasing multiple clips of high-definition videos that it had created, including an SUV driving down a mountain road, an animation of a "short fluffy monster" next to a candle, two people walking through Tokyo in the snow, and fake historical footage of the California gold rush. OpenAI stated that it was able to generate videos as long as one minute. The company then shared a technical report that highlighted the methods used to train the model. OpenAI CEO Sam Altman also posted a series of tweets responding to Twitter users' prompts with Sora-generated videos of the prompts. As of December 9, 2024, OpenAI had gradually made Sora available to the public for ChatGPT Pro and ChatGPT Plus users in the U.S. and Canada. Prior to this, the company had provided limited access to a small "red team", including experts in misinformation and bias, to perform adversarial testing on the model. The company also shared Sora with a small group of creative professionals, including video makers and artists, to seek feedback on its usefulness in creative fields. In February 2025, OpenAI announced plans to integrate Sora into ChatGPT by letting users generate Sora videos from the chatbot. === Sora 2 === Sora 2 was unveiled on September 30, 2025, with an iOS app at the same time, as well as an Android app two months later. All videos generated by the model feature a visible, moving watermark to prevent misuse of the tool. The previous version of Sora also added a safety watermark to allow viewers to distinguish between real and fictional content. On October 7, 404 Media reported that third-party programs that could remove the watermark from Sora 2 videos had become prevalent. Many outlets, such as Wired magazine, have noted that the Sora 2 app is overtly similar to TikTok in style and features. === Discontinuation === On March 24, 2026, OpenAI announced on X that it was discontinuing Sora in both the mobile app and the API. The Sora app was shut down on April 26, 2026, while the API is planned to be shut down on September 24, 2026. OpenAI's partnership with Disney, which included a licensing agreement allowing Disney characters to be used within Sora, was also coming to an end. The decision prompted British technology news website The Register to label OpenAI a "product-killer", following in the footsteps of other technology companies such as Google, Amazon Web Services, Broadcom, Cloud Software Group, and Netscape. OpenAI did not provide a specific reason for discontinuing Sora in its shutdown notice. The reports that emerged regarding this discontinuity linked the decision to computation shortages, cost pressures, and a broader shift toward core enterprise products. Following its public launch, Sora's worldwide users peaked at around a million before declining to fewer than 500,000, while the service cost an estimated $1 million per day to operate due to the computational demands of video generation. == Legal regulation == In November 2024, an API key for Sora access was leaked by a group of testers on Hugging Face who posted a manifesto stating that they were protesting that Sora was used for "art washing". OpenAI revoked all access three hours after the leak was made public and stated that "hundreds of artists" have shaped the development and that "participation is voluntary". At the time of its launch, Sora 2 allowed copyrighted content by default unless copyright holders contacted OpenAI to restrict the generation of their content on the platform. On October 3, 2025, OpenAI stated that a future update to Sora 2 would give copyright holders "more granular control" over the generation of copyrighted content, but the company did not state whether existing content would be removed. On October 6, the chairman of the MPA criticized OpenAI's approach to copyright with Sora 2. On December 11, 2025, the Walt Disney Company announced that it would invest $1 billion in OpenAI to allow users to generate more than 200 of its copyrighted characters on Sora 2. These characters include those from Disney Animation, Pixar, Marvel Studios, and Star Wars. == Capabilities and limitations == The technology behind Sora is an adaptation of the technology behind DALL-E 3. According to OpenAI, Sora is a diffusion transformer, a denoising latent diffusion model with one transformer as its denoiser. A video is generated in latent space by denoising 3D "patches", then transformed to standard space by a video decompressor. Recaptioning is employed to augment training data by using a video-to-text model to create detailed captions for videos. OpenAI trained the model using publicly available videos as well as copyrighted videos licensed for the purpose, but did not reveal the number or the exact source of the videos. Upon its release, OpenAI acknowledged some of Sora's shortcomings, including its limited capacity to simulate complex physics, to understand causality and to differentiate left from right. OpenAI also stated that, in adherence to the company's existing safety practices, Sora will restrict text prompts for sexual, violent, hateful or celebrity imagery, as well as content featuring existing intellectual property. Sora researcher Tim Brooks stated that the model learned how to create 3D graphics from its dataset alone, while fellow Sora researcher Bill Peebles said that the model automatically created different video angles without being prompted. According to OpenAI, Sora-generated videos are also tagged with C2PA metadata to indicate that they are AI-processed. === Comparison with other models === The Artificial Analysis have placed Sora 2 pro lower than other text-to-video AI generators in the market on its leaderboard. Other models, such as Seedance 2.0 from ByteDance, Runaway 4.5 from Runaway, and Kling 3.0 from KlingAI, have ranked higher than Sora 2.0. == Reception == === Positive === In 2024, Will Douglas Heaven of the MIT Technology Review called the demonstration videos "impressive", but noted that they must have been cherry-picked and may not be representative of Sora's typical output. Lisa Lacy of CNET called its example videos "remarkably realistic – except perhaps when a human face appears close up or when sea creatures are swimming". In October 2025, The New York Times remarked that the release of the Sora 2 app in September 2025 was "jaw-dropping (for better and worse)" though also remarked that the app was a "social network in disguise" and "the type of product that companies like Meta and X have sought to build: a way to bring A.I. to the masses that people can share." The article expressed concern regarding the product's potential impact on society and its potential use to promote misinformation, disinformation, and scams. A 2025 study in Science Advances found that generative AI tools can lower barriers to entry in creative work. It enables users with diverse skill sets, including people with less formal artistic training and technical skills, to act on their creative and imaginative ideas. The lower barrier to entry allows such users previously locked out of the creative industry to produce content and easily act on their creative ideas. === Negative === Some internet users and online content creators, such as Hank Green, called the mobile app "SlopTok," a reference to both the mobile app TikTok and the term AI slop. Filmmaker Tyler Perry announced he would be putting a planned

    Read more →
  • TuVox

    TuVox

    TuVox is a company that produces VXML-based telephone speech-recognition applications to replace DTMF touch-tone systems for their clients. == History == TuVox was founded in 2001 by Steven S. Pollock and Ashok Khosla, formerly of Apple Computer Corporation and Claris Corporation. Since then, TuVox has grown to over 150 employees and has US offices in Cupertino, California and Boca Raton, Florida as well as international offices in London, Vancouver and Sydney. In 2005, TuVox acquired the customers and hosting facilities of Net-By-Tel. In 2007, the company raised $20m for its speech recognition, and phone menu software. On July 22, 2010, West Interactive — a subsidiary of West Corporation — announced its acquisition of TuVox. == Customers == TuVox clients include: 1-800-Flowers.com, AMC Entertainment, American Airlines, British Airways, M&T Bank, Canon Inc., Gateway, Inc., Motorola, Progress Energy Inc., Telecom New Zealand, Time, Inc., BECU, Virgin America and USAA.

    Read more →
  • Fuzzy logic

    Fuzzy logic

    Fuzzy logic is a form of many-valued logic in which the truth value of variables may be any real number between 0 and 1. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1. The term fuzzy logic was introduced with the 1965 proposal of fuzzy set theory by mathematician Lotfi Zadeh. Basic fuzzy logic had, however, been studied since the 1920s, as infinite-valued logic—notably by Łukasiewicz and Tarski. The works of Zadeh and Joseph Goguen in the 1960s and 1970s went further by considering issues such as linguistic variables and lattices. Fuzzy logic is based on the observation that people make decisions based on imprecise and non-numerical information. Fuzzy models or fuzzy sets are mathematical means of representing vagueness and imprecise information (hence the term fuzzy). These models have the capability of recognising, representing, manipulating, interpreting, and using data and information that are vague and lack certainty. Fuzzy logic has been applied to many fields, from control theory to artificial intelligence. == Overview == Classical logic only permits conclusions that are either true or false. However, there are also propositions with variable answers, which one might find when asking a group of people to identify a color. In such instances, the truth appears as the result of reasoning from inexact or partial knowledge in which the sampled answers are mapped on a spectrum. Both degrees of truth and probabilities range between 0 and 1 and hence may seem identical at first, but fuzzy logic uses degrees of truth as a mathematical model of vagueness, while probability is a mathematical model of ignorance. === Applying truth values === A basic application might characterize various sub-ranges of a continuous variable. For instance, a temperature measurement for anti-lock brakes might have several separate membership functions defining particular temperature ranges needed to control the brakes properly. Each function maps the same temperature value to a truth value in the 0 to 1 range. These truth values can then be used to determine how the brakes should be controlled. Fuzzy set theory provides a means for representing uncertainty. === Linguistic variables === In fuzzy logic applications, non-numeric values are often used to facilitate the expression of rules and facts. A linguistic variable such as age may accept values such as young and its antonym old. Because natural languages do not always contain enough value terms to express a fuzzy value scale, it is common practice to modify linguistic values with adjectives or adverbs. For example, we can use the hedges rather and somewhat to construct the additional values rather old or somewhat young. == Fuzzy systems == === Mamdani === The most well-known system is the Mamdani rule-based one. It uses the following rules: Fuzzify all input values into fuzzy membership functions. Execute all applicable rules in the rulebase to compute the fuzzy output functions. De-fuzzify the fuzzy output functions to get "crisp" output values. ==== Fuzzification ==== Fuzzification is the process of assigning the numerical input of a system to fuzzy sets with some degree of membership. This degree of membership may be anywhere within the interval [0,1]. If it is 0 then the value does not belong to the given fuzzy set, and if it is 1 then the value completely belongs within the fuzzy set. Any value between 0 and 1 represents the degree of uncertainty that the value belongs in the set. These fuzzy sets are typically described by words, and so by assigning the system input to fuzzy sets, we can reason with it in a linguistically natural manner. For example, in the image below, the meanings of the expressions cold, warm, and hot are represented by functions mapping a temperature scale. A point on that scale has three "truth values"—one for each of the three functions. The vertical line in the image represents a particular temperature that the three arrows (truth values) gauge. Since the red arrow points to zero, this temperature may be interpreted as "not hot"; i.e. this temperature has zero membership in the fuzzy set "hot". The orange arrow (pointing at 0.2) may describe it as "slightly warm" and the blue arrow (pointing at 0.8) "fairly cold". Therefore, this temperature has 0.2 membership in the fuzzy set "warm" and 0.8 membership in the fuzzy set "cold". The degree of membership assigned for each fuzzy set is the result of fuzzification. Fuzzy sets are often defined as triangle or trapezoid-shaped curves, as each value will have a slope where the value is increasing, a peak where the value is equal to 1 (which can have a length of 0 or greater) and a slope where the value is decreasing. They can also be defined using a sigmoid function. One common case is the standard logistic function defined as S ( x ) = 1 1 + e − x {\displaystyle S(x)={\frac {1}{1+e^{-x}}}} which has the following symmetry property S ( x ) + S ( − x ) = 1. {\displaystyle S(x)+S(-x)=1.} From this it follows that ( S ( x ) + S ( − x ) ) ⋅ ( S ( y ) + S ( − y ) ) ⋅ ( S ( z ) + S ( − z ) ) = 1 {\displaystyle (S(x)+S(-x))\cdot (S(y)+S(-y))\cdot (S(z)+S(-z))=1} ==== Fuzzy logic operators ==== Fuzzy logic works with membership values in a way that mimics Boolean logic. To this end, replacements for basic operators ("gates") AND, OR, NOT must be available. There are several ways to accomplish this. A common replacement is called the Zadeh operators: For TRUE/1 and FALSE/0, the fuzzy expressions produce the same result as the Boolean expressions. There are also other operators, more linguistic in nature, called hedges that can be applied. These are generally adverbs such as very, or somewhat, which modify the meaning of a set using a mathematical formula. However, an arbitrary choice table does not always define a fuzzy logic function. In the paper (Zaitsev, et al), a criterion has been formulated to recognize whether a given choice table defines a fuzzy logic function and a simple algorithm of fuzzy logic function synthesis has been proposed based on introduced concepts of constituents of minimum and maximum. A fuzzy logic function represents a disjunction of constituents of minimum, where a constituent of minimum is a conjunction of variables of the current area greater than or equal to the function value in this area (to the right of the function value in the inequality, including the function value). Another set of AND/OR operators is based on multiplication, where Given any two of AND/OR/NOT, it is possible to derive the third. The generalization of AND is an instance of a t-norm. ==== IF-THEN rules ==== IF-THEN rules map input or computed truth values to desired output truth values. Example: Given a certain temperature, the fuzzy variable hot has a certain truth value, which is copied to the high variable. Should an output variable occur in several THEN parts, the values from the respective IF parts are combined using the OR operator. ==== Defuzzification ==== The goal is to get a continuous variable from fuzzy truth values. This would be easy if the output truth values were exactly those obtained from fuzzification of a given number. Since, however, all output truth values are computed independently, in most cases they do not represent such a set of numbers. One has then to decide for a number that matches best the "intention" encoded in the truth value. For example, for several truth values of fan_speed, an actual speed must be found that best fits the computed truth values of the variables 'slow', 'moderate' and so on. There is no single algorithm for this purpose. A common algorithm is For each truth value, cut the membership function at this value Combine the resulting curves using the OR operator Find the center-of-weight of the area under the curve The x position of this center is then the final output. === Takagi–Sugeno–Kang (TSK) === The Takagi–Sugeno or Takagi–Sugeno–Kang (TSK) system was introduced by Tomohiro Takagi and Michio Sugeno for fuzzy identification of systems and applications to modeling and control. Sugeno and Kang later developed methods for structure identification of such fuzzy models from input-output data. The TSK system is similar to Mamdani, but the defuzzification process is included in the execution of the fuzzy rules. These are also adapted, so that instead the consequent of the rule is represented through a polynomial function, usually constant in a zero-order model or linear in a first-order model. An example of a rule with a constant output would be: In this case, the output will be equal to the constant of the consequent (e.g. 2). In most scenarios we would have an entire rule base, with 2 or more rules. If this is the case, the output of the entire rule base will be the average of the consequent of each rule i (Y

    Read more →
  • Hybrid intelligent system

    Hybrid intelligent system

    Hybrid intelligent system denotes a software system which employs, in parallel, a combination of methods and techniques from artificial intelligence subfields, such as: Neuro-symbolic systems Neuro-fuzzy systems Hybrid connectionist-symbolic models Fuzzy expert systems Connectionist expert systems Evolutionary neural networks Genetic fuzzy systems Rough fuzzy hybridization Reinforcement learning with fuzzy, neural, or evolutionary methods as well as symbolic reasoning methods. From the cognitive science perspective, every natural intelligent system is hybrid because it performs mental operations on both the symbolic and subsymbolic levels. For the past few years, there has been an increasing discussion of the importance of A.I. Systems Integration. Based on notions that there have already been created simple and specific AI systems (such as systems for computer vision, speech synthesis, etc., or software that employs some of the models mentioned above) and now is the time for integration to create broad AI systems. Proponents of this approach are researchers such as Marvin Minsky, Ron Sun, Aaron Sloman, Angelo Dalli and Michael A. Arbib. An example hybrid is a hierarchical control system in which the lowest, reactive layers are sub-symbolic. The higher layers, having relaxed time constraints, are capable of reasoning from an abstract world model and performing planning (even by hybrid wisdom). Intelligent systems usually rely on hybrid reasoning processes, which include induction, deduction, abduction and reasoning by analogy.

    Read more →
  • A.I.s

    A.I.s

    A.I.s is a themed anthology of science fiction short works edited by American writers Jack Dann and Gardner Dozois. It was first published in paperback by Ace Books in December 2004. It was reissued as an ebook by Baen Books in June 2013. The book collects ten novelettes and short stories by various science fiction authors, together with a preface by the editors. == Contents == "Preface" (Jack Dann and Gardner Dozois) "Antibodies" (Charles Stross) "Trojan Horse" (Michael Swanwick) "Birth Day" (Robert Reed) "The Hydrogen Wall" (Gregory Benford) "The Turing Test" (Chris Beckett) "Dante Dreams" (Stephen Baxter) "The Names of All the Spirits" (J. R. Dunn) "From the Corner of My Eye" (Alexander Glass) "Halfjack" (Roger Zelazny) "Computer Virus" (Nancy Kress)

    Read more →
  • Speech synthesis

    Speech synthesis

    Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written words on a home computer. The earliest computer operating system to have included a speech synthesizer was Unix in 1974, through the Unix speak utility. In 2000, Microsoft Sam was the default text-to-speech voice synthesizer used by the narrator accessibility feature, which shipped with all Windows 2000 operating systems, and subsequent Windows XP systems. A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech. == History == Long before the invention of electronic signal processing, some people tried to build machines to emulate human speech. There were also legends of the existence of "Brazen Heads", such as those involving Pope Silvester II (d. 1003 AD), Albertus Magnus (1198–1280), and Roger Bacon (1214–1294). In 1779, the German-Danish scientist Christian Gottlieb Kratzenstein won the first prize in a competition announced by the Russian Imperial Academy of Sciences and Arts for models he built of the human vocal tract that could produce the five long vowel sounds (in International Phonetic Alphabet notation: [aː], [eː], [iː], [oː] and [uː]). There followed the bellows-operated "acoustic-mechanical speech machine" of Wolfgang von Kempelen of Pressburg, Hungary, described in a 1791 paper. This machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. In 1837, Charles Wheatstone produced a "speaking machine" based on von Kempelen's design, and in 1846, Joseph Faber exhibited the "Euphonia". In 1923, Paget resurrected Wheatstone's design. In the 1930s, Bell Labs developed the vocoder, which automatically analyzed speech into its fundamental tones and resonances. From his work on the vocoder, Homer Dudley developed a keyboard-operated voice-synthesizer called The Voder (Voice Demonstrator), which he exhibited at the 1939 New York World's Fair. Franklin S. Cooper and his colleagues at Haskins Laboratories built the pattern playback in the late 1940s and completed it in 1950. There were several different versions of this hardware device; only one currently survives. The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound. Using this device, Alvin Liberman and colleagues discovered acoustic cues for the perception of phonetic segments (consonants and vowels). === Electronic devices === The first computer-based speech-synthesis systems originated in the late 1950s. Noriko Umeda et al. developed the first general English text-to-speech system in 1968, at the Electrotechnical Laboratory in Japan. In 1961, physicist John Larry Kelly, Jr and his colleague Louis Gerstman used an IBM 704 computer to synthesize speech, an event among the most prominent in the history of Bell Labs. Kelly's voice recorder synthesizer (vocoder) recreated the song "Daisy Bell", with musical accompaniment from Max Mathews. Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel 2001: A Space Odyssey, where the HAL 9000 computer sings the same song as astronaut Dave Bowman puts it to sleep. Despite the success of purely electronic speech synthesis, research into mechanical speech-synthesizers continues. Linear predictive coding (LPC), a form of speech coding, began development with the work of Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone (NTT) in 1966. Further developments in LPC technology were made by Bishnu S. Atal and Manfred R. Schroeder at Bell Labs during the 1970s. LPC was later the basis for early speech synthesizer chips, such as the Texas Instruments LPC Speech Chips used in the Speak & Spell toys from 1978. In 1975, Fumitada Itakura developed the line spectral pairs (LSP) method for high-compression speech coding, while at NTT. From 1975 to 1981, Itakura studied problems in speech analysis and synthesis based on the LSP method. In 1980, his team developed an LSP-based speech synthesizer chip. LSP is an important technology for speech synthesis and coding, and in the 1990s was adopted by almost all international speech coding standards as an essential component, contributing to the enhancement of digital speech communication over mobile channels and the internet. In 1975, MUSA was released, and was one of the first Speech Synthesis systems. It consisted of a stand-alone computer hardware and a specialized software that enabled it to read Italian. A second version, released in 1978, was also able to sing Italian in an "a cappella" style. Dominant systems in the 1980s and 1990s were the DECtalk system, based largely on the work of Dennis Klatt at MIT, and the Bell Labs system; the latter was one of the first multilingual language-independent systems, making extensive use of natural language processing methods. Handheld electronics featuring speech synthesis began emerging in the 1970s. One of the first was the Telesensory Systems Inc. (TSI) Speech+ portable calculator for the blind in 1976. Other devices had primarily educational purposes, such as the Speak & Spell toy produced by Texas Instruments in 1978. Fidelity released a speaking version of its electronic chess computer in 1979. The first video game to feature speech synthesis was the 1980 shoot 'em up arcade game, Stratovox (known in Japan as Speak & Rescue), from Sun Electronics. The first personal computer game with speech synthesis was Manbiki Shoujo (Shoplifting Girl), released in 1980 for the PET 2001, for which the game's developer, Hiroshi Suzuki, developed a "zero cross" programming technique to produce a synthesized speech waveform. Another early example, the arcade version of Berzerk, also dates from 1980. The Milton Bradley Company produced the first multi-player electronic game using voice synthesis, Milton, in the same year. In 1976, Computalker Consultants released their CT-1 Speech Synthesizer. Designed by D. Lloyd Rice and Jim Cooper, it was an analog synthesizer built to work with microcomputers using the S-100 bus standard. Synthesized voices typically sounded male until 1990, when Ann Syrdal, at AT&T Bell Laboratories, created a female voice. Ray Kurzweil predicted in 2005 that as the cost-performance ratio caused speech synthesizers to become cheaper and more accessible, more people would benefit from the use of text-to-speech programs. === Artificial intelligence === In September 2016, DeepMind released WaveNet, which demonstrated that deep learning models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms, starting the field of deep learning speech synthesis. Although WaveNet was initially considered to be computationally expensive and slow to be used in consumer products at the time, a year after its

    Read more →
  • Federal Virtual World Challenge

    Federal Virtual World Challenge

    The Federal Virtual Challenge, formerly The Federal Virtual Worlds Challenge is a competition led by the Simulation and Training Technology Center (United States Army Research, Development and Engineering Command). The event is conducted in order to reach a global development community that will create innovative and interactive training and analysis services in virtual worlds. The inaugural event began in 2009 with the awards being conducted during March 2010 GameTech conference in Orlando, Florida. == Description == The focus of the challenge is training or analysis capability conducted wholly in a virtual environment. The training and analysis audience includes all United States Federal Agencies including, Department of Defense, Department of Homeland Security, Department of Transportation, and Department of Health and Human Services, NASA, DOT, and many more.

    Read more →
  • Shape context

    Shape context

    Shape context is a feature descriptor used in object recognition. Serge Belongie and Jitendra Malik proposed the term in their paper "Matching with Shape Contexts" in 2000. == Theory == The shape context is intended to be a way of describing shapes that allows for measuring shape similarity and the recovering of point correspondences. The basic idea is to pick n points on the contours of a shape. For each point pi on the shape, consider the n − 1 vectors obtained by connecting pi to all other points. The set of all these vectors is a rich description of the shape localized at that point but is far too detailed. The key idea is that the distribution over relative positions is a robust, compact, and highly discriminative descriptor. So, for the point pi, the coarse histogram of the relative coordinates of the remaining n − 1 points, h i ( k ) = # { q ≠ p i : ( q − p i ) ∈ bin ( k ) } {\displaystyle h_{i}(k)=\#\{q\neq p_{i}:(q-p_{i})\in {\mbox{bin}}(k)\}} is defined to be the shape context of p i {\displaystyle p_{i}} . The bins are normally taken to be uniform in log-polar space. The fact that the shape context is a rich and discriminative descriptor can be seen in the figure below, in which the shape contexts of two different versions of the letter "A" are shown. (a) and (b) are the sampled edge points of the two shapes. (c) is the diagram of the log-polar bins used to compute the shape context. (d) is the shape context for the point marked with a circle in (a), (e) is that for the point marked as a diamond in (b), and (f) is that for the triangle. As can be seen, since (d) and (e) are the shape contexts for two closely related points, they are quite similar, while the shape context in (f) is very different. For a feature descriptor to be useful, it needs to have certain invariances. In particular it needs to be invariant to translation, scaling, small perturbations, and, depending on the application, rotation. Translational invariance comes naturally to shape context. Scale invariance is obtained by normalizing all radial distances by the mean distance α {\displaystyle \alpha } between all the point pairs in the shape although the median distance can also be used. Shape contexts are empirically demonstrated to be robust to deformations, noise, and outliers using synthetic point set matching experiments. One can provide complete rotational invariance in shape contexts. One way is to measure angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges). This results in a completely rotationally invariant descriptor. But of course this is not always desired since some local features lose their discriminative power if not measured relative to the same frame. Many applications in fact forbid rotational invariance e.g. distinguishing a "6" from a "9". == Use in shape matching == A complete system that uses shape contexts for shape matching consists of the following steps (which will be covered in more detail in the Details of Implementation section): Randomly select a set of points that lie on the edges of a known shape and another set of points on an unknown shape. Compute the shape context of each point found in step 1. Match each point from the known shape to a point on an unknown shape. To minimize the cost of matching, first choose a transformation (e.g. affine, thin plate spline, etc.) that warps the edges of the known shape to the unknown (essentially aligning the two shapes). Then select the point on the unknown shape that most closely corresponds to each warped point on the known shape. Calculate the "shape distance" between each pair of points on the two shapes. Use a weighted sum of the shape context distance, the image appearance distance, and the bending energy (a measure of how much transformation is required to bring the two shapes into alignment). To identify the unknown shape, use a nearest-neighbor classifier to compare its shape distance to shape distances of known objects. == Details of implementation == === Step 1: Finding a list of points on shape edges === The approach assumes that the shape of an object is essentially captured by a finite subset of the points on the internal or external contours on the object. These can be simply obtained using the Canny edge detector and picking a random set of points from the edges. Note that these points need not and in general do not correspond to key-points such as maxima of curvature or inflection points. It is preferable to sample the shape with roughly uniform spacing, though it is not critical. === Step 2: Computing the shape context === This step is described in detail in the Theory section. === Step 3: Computing the cost matrix === Consider two points p and q that have normalized K-bin histograms (i.e. shape contexts) g(k) and h(k). As shape contexts are distributions represented as histograms, it is natural to use the χ2 test statistic as the "shape context cost" of matching the two points: C S = 1 2 ∑ k = 1 K [ g ( k ) − h ( k ) ] 2 g ( k ) + h ( k ) {\displaystyle C_{S}={\frac {1}{2}}\sum _{k=1}^{K}{\frac {[g(k)-h(k)]^{2}}{g(k)+h(k)}}} The values of this range from 0 to 1. In addition to the shape context cost, an extra cost based on the appearance can be added. For instance, it could be a measure of tangent angle dissimilarity (particularly useful in digit recognition): C A = 1 2 ‖ ( cos ⁡ ( θ 1 ) sin ⁡ ( θ 1 ) ) − ( cos ⁡ ( θ 2 ) sin ⁡ ( θ 2 ) ) ‖ {\displaystyle C_{A}={\frac {1}{2}}{\begin{Vmatrix}{\dbinom {\cos(\theta _{1})}{\sin(\theta _{1})}}-{\dbinom {\cos(\theta _{2})}{\sin(\theta _{2})}}\end{Vmatrix}}} This is half the length of the chord in unit circle between the unit vectors with angles θ 1 {\displaystyle \theta _{1}} and θ 2 {\displaystyle \theta _{2}} . Its values also range from 0 to 1. Now the total cost of matching the two points could be a weighted-sum of the two costs: C = ( 1 − β ) C S + β C A {\displaystyle C=(1-\beta )C_{S}+\beta C_{A}\!\,} Now for each point pi on the first shape and a point qj on the second shape, calculate the cost as described and call it Ci,j. This is the cost matrix. === Step 4: Finding the matching that minimizes total cost === Now, a one-to-one matching π ( i ) {\displaystyle \pi (i)} that matches each point pi on shape 1 and qj on shape 2 that minimizes the total cost of matching, H ( π ) = ∑ i C ( p i , q π ( i ) ) {\displaystyle H(\pi )=\sum _{i}C\left(p_{i},q_{\pi (i)}\right)} is needed. This can be done in O ( N 3 ) {\displaystyle O(N^{3})} time using the Hungarian method, although there are more efficient algorithms. To have robust handling of outliers, one can add "dummy" nodes that have a constant but reasonably large cost of matching to the cost matrix. This would cause the matching algorithm to match outliers to a "dummy" if there is no real match. === Step 5: Modeling transformation === Given the set of correspondences between a finite set of points on the two shapes, a transformation T : R 2 → R 2 {\displaystyle T:\mathbb {R} ^{2}\to \mathbb {R} ^{2}} can be estimated to map any point from one shape to the other. There are several choices for this transformation, described below. ==== Affine ==== The affine model is a standard choice: T ( p ) = A p + o {\displaystyle T(p)=Ap+o\!} . The least squares solution for the matrix A {\displaystyle A} and the translational offset vector o is obtained by: o = 1 n ∑ i = 1 n ( p i − q π ( i ) ) , A = ( Q + P ) t {\displaystyle o={\frac {1}{n}}\sum _{i=1}^{n}\left(p_{i}-q_{\pi (i)}\right),A=(Q^{+}P)^{t}} Where P = ( 1 p 11 p 12 ⋮ ⋮ ⋮ 1 p n 1 p n 2 ) {\displaystyle P={\begin{pmatrix}1&p_{11}&p_{12}\\\vdots &\vdots &\vdots \\1&p_{n1}&p_{n2}\end{pmatrix}}} with a similar expression for Q {\displaystyle Q\!} . Q + {\displaystyle Q^{+}\!} is the pseudoinverse of Q {\displaystyle Q\!} . ==== Thin plate spline ==== The thin plate spline (TPS) model is the most widely used model for transformations when working with shape contexts. A 2D transformation can be separated into two TPS function to model a coordinate transform: T ( x , y ) = ( f x ( x , y ) , f y ( x , y ) ) {\displaystyle T(x,y)=\left(f_{x}(x,y),f_{y}(x,y)\right)} where each of the ƒx and ƒy have the form: f ( x , y ) = a 1 + a x x + a y y + ∑ i = 1 n ω i U ( ‖ ( x i , y i ) − ( x , y ) ‖ ) , {\displaystyle f(x,y)=a_{1}+a_{x}x+a_{y}y+\sum _{i=1}^{n}\omega _{i}U\left({\begin{Vmatrix}(x_{i},y_{i})-(x,y)\end{Vmatrix}}\right),} and the kernel function U ( r ) {\displaystyle U(r)\!} is defined by U ( r ) = r 2 log ⁡ r 2 {\displaystyle U(r)=r^{2}\log r^{2}\!} . The exact details of how to solve for the parameters can be found elsewhere but it essentially involves solving a linear system of equations. The bending energy (a measure of how much transformation is needed to align the points) will also be easily obtained. ==== Regularized TPS ==== The TPS formulation above has exact matching requirement for the pairs of points on the two shapes. For noisy data, it is best to

    Read more →
  • Evolutionary acquisition of neural topologies

    Evolutionary acquisition of neural topologies

    Evolutionary acquisition of neural topologies (EANT/EANT2) is an evolutionary reinforcement learning method that evolves both the topology and weights of artificial neural networks. It is closely related to the works of Angeline et al. and Stanley and Miikkulainen. Like the work of Angeline et al., the method uses a type of parametric mutation that comes from evolution strategies and evolutionary programming (now using the most advanced form of the evolution strategies CMA-ES in EANT2), in which adaptive step sizes are used for optimizing the weights of the neural networks. Similar to the work of Stanley (NEAT), the method starts with minimal structures which gain complexity along the evolution path. == Contribution of EANT to neuroevolution == Despite sharing these two properties, the method has the following important features which distinguish it from previous works in neuroevolution. It introduces a genetic encoding called common genetic encoding (CGE) that handles both direct and indirect encoding of neural networks within the same theoretical framework. The encoding has important properties that makes it suitable for evolving neural networks: It is complete in that it is able to represent all types of valid phenotype networks. It is closed, i.e. every valid genotype represents a valid phenotype. (Similarly, the encoding is closed under genetic operators such as structural mutation and crossover.) These properties have been formally proven. For evolving the structure and weights of neural networks, an evolutionary process is used, where the exploration of structures is executed at a larger timescale (structural exploration), and the exploitation of existing structures is done at a smaller timescale (structural exploitation). In the structural exploration phase, new neural structures are developed by gradually adding new structures to an initially minimal network that is used as a starting point. In the structural exploitation phase, the weights of the currently available structures are optimized using an evolution strategy. == Performance == EANT has been tested on some benchmark problems such as the double-pole balancing problem, and the RoboCup keepaway benchmark. In all the tests, EANT was found to perform very well. Moreover, a newer version of EANT, called EANT2, was tested on a visual servoing task and found to outperform NEAT and the traditional iterative Gauss–Newton method. Further experiments include results on a classification problem.

    Read more →
  • Paranoia (role-playing game)

    Paranoia (role-playing game)

    Paranoia is a dystopian science-fiction tabletop role-playing game originally designed and written by Greg Costikyan, Dan Gelber, and Eric Goldberg, and first published in 1984 by West End Games. Since 2004 the game has been published under license by Mongoose Publishing. The game won the Origins Award for Best Roleplaying Rules of 1984 and was inducted into the Origins Awards Hall of Fame in 2007. Paranoia is notable among tabletop games for being more competitive than co-operative, with players encouraged to betray one another for their own interests, as well as for keeping a light-hearted, tongue in cheek tone despite its dystopian setting. Several editions of the game have been published since the original version, and the franchise has spawned several spin-offs, novels and comic books based on the game. == Premise == The game is set in a dystopian future city controlled by the Computer (also known as "Friend Computer"), and where information (including the game rules) are restricted by color-coded "security clearance". Player characters are initially enforcers of the Computer's authority known as Troubleshooters, and are given missions to seek out and eliminate threats to the Computer's control. They are also part of prohibited underground movements, and have secret objectives including theft from and murder of other player characters. == Tone == Paranoia is a humorous role-playing game set in a dystopian future along the lines of Nineteen Eighty-Four, Brave New World, Logan's Run, and THX 1138; however, the tone of the game is rife with black humor, frequently tongue-in-cheek rather than dark and heavy. Most of the game's humor is derived from the players' (usually futile) attempts to complete their assignment while simultaneously adhering to the Computer's arbitrary, contradictory and often nonsensical security directives. The Paranoia rulebook is unusual in a number of ways; demonstrating any knowledge of the rules is forbidden, and most of the rulebook is written in an easy, conversational tone that often makes fun of the players and their characters, while occasionally taking digs at other notable role-playing games. === Setting === The game's main setting is an immense, futuristic city called Alpha Complex. Alpha Complex is controlled by the Computer, a civil service AI construct (a literal realization of the "Influencing Machine" that some schizophrenics fear). The Computer serves as the game's principal antagonist, and fears a number of threats to its 'perfect' society, such as the Outdoors, mutants, and secret societies (especially Communists). To deal with these threats, the Computer employs Troubleshooters, whose job is to go out, find trouble, and shoot it. Player characters are usually Troubleshooters, although later game supplements have allowed the players to take on other roles, such as High-Programmers of Alpha Complex. The player characters frequently receive mission instructions from the Computer that are incomprehensible, self-contradictory, or obviously fatal if adhered to, and side-missions (such as Mandatory Bonus Duties) that conflict with the main mission. Failing a mission generally results in termination of the player character, but succeeding can just as often result in the same fate, after being rewarded for successfully concluding the mission. They are issued equipment that is uniformly dangerous, faulty, or "experimental" (i.e., almost certainly dangerous and faulty). Additionally, each player character is generally an unregistered mutant and a secret society member (which are both termination offenses in Alpha Complex), and has a hidden agenda separate from the group's goals, often involving stealing from or killing teammates. Thus, missions often turn into a comedy of errors, as everyone on the team seeks to double-cross everyone else while keeping their own secrets. The game's manual encourages suspicion between players, offering several tips on how to make the gameplay as paranoid as possible. Every player's character is assigned six clones, known as a six-pack, which are used to replace the preceding clone upon his or her death. The game lacks a conventional health system; most wounds the player characters can suffer are assumed to be fatal. As a result, Paranoia allows characters to be routinely killed, yet the player can continue instead of leaving the game. This easy spending of clones tends to lead to frequent firefights, gruesome slapstick, and the horrible yet humorous demise of most if not all of the player character's clone family. Additional clones can be purchased if one gains sufficient favour with the Computer. === Security clearances === Paranoia features a security clearance system based on colors of the visible spectrum which heavily restricts what the players can and cannot legally do; everything from corridors to food and equipment have security restrictions. The lowest rating is Infrared, but the lowest playable security clearance is Red; the game usually begins with the characters having just been promoted to Red grade. Interfering with anything which is above that player's clearance carries significant risk. The full order of clearances from lowest to highest is Infrared (visually represented by black), Red, Orange, Yellow, Green, Blue, Indigo, Violet, and Ultraviolet (visually represented by white). Within the game, Infrared-clearance citizens live dull lives of mindless drudgery and are heavily medicated, while higher clearance characters may be allowed to demote or even summarily execute those of a lower rank and those with Ultraviolet clearance are almost completely unrestricted and have a great deal of access to the Computer; they are the only citizens that may (legally) access and modify the Computer's programming, and thus Ultraviolet citizens are also referred to as "High Programmers". Security clearance is not related to competence but is instead the result of the Computer's often insane and unjustified calculus of trust concerning a citizen. It is suggested that it may in fact be the High Programmers' meddling with The Computer's programming that resulted in its insanity. === Secret societies === In the game, secret societies tend to be based on sketchy and spurious knowledge of historical matters. For example, previous editions included societies such as the "Seal Club" that idolizes the Outdoors but is unsure what plants and animals actually look like. Other societies include the Knights of the Circular Object (based on the Knights of the Round Table), the Trekkies, and the First Church of Christ Computer Programmer. In keeping with the theme of paranoia, many secret societies have spies or double agents in each other's organizations. The first edition also included secret societies such as Programs Groups (the personal agents and spies of the High Programmers at the apex of Alpha Complex society) and Spy For Another Alpha Complex. The actual societies which would be encountered in a game depends on the play style; some societies are more suited for more light-hearted games (Zap-style, or the lighter end of Classic), whereas others represent a more serious threat to Alpha Complex and are therefore more suitable for Straight or the more dark sort of Classic games. == Publication history == Six editions have been published. Three of these were published by West End Games — the first, second, and fifth editions — whereas the later three editions (Paranoia XP, the 25th Anniversary edition and the "Red Clearance" edition) were published by Mongoose Publishing. In addition to these six published editions, it is known that West End Games were working on a third edition — to replace the poorly received fifth edition — in the late 1990s, but their financial issues would prevent this edition from being published, except for being included in one tournament adventure. === First edition === The first edition, was written by Greg Costikyan, Dan Gelber, and Eric Goldberg, and published in 1984 by West End Games. In 1985, this edition of Paranoia won the Origins Award for Best Roleplaying Rules of 1984. This edition, while encouraging dark humour in-game, took a fairly serious dystopian tone; the supplements and adventures released to accompany it emphasised the lighter side, however, establishing the freewheeling mix of slapstick, intra-team backstabbing and satire that is classically associated with a game of Paranoia. === Second edition === The second edition, is credited to Costikyan, Gelber, Goldberg, Ken Rolston, and Paul Murphy, was published in 1987 by West End Games. This edition can be seen as a response to the natural development of the line towards a rules-light, fast and entertaining play style. Here, the humorous possibilities of life in a paranoid dystopia are emphasised, and the rules are simplified. ==== Metaplot and the second edition ==== Many of the supplements released for the second edition fall into a story arc set up by new writers and line editors

    Read more →
  • Fuzzy control system

    Fuzzy control system

    A fuzzy control system is a control system based on fuzzy logic – a mathematical system that analyzes analog input values in terms of logical variables that take on continuous values between 0 and 1, in contrast to classical or digital logic, which operates on discrete values of either 1 or 0 (true or false, respectively). Fuzzy logic is widely used in machine control. The term "fuzzy" refers to the fact that the logic involved can deal with concepts that cannot be expressed as the "true" or "false" but rather as "partially true". Although alternative approaches such as genetic algorithms and neural networks can perform just as well as fuzzy logic in many cases, fuzzy logic has the advantage that the solution to the problem can be cast in terms that human operators can understand, such that that their experience can be used in the design of the controller. This makes it easier to mechanize tasks that are already successfully performed by humans. == History and applications == Fuzzy logic was proposed by Lotfi A. Zadeh of the University of California at Berkeley in a 1965 paper. He elaborated on his ideas in a 1973 paper that introduced the concept of "linguistic variables", which in this article equates to a variable defined as a fuzzy set. Other research followed, with the first industrial application, a cement kiln built in Denmark, coming on line in 1976. Fuzzy systems were initially implemented in Japan. Interest in fuzzy systems was sparked by Seiji Yasunobu and Soji Miyamoto of Hitachi, who in 1985 provided simulations that demonstrated the feasibility of fuzzy control systems for the Sendai Subway. Their ideas were adopted, and fuzzy systems were used to control accelerating, braking, and stopping when the Namboku Line opened in 1987. In 1987, Takeshi Yamakawa demonstrated the use of fuzzy control, through a set of simple dedicated fuzzy logic chips, in an "inverted pendulum" experiment. This is a classic control problem, in which a vehicle tries to keep a pole mounted on its top by a hinge upright by moving back and forth. Yamakawa subsequently made the demonstration more sophisticated by mounting a wine glass containing water and even a live mouse to the top of the pendulum: the system maintained stability in both cases. Yamakawa eventually went on to organize his own fuzzy-systems research lab to help exploit his patents in the field. Japanese engineers subsequently developed a wide range of fuzzy systems for both industrial and consumer applications. In 1988 Japan established the Laboratory for International Fuzzy Engineering (LIFE), a cooperative arrangement between 48 companies to pursue fuzzy research. The automotive company Volkswagen was the only foreign corporate member of LIFE, dispatching a researcher for a duration of three years. Japanese consumer goods often incorporate fuzzy systems. Matsushita vacuum cleaners use microcontrollers running fuzzy algorithms to interrogate dust sensors and adjust suction power accordingly. Hitachi washing machines use fuzzy controllers to load-weight, fabric-mix, and dirt sensors and automatically set the wash cycle for the best use of power, water, and detergent. Canon developed an autofocusing camera that uses a charge-coupled device (CCD) to measure the clarity of the image in six regions of its field of view and use the information provided to determine if the image is in focus. It also tracks the rate of change of lens movement during focusing, and controls its speed to prevent overshoot. The camera's fuzzy control system uses 12 inputs: 6 to obtain the current clarity data provided by the CCD and 6 to measure the rate of change of lens movement. The output is the position of the lens. The fuzzy control system uses 13 rules and requires 1.1 kilobytes of memory. An industrial air conditioner designed by Mitsubishi uses 25 heating rules and 25 cooling rules. A temperature sensor provides input, with control outputs fed to an inverter, a compressor valve, and a fan motor. Compared to the previous design, the fuzzy controller heats and cools five times faster, reduces power consumption by 24%, increases temperature stability by a factor of two, and uses fewer sensors. Other applications investigated or implemented include: character and handwriting recognition; optical fuzzy systems; robots, including one for making Japanese flower arrangements; voice-controlled robot helicopters (hovering is a "balancing act" rather similar to the inverted pendulum problem); rehabilitation robotics to provide patient-specific solutions (e.g. to control heart rate and blood pressure ); control of flow of powders in film manufacture; elevator systems; and so on. Work on fuzzy systems is also proceeding in North America and Europe, although on a less extensive scale than in Japan. The US Environmental Protection Agency has investigated fuzzy control for energy-efficient motors, and NASA has studied fuzzy control for automated space docking: simulations show that a fuzzy control system can greatly reduce fuel consumption. Firms such as Boeing, General Motors, Allen-Bradley, Chrysler, Eaton, and Whirlpool have worked on fuzzy logic for use in low-power refrigerators, improved automotive transmissions, and energy-efficient electric motors. In 1995 Maytag introduced an "intelligent" dishwasher based on a fuzzy controller and a "one-stop sensing module" that combines a thermistor, for temperature measurement; a conductivity sensor, to measure detergent level from the ions present in the wash; a turbidity sensor that measures scattered and transmitted light to measure the soiling of the wash; and a magnetostrictive sensor to read spin rate. The system determines the optimum wash cycle for any load to obtain the best results with the least amount of energy, detergent, and water. It even adjusts for dried-on foods by tracking the last time the door was opened, and estimates the number of dishes by the number of times the door was opened. Xiera Technologies Inc. has developed the first auto-tuner for the fuzzy logic controller's knowledge base known as edeX. This technology was tested by Mohawk College and was able to solve non-linear 2x2 and 3x3 multi-input multi-output problems. Research and development is also continuing on fuzzy applications in software, as opposed to firmware, design, including fuzzy expert systems and integration of fuzzy logic with neural-network and so-called adaptive "genetic" software systems, with the ultimate goal of building "self-learning" fuzzy-control systems. These systems can be employed to control complex, nonlinear dynamic plants, for example, human body. == Fuzzy sets == The input variables in a fuzzy control system are in general mapped by sets of membership functions similar to this, known as "fuzzy sets". The process of converting a crisp input value to a fuzzy value is called "fuzzification". The fuzzy logic based approach had been considered by designing two fuzzy systems, one for error heading angle and the other for velocity control. A control system may also have various types of switch, or "ON-OFF", inputs along with its analog inputs, and such switch inputs of course will always have a truth value equal to either 1 or 0, but the scheme can deal with them as simplified fuzzy functions that happen to be either one value or another. Given "mappings" of input variables into membership functions and truth values, the microcontroller then makes decisions for what action to take, based on a set of "rules", each of the form: IF brake temperature IS warm AND speed IS not very fast THEN brake pressure IS slightly decreased. In this example, the two input variables are "brake temperature" and "speed" that have values defined as fuzzy sets. The output variable, "brake pressure" is also defined by a fuzzy set that can have values like "static" or "slightly increased" or "slightly decreased" etc. === Fuzzy control in detail === Fuzzy controllers are very simple conceptually. They consist of an input stage, a processing stage, and an output stage. The input stage maps sensor or other inputs, such as switches, thumbwheels, and so on, to the appropriate membership functions and truth values. The processing stage invokes each appropriate rule and generates a result for each, then combines the results of the rules. Finally, the output stage converts the combined result back into a specific control output value. The most common shape of membership functions is triangular, although trapezoidal and bell curves are also used, but the shape is generally less important than the number of curves and their placement. From three to seven curves are generally appropriate to cover the required range of an input value, or the "universe of discourse" in fuzzy jargon. As discussed earlier, the processing stage is based on a collection of logic rules in the form of IF-THEN statements, where the IF part is called the "antecedent" and the THEN part is called the "consequent". Typical fuzzy

    Read more →
  • Softwarp

    Softwarp

    Softwarp is a software technique to warp an image so that it can be projected on a curved screen. This can be done in real time by inserting the softwarp as a last step in the rendering cycle. The problem is to know how the image should be warped to look correct on the curved screen. There are several techniques to auto calibrate the warping by projecting a pattern and using cameras and/or sensors. The information from the sensors is sent to the software so that it can analyze the data and calculate the curvature of the projection screen. == Usage == The softwarp can be used to project virtual views on curved walls and domes. These are usually used in vehicle simulators, for instance boat-, car- and airplane simulators. To make it possible to cover a dome with a 360 degree view you need to use several projectors. A problem with using several projectors on the same screen is that the edges between the projected images get about twice the amount of light. This is solved by using a technique called edge blending. With this technique a “filter” is inserted on the edge that fades the image from 100% light strength (luminance) to 0% (the lowest luminance depends on the contrast ratio of the projector). == History == The first warping technologies used a hardware image processing unit to warp the image. This processing unit was inserted between the graphics card and the projector. The problem with this technique is that it depends on the type of signal and the quality of the signal from the graphics card to warp it correctly. The process unit also needs several lines of image information before it can start sending out the warped image. This adds a latency to the display system that could be a problem in simulators that need fast response time, for instance fighter jet simulators. Softwarping eliminates the latency.

    Read more →
  • Land of Memories

    Land of Memories

    Land of Memories (Chinese: 机忆之地) is a Chinese science-fiction novel by Shen Yang (沈阳), a professor at Tsinghua University's School of Journalism and Communication. The story revolves around a former neuroscientist trying to recover her memories from the metaverse after suffering amnesia due to an accident. It contains almost 6,000 Chinese characters and was shortened from an AI-generated draft that was 43,000 characters long. The process involved 66 prompts spanning almost three hours. The novel was among 18 submissions that won the level-two prize at the Fifth Jiangsu Youth Science Education and Science Fiction Competition (第五届江苏省青年科普科幻作品大赛). The contest was restricted to participants between the age of 14 and 45 but did not forbid entries generated by AI. One of its organizers reached out to Shen after finding out that the professor had been experimenting with writing science fiction using AI. The judges were not told about the novel's origin in advance. Three of them, out of the six, approved the work. One judge, who had worked with AI models before, recognized that the novel was written by AI and criticized the work for lacking emotional appeal. The organizer who had contacted Shen said the novel's introduction was not bad but the story did not develop well. It would not meet the usual standards for publication. However, he still plans to allow AI-generated submissions in 2024. Fu Ruchu, editorial department director of the People's Literature Publishing House, said the novel was not easily identifiable as AI-generated and applauded its logical consistency. She warned that artificial intelligence could endanger the jobs of fiction writers and cause permanent damage to literary language.

    Read more →
  • Google AI Studio

    Google AI Studio

    Google AI Studio is a web-based integrated development environment developed by Google for prototyping applications using generative AI models. Released in December 2023 alongside the Gemini API, the platform provides access to Google's Gemini family of models and related tools for image, video, and audio generation. The service targets both developers and non-technical users for testing prompts and generating code for the Gemini API. == History == Google launched AI Studio on December 13, 2023, as the successor to Google MakerSuite. MakerSuite, introduced at Google I/O in May 2023, had provided similar functionality for Google's PaLM language models. The AI Studio was launched alongside the public release of the Gemini API. == Features == AI Studio's interface consists of a central prompt area and a settings panel for model selection and parameter adjustment. The platform supports chat prompts for multi-turn conversations and includes system instructions for defining model behavior, tone, or specific rules. Users can employ zero-shot and few-shot prompting techniques to guide the model's output format. The platform processes various media types including video, audio, and documents, and can generate images through Imagen models, videos through Veo models, and audio through text-to-speech functionality. Additional tools include real-time streaming for screen sharing and live analysis, code execution in a sandboxed Python environment, grounding with Google Search for current information, URL context for analyzing specific web pages, and a thinking mode for complex reasoning tasks. == Available models == The platform provides access to several Google AI models including the Gemini language models, Imagen for image generation, Veo for video generation, LearnLM for educational applications, and Gemma, Google's open-source model family. == Privacy and data usage == Google AI Studio's data handling differs between free and paid users. For free tier users, Google uses submitted prompts, uploaded files, and generated responses to improve its products and services, with human reviewers potentially reading and annotating the data after disconnection from user accounts. Google advises against submitting sensitive information on the free tier. Users who enable Google Cloud Billing are considered paid service users, and their data is not used for product improvement. Data is processed according to Google's Data Processing Addendum and retained temporarily for abuse monitoring. == Availability == The platform is available at no cost, with API usage subject to a free tier with daily and per-minute rate limits. Access is restricted to users aged 18 and older in specific countries and territories. The service was initially unavailable in the United Kingdom and European Economic Area due to regulatory concerns, which drew user complaints. == Reception == Reviews have noted the platform's accessibility and integration with Gemini models, with features such as real-time screen sharing and large context windows cited as notable capabilities. However, reviewers have raised concerns about the privacy implications for free tier users, whose data is used for model training. Some users have reported inconsistent performance with features like screen streaming and issues with folder uploads for large datasets. The initial geographic restrictions were a point of criticism among developers in affected regions.

    Read more →