AI Companion Zoom Logo

AI Companion Zoom Logo — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • History of natural language processing

    History of natural language processing

    The history of natural language processing describes the advances of natural language processing. There is some overlap with the history of machine translation, the history of speech recognition, and the history of artificial intelligence. == Early history == The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine. The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. The other proposal, by Peter Troyanskii, a Russian, was more detailed. Troyanskii’s proposal included both the bilingual dictionary and a method for dealing with grammatical roles between languages, based on Esperanto. == Logical period == In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge, sufficiently well that the judge is unable to distinguish reliably — on the basis of the conversational content alone — between the program and a real human. In 1957, Noam Chomsky’s Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule-based system of syntactic structures. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years long research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed. Some notably successful NLP systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies. In 1969 Roger Schank introduced the conceptual dependency theory for natural language understanding. This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner. In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of phrase structure rules ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years. During the 1970s many programmers began to write 'conceptual ontologies', which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, many chatterbots were written including PARRY, Racter, and Jabberwacky. == Statistical period == Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in NLP with the introduction of machine learning algorithms for language processing. This was due both to the steady increase in computational power resulting from Moore's law and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks. === Datasets === The emergence of statistical approaches was aided by both increase in computing power and the availability of large datasets. At that time, large multilingual corpora were starting to emerge. Notably, some were produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. Many of the notable early successes occurred in the field of machine translation. In 1993, the IBM alignment models were used for statistical machine translation. Compared to previous machine translation systems, which were symbolic systems manually coded by computational linguists, these systems were statistical, which allowed them to automatically learn from large textual corpora. Though these systems do not work well in situations where only small corpora is available, so data-efficient methods continue to be an area of research and development. In 2001, a one-billion-word large text corpus, scraped from the Internet, referred to as "very very large" at the time, was used for word disambiguation. To take advantage of large, unlabelled datasets, algorithms were developed for unsupervised and self-supervised learning. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results. == Neural period == Neural language models were developed in 1990s. In 1990, the Elman network, using a recurrent neural network, encoded each word in a training set as a vector, called a word embedding, and the whole vocabulary as a vector database, allowing it to perform such tasks as sequence-predictions that are beyond the power of a simple multilayer perceptron. A shortcoming of the static embeddings was that they didn't differentiate between multiple meanings of homonyms. Yoshua Bengio developed the first neural probabilistic language model in 2000. Novel algorithms, availability of larger datasets and higher processing power made possible training of larger and larger language models. Attention mechanism was introduced by Bahdanau et al. in 2014. This work laid the foundations for the famous "Attention Is All You Need" paper that introduced the Transformer architecture in 2017. The concept of large language model (LLM) emerged in late 2010s. LLM is a language model trained with self-supervised learning on vast amount of text. Earliest public LLMs had hundreds of millions of parameters, but this number quickly rose to billion and even trillions. In recent years, advancements in deep learning and large language models have significantly enhanced the capabilities of natural language processing, leading to widespread applications in areas such as healthcare, customer service, and content generation. == Software ==

    Read more →
  • Logogen model

    Logogen model

    The logogen model of 1969 is a model of speech recognition that uses units called "logogens" to explain how humans comprehend spoken or written words. Logogens are a vast number of specialized recognition units, each able to recognize one specific word. This model provides for the effects of context on word recognition. == Overview == The word logogen can be traced back to the Greek-language word logos, which means "word", and genus, which means "birth". British scientist John Morton's logogen model was designed to explain word recognition using a new type of unit known as a logogen. A critical element of this theory is the involvement of lexicons, or specialized aspects of memory that include semantic and phonemic information about each item that is contained in memory. A given lexicon consists of many smaller, abstract items known as logogens. Logogens contain a variety of properties about given word such as their appearance, sound, and meaning. Logogens do not store words within themselves, but rather they store information that is specifically necessary for retrieval of whatever word is being searched for. A given logogen will become activated by psychological stimuli or contextual information (words) that is consistent with the properties of that specific logogen and when the logogen's activation level rises to or above its threshold level, the pronunciation of the given word is sent to the output system. Certain stimuli can affect the activation levels of more than one word at a time, usually involving words that are similar to one another. When this occurs, whichever of the words' activation levels reaches the threshold level, it is that word that is then sent to the output system with the subject remaining unaware of any partially excited logogens. This assumption was made by Marslen-Wilson and Welch (1978), who added to the model some assumptions of their own in order to account for their experimental results. They also assumed that the analysis of phonetic input can only become available to other parts of the system by process of how the input affects the logogen system. Finally, Marslen-Wilson and Welch assume that the first syllable of a given word will increase the activation level of a given logogen more than those of the latter syllables, which supported the data found at the time. == Analysis == The logogen model can be used to help linguists explain particular occurrences in the human language. The most-helpful application of the model is to show how one accesses words and their meanings in the lexicon. The word-frequency effect is best explained by the logogen model in that words (or logogens) that have a higher frequency (or are more common) have a lower threshold. This means that they require less perceptual power in the brain to be recognized and decoded from the lexicon and are recognized faster than those words that are less common. Also, with high-frequency words, the recovery from lowering the item's threshold is less fulfilled compared to low-frequency words so less sensory information is needed for that particular item's recognition. There are ways to lower thresholds, such as repetition and semantic priming. Also, each time a word is encountered through these methods, the threshold for that word is temporarily lowered partially because of its recovering ability. This model also conveys that specific concrete words are recalled better because they use images and logogens, whereas abstract words are not as easily recalled well because they only use logogens, hence showing the difference in thresholds between these two types of words. At the time of its conception, Morton's logogen model was one of the most influential models in springing up other parallel word access models and served as the essential basis for these subsequent models. Morton's model also strongly influenced other contemporary theories on lexical access. However, despite the advantages that the logogen theory presents, it also displays some negative facets. First and foremost, the logogen model does not explain all occurrences in language, such as the introduction of new words or non-words into a person's lexicon. Also, because of the distinctive model application, it may vary in its effectiveness in different languages. == Criticisms == While this model does a reasonable job of understanding the underlying semantics of many aspects in psycholinguistics, there are some flaws that have been pointed out in the logogen model. It has been argued that the prior stimulus patterns that have been seen in the logogen theory are not centrally localized in the logogen itself but are actually distributed throughout the different pathways over which the stimulus is being processed. What this directs at is that the notion and proliferation of logogens was due to modality. In essence, the logogen is unnecessary in the idea of attaining the title of being a recognition unit because of the variety of pathways that it is open to, not just logogens. Another criticism has been that this model essentially ignores larger and more critical structures in language and phonetics such as the different syntactic rules or grammatical construction that innately exists in language. Since this model overtly limits itself to the scope of lexical access then this model is seen as biased and misunderstood. To many psychologists, the logogen model does not meet the functional or representational adequacy that a theory should include to sufficiently comprehend language. Also, another criticism is that the logogen theory was supposed to predict that stimulus degradation should affect priming and word frequency in humans. However, many psychologists have conducted studies and researched the model to show that only priming and not word frequency is interacted with stimulus degradation. Priming is supposed to deteriorate a stimulus because it postulates that the semantic characteristics of previously known words are fed back into the detector of a person which in turn raises the threshold of related items. In word frequency, stimulus degradation is supposed to occur because it postulates that familiar words have lower thresholds than their low-frequency counterparts. However, in studies, priming is the only structure that does show observable and notable stimulus decadence. Even though the logogen theory has many unfilled holes, Morton was a revolutionary of his field whose speculation and research has opened up a remarkable era of psycholinguistics. == Other models to consider == cohort model – This model was proposed by Marslen-Wilson and was designed specifically to account for auditory word recognition. It works by breaking the word down and states that when a word is heard all words that begin with the first sound of the target word are activated. This set of words is considered the cohort. Once the first cohort has been activated, the other information, or sounds in the word narrow down the choices. The person recognizes the word when you are left with a single choice; this is considered the "recognition point". checking model – This model was developed by Norris in 1986. In this particular model, he took the approach that any word that partially matches the input is analyzed and checked to see if it fits with the context of the situation. interactive-activation model – This model is considered a connectionist model. Proposed by McClelland and Rumelhart in the 1981 to 1982 period, it is based around nodes, which are visual features, and positions of letters within a given word. They also act as word detectors which have inhibitory and excitatory connections between them. This model starts with first letter and suggests that all the words with that first letter are activated at first and then going through the word one can determine what the word is they are looking at. The main principle is that mental phenomena can be described by interconnected networks of simple units. verification model – The model was developed by Curtis Becker in 1970. The main idea is that a small number of candidates that are activated in parallel are subject to a serial-verification process. This model starts the word-recognition process with a basic representation of the stimulus. Then, sensory trace, consisting of line features is used to activate word detectors. When an acceptable number of detectors are activated these are used to generate a search set. These items are drawn from the lexicon on the basis of similarity to the sensory trace, which help with the identity of the stimulus. Then, in a serial process the candidates are compared to the representation of the sensory-trace input. == Related concepts == word frequency – This is the belief that the speed and accuracy with which a word is recognized is related to how frequently the word occurs in our language. Each logogen has a threshold (for identification) and words with higher frequencies have lower thresholds. Words with higher freq

    Read more →
  • Frankenstein complex

    Frankenstein complex

    The Frankenstein complex is a term coined by Isaac Asimov in his robot series, referring to the fear of mechanical men. == History == Some of Asimov's science fiction short stories and novels predict that this suspicion will become strongest and most widespread in respect of "mechanical men" that most-closely resemble human beings (see android), but it is also present on a lower level against robots that are plainly electromechanical automatons. The "Frankenstein complex" is similar in many respects to Masahiro Mori's uncanny valley hypothesis. The name, "Frankenstein complex", is derived from the name of Victor Frankenstein in the 1818 novel Frankenstein; or, The Modern Prometheus by Mary Shelley. In Shelley's story, Frankenstein created an intelligent, somewhat superhuman being, but he finds that his creation is horrifying to behold and abandons it. This ultimately leads to Victor's death at the conclusion of a vendetta between himself and his creation. In much of his fiction, Asimov depicts the general attitude of the public towards robots as negative, with ordinary people fearing that robots will either replace them or dominate them, although dominance would not be allowed under the specifications of the Three Laws of Robotics, the first of which is: "A robot may not harm a human being or, through inaction, allow a human being to come to harm." However, Asimov's fictitious earthly public is not fully persuaded by this, and remains largely suspicious and fearful of robots. I, Robot's short story "Little Lost Robot" is about this "fear of robots". In Asimov's robot novels, the Frankenstein complex is a major problem for roboticists and robot manufacturers. They do all they can to reassure the public that robots are harmless, even though this sometimes involves hiding the truth because they think that the public would misunderstand it. The fear by the public and the response of the manufacturers is an example of the theme of paternalism, the dread of paternalism, and the conflicts that arise from it in Asimov's fiction. The same theme occurs in many later works of fiction featuring robots, although it is rarely referred to as such.

    Read more →
  • Articulatory speech recognition

    Articulatory speech recognition

    Articulatory speech recognition means the recovery of speech (in forms of phonemes, syllables or words) from acoustic signals with the help of articulatory modeling or an extra input of articulatory movement data. Speech recognition (or automatic speech recognition, acoustic speech recognition) means the recovery of speech from acoustics (sound wave) only. Articulatory information is extremely helpful when the acoustic input is in low quality, perhaps because of noise or missing data. Measurable information from the articulatory system (e.g. tongue, jaw movements) can supplement acoustic signals to improve phone recognition accuracy by 2%. However, attempts to estimate articulatory data from acoustic signals alone have not significantly enhanced recognition performance.

    Read more →
  • Podium (company)

    Podium (company)

    Podium is a private technology company headquartered in Lehi, Utah that develops cloud-based software related to messaging, customer feedback, online reviews, selling products, and requesting payments. == History == Podium was founded in 2014 by Eric Rea and Dennis Steele, who developed a tool to help small businesses "build their online reputation" through online reviews. Podium was initially known as RepDrive before rebranding as Podium in 2015. In 2015, Podium moved from a spare bedroom to a new location above a Provo bike shop. In March 2020, Podium added payments technology to its product suite. In November 2021, Podium raised $201 million in Series D funding and was valued at $3 billion. == Product == Podium is a software-as-a-service platform designed to improve business online reputation. It helps users manage business interactions in one tool. Users can communicate reviews, texts, chats, and post payment directly within the app.

    Read more →
  • Pandemonium architecture

    Pandemonium architecture

    Pandemonium architecture is a theory in cognitive science that describes how visual images are processed by the brain. It has applications in artificial intelligence and pattern recognition. The theory was introduced by the artificial intelligence pioneer Oliver Selfridge in his 1959 paper "Pandemonium - A Paradigm for Learning". It describes the process of object recognition as the exchange of signals within a hierarchical system of detection and association, the elements of which Selfridge metaphorically termed "demons". This model is now recognized as the basis of visual perception in cognitive science. Pandemonium architecture arose in response to the inability of template matching theories to offer a biologically plausible explanation of the image constancy phenomenon. Contemporary researchers praise this architecture for its elegancy and creativity; that the idea of having multiple independent systems (e.g., feature detectors) working in parallel to address the image constancy phenomena of pattern recognition is powerful yet simple. The basic idea of the pandemonium architecture is that a pattern is first perceived in its parts before the "whole". Pandemonium architecture was one of the first computational models in pattern recognition. Although not perfect, the pandemonium architecture influenced the development of modern connectionist, artificial intelligence, and word recognition models. == History == Most research in perception has been focused on the visual system, investigating the mechanisms of how we see and understand objects. A critical function of our visual system is its ability to recognize patterns, but the mechanism by which this is achieved is unclear. The earliest theory that attempted to explain how we recognize patterns is the template matching model. According to this model, we compare all external stimuli against an internal mental representation. If there is "sufficient" overlap between the perceived stimulus and the internal representation, we will "recognize" the stimulus. Although some machines follow a template matching model (e.g., bank machines verifying signatures and accounting numbers), the theory is critically flawed in explaining the phenomena of image constancy: we can easily recognize a stimulus regardless of the changes in its form of presentation (e.g., T and T are both easily recognized as the letter T). It is highly unlikely that we have a stored template for all of the variations of every single pattern. As a result of the biological plausibility criticism of the template matching model, feature detection models began to rise. In a feature detection model, the image is first perceived in its basic individual elements before it is recognized as a whole object. For example, when we are presented with the letter A, we would first see a short horizontal line and two slanted long diagonal lines. Then we would combine the features to complete the perception of A. Each unique pattern consists of different combination of features, which means those that are formed with the same features will generate the same recognition. That is, regardless of how we rotate the letter A, is still perceived as the letter A. It is easy for this sort of architecture to account for the image constancy phenomena because you only need to "match" at the basic featural level, which is presumed to be limited and finite, thus biologically plausible. The best known feature detection model is called the pandemonium architecture. == Pandemonium architecture == The pandemonium architecture was originally developed by Oliver Selfridge in the late 1950s. The architecture is composed of different groups of "demons" working independently to process the visual stimulus. Each group of demons is assigned to a specific stage in recognition, and within each group, the demons work in parallel. There are four major groups of demons in the original architecture. The concept of feature demons, that there are specific neurons dedicated to perform specialized processing is supported by research in neuroscience. Hubel and Wiesel found there were specific cells in a cat's brain that responded to specific lengths and orientations of a line. Similar findings were discovered in frogs, octopuses and a variety of other animals. Octopuses were discovered to be only sensitive to verticality of lines, whereas frogs demonstrated a wider range of sensitivity. These animal experiments demonstrate that feature detectors seem to be a very primitive development. That is, it did not result from the higher cognitive development of humans. Not surprisingly, there is also evidence that the human brain possesses these elementary feature detectors as well. Moreover, this architecture is capable of learning, similar to a back-propagation styled neural network. The weight between the cognitive and feature demons can be adjusted in proportion to the difference between the correct pattern and the activation from the cognitive demons. To continue with our previous example, when we first learned the letter R, we know is composed of a curved, long straight, and a short angled line. Thus when we perceive those features, we perceive R. However, the letter P consists of very similar features, so during the beginning stages of learning, it is likely for this architecture to mistakenly identify R as P. But through constant exposure of confirming R's features to be identified as R, the weights of R's features to P are adjusted so the P response becomes inhibited (e.g., learning to inhibit the P response when a short angled line is detected). In principle, a pandemonium architecture can recognize any pattern. As mentioned earlier, this architecture makes error predictions based on the amount of overlapping features. Such as, the most likely error for R should be P. Thus, in order to show this architecture represents the human pattern recognition system we must put these predictions into test. Researchers have constructed scenarios where various letters are presented in situations that make them difficult to identify; then types of errors were observed, which was used to generate confusion matrices: where all of the errors for each letter are recorded. Generally, the results from these experiments matched the error predictions from the pandemonium architecture. Also as a result of these experiments, some researchers have proposed models that attempted to list all of the basic features in the Roman alphabet. == Criticism == A major criticism of the pandemonium architecture is that it adopts a completely bottom-up processing: recognition is entirely driven by the physical characteristics of the targeted stimulus. This means that it is unable to account for any top-down processing effects, such as context effects (e.g., pareidolia), where contextual cues can facilitate (e.g., word superiority effect: it is relatively easier to identify a letter when it is part of a word than in isolation) processing. However, this is not a fatal criticism to the overall architecture, because is relatively easy to add a group of contextual demons to work along with the cognitive demons to account for these context effects. Although the pandemonium architecture is built on the fact that it can account for the image constancy phenomena, some researchers have argued otherwise; and pointed out that the pandemonium architecture might share the same flaws from the template matching models. For example, the letter H is composed of 2 long vertical lines and a short horizontal line; but if we rotate the H 90 degrees in either direction, it is now composed of 2 long horizontal lines and a short vertical line. In order to recognize the rotated H as H, we would need a rotated H cognitive demon. Thus we might end up with a system that requires a large number of cognitive demons in order to produce accurate recognition, which would lead to the same biological plausibility criticism of the template matching models. However, it is rather difficult to judge the validity of this criticism because the pandemonium architecture does not specify how and what features are extracted from incoming sensory information, it simply outlines the possible stages of pattern recognition. But of course that raises its own questions, to which it is almost impossible to criticize such a model if it does not include specific parameters. Also, the theory appears to be rather incomplete without defining how and what features are extracted, which proves to be especially problematic with complex patterns (e.g., extracting the weight and features of a dog). Some researchers have also pointed out that the evidence supporting the pandemonium architecture has been very narrow in its methodology. Majority of the research that supports this architecture has often referred to its ability to recognize simple schematic drawings that are selected from a small finite set (e.g., letters in the Roman alphabet). Evidence from these types of exper

    Read more →
  • CinePlayer

    CinePlayer

    CinePlayer is a software based media player used to review Digital Cinema Packages (DCP) without the need for a digital cinema server by Doremi Labs. CinePlayer can play back any DCP, not just those created by Doremi Mastering products. In addition to playing DCPs, CinePlayer can also playback JPEG2000 image sequences and many popular multimedia file types. There are two versions of CinePlayer available, standard and Pro. The standard version supports playback of non-encrypted, 2D DCP's up to 2K resolution. The Pro version supports playback of encrypted, 2D or 3D DCP's with subtitles up to 4K resolution. == Supported formats == === Containers === AVI MOV MXF MPG TS WMV M2TS MTS MP4 MKV === Video codecs === JPEG2000 ProRes 422 DNxHD YUV Uncompressed 8-10 bits DIVX XVID MPEG4 AVC / H-264 VC-1 MPEG2 === Supported image sequences === BMP TIFF TGA DPX JPG J2C === Supported audio files === WAV MP3 WMA MP2

    Read more →
  • Frameserver

    Frameserver

    A frameserver is any program that acts as a media source in the process called frameserving, which transfers digital video data from one computer program to another without intermediate files. The program that receives the data – the frameclient – could be any type of video application. The process is controlled by the frameclient: the frameclient requests audio/video frames and the frameserver serves them. The client can request frames in any order, allowing it to pause or jump to an arbitrary frame, just as a media player does with a file on disk. The client is most commonly a media encoder, a non-linear editing system, or a media player. == Frameservers == AviSynth VirtualDub VapourSynth Debugmode FrameServer

    Read more →
  • Automated storage and retrieval system

    Automated storage and retrieval system

    An automated storage and retrieval system (ASRS or AS/RS) consists of a variety of computer-controlled systems for automatically placing and retrieving loads from defined storage locations. Automated storage and retrieval systems (AS/RS) are typically used in applications where: There is a very high volume of loads being moved into and out of storage Storage density is important because of space constraints No value is added in this process (no processing, only storage and transport) Accuracy is critical because of potential expensive damages to the load An AS/RS can be used with standard loads as well as nonstandard loads, meaning that each standard load can fit in a uniformly-sized volume; for example, the film canisters in the image of the Defense Visual Information Center are each stored as part of the contents of the uniformly sized metal boxes, which are shown in the image. Standard loads simplify the handling of a request of an item. In addition, audits of the accuracy of the inventory of contents can be restricted to the contents of an individual metal box, rather than undergoing a top-to-bottom search of the entire facility, for a single item. They can also be used in self storage places. == Overview == AS/RS systems are designed for automated storage and retrieval of parts and items in manufacturing, distribution, retail, wholesale and institutions. They first originated in the 1960s, initially focusing on heavy pallet loads but with the evolution of the technology the handled loads have become smaller. The systems operate under computerized control, maintaining an inventory of stored items. Retrieval of items is accomplished by specifying the item type and quantity to be retrieved. The computer determines where in the storage area the item can be retrieved from and schedules the retrieval. It directs the proper automated storage and retrieval machine (SRM) to the location where the item is stored and directs the machine to deposit the item at a location where it is to be picked up. A system of conveyors and or automated guided vehicles is sometimes part of the AS/RS system. These take loads into and out of the storage area and move them to the manufacturing floor or loading docks. To store items, the pallet or tray is placed at an input station for the system, the information for inventory is entered into a computer terminal and the AS/RS system moves the load to the storage area, determines a suitable location for the item, and stores the load. As items are stored into or retrieved from the racks, the computer updates its inventory accordingly. The benefits of an AS/RS system include reduced labor for transporting items into and out of inventory, reduced inventory levels, more accurate tracking of inventory, and space savings. Items are often stored more densely than in systems where items are stored and retrieved manually. Within the storage, items can be placed on trays or hang from bars, which are attached to chains/drives in order to move up and down. The equipment required for an AS/RS include a storage & retrieval machine (SRM) that is used for rapid storage and retrieval of material. SRMs are used to move loads vertically or horizontally, and can also move laterally to place objects in the correct storage location. The trend towards Just In Time production often requires sub-pallet level availability of production inputs, and AS/RS is a much faster way of organizing the storage of smaller items next to production lines. The Material Handling Institute of America (MHIA), the non-profit trade association for the material handling world, and its members have categorised AS/RS into two primary segments: Fixed Aisle and Carousels/Vertical Lift Modules (VLMs). Both sets of technologies provide automated storage and retrieval for parts and items, but use different technologies. Each technology has its unique set of benefits and disadvantages. Fixed Aisle systems are characteristically larger systems whereas carousels and Vertical Lift Modules are used individually or grouped, but in small to medium-sized applications. A fixed-aisle AS/R machine (stacker crane) is one of two main designs: single-masted or double masted. Most are supported on a track and ceiling guided at the top by guide rails or channels to ensure accurate vertical alignment, although some are suspended from the ceiling. The 'shuttles' that make up the system travel between fixed storage shelves to deposit or retrieve a requested load (ranging from a single book in a library system to a several ton pallet of goods in a warehouse system). The entire unit moves horizontally within an aisle, while the shuttles are able to elevate up to the necessary height to reach the load, and can extend and retract to store or retrieve loads that are several positions deep in the shelving. A semi-automated system can be achieved by utilizing only specialized shuttles within an existing rack system. Another AS/RS technology is known as shuttle technology. In this technology the horizontal movement is made by independent shuttles each operating on one level of the rack while a lift at a fixed position within the rack is responsible for the vertical movement. By using two separate machines for these two axes the shuttle technology is able to provide higher throughput rates than stacker cranes. Storage and Retrieval Machines pick up or drop off loads to the rest of the supporting transportation system at specific stations, where inbound and outbound loads are precisely positioned for proper handling. In addition, there are several types of Automated Storage & Retrieval Systems (AS/RS) devices called Unit-load AS/RS, Mini-load AS/RS, Mid-Load AS/RS, Vertical Lift Modules (VLMs), Horizontal Carousels and Vertical Carousels. These systems are used either as stand-alone units or in integrated workstations called pods or systems. These units are usually integrated with various types of pick to light systems and use either a microprocessor controller for basic usage or inventory management software. These systems are ideal for increasing space utilization up to 90%, productivity levels by 90%, accuracy to 99.9%+ levels and throughput up to 750 lines per hour/per operator or more depending on the configuration of the system. == Horizontal carousels == Robotic Inserter/Extractor devices can be used for horizontal carousels. The robotic device is positioned in the front or rear of up to three horizontal carousels tiered high. The robot grabs the tote required in the order and often replenishes at the same time to speed up throughput. The tote(s) are then delivered to a conveyor, which routes it to a work station for picking or replenishing. Up to eight transactions per minute per unit can be done. Totes or containers up to 36" x 36" x 36" can be used in a system. On a simplistic level, horizontal carousels are also often used as "rotating shelving". With simple "fetch" command, items are brought to the operator and otherwise wasted space is eliminated. AS/RS Applications: Most applications of AS/RS technology have been associated with warehousing and distribution operations. An AS/RS can also be used to store raw materials and work in process in manufacturing. Three application areas can be distinguished for AS/RS: (1) Unit load storage and handling, (2) Order picking, and (3) Work in process storage. Unit load storage and retrieval applications are represented by unit load AS/RS and deep-lane storage systems. These kinds of applications are commonly found in warehousing for finishing goods in a distribution center, rarely in manufacturing. Deep-lane systems are used in the food industry. As described above, order picking involves retrieving materials in less than full unit load quantities. Minilpass, man-on board, and items retrieval systems are used for this second application area. Work in process storage is a more recent application of automated storage technology. While it is desirable to minimize the amount of work in process, WIP is unavoidable and must be effectively managed. Automated storage systems, either automated storage/retrieval systems or carousel systems, represent an efficient way to store materials between processing steps, particularly in batch and job shop production. In high production, work in process is often carried between operations by conveyor system, which this serve both storage and transport functions. === Inventory Category-specific AS/RS === Each inventory category—raw materials, work-in-process, and finished goods—requires its own specialized Automated Storage and Retrieval System (AS/RS). Particularly for work-in-process (WIP) inventories, due to variations in manufacturing processes, the AS/RS systems are significantly different in design and function, tailored specifically to match unique handling, storage, and retrieval requirements === Installed applications === Installed applications of this technology can be wide-ranging. In some librarie

    Read more →
  • Memory color effect

    Memory color effect

    The memory color effect is the phenomenon that the canonical hue of a type of object acquired through experience (e.g. the sky, a leaf, or a strawberry) can directly modulate the appearance of the actual colors of objects. Human observers acquire memory colors through their experiences with instances of that type. For example, most human observers know that an apple typically has a reddish hue; this knowledge about the canonical color which is represented in memory constitutes a memory color. As an example of the effect, normal human trichromats, when presented with a gray banana, often perceive the gray banana as being yellow - the banana's memory color. In light of this, subjects typically adjust the color of the banana towards the color blue - the opponent color of yellow - when asked to adjust its surface to gray to cancel the subtle activation of banana's memory color. Subsequent empirical studies have also shown the memory color effect on man-made objects (e.g. smurfs, German mailboxes), the effect being especially pronounced for blue and yellow objects. To explain this, researchers have argued that because natural daylight shifts from short wavelengths of light (i.e., bluish hues) towards light of longer wavelengths (i.e., yellowish-orange hues) during the day, the memory colors for blue and yellow objects are recruited by the visual system to a higher degree to compensate for this fluctuation in illumination, thereby providing a stronger memory color effect. == Form identification == Memory color plays a role when detecting an object. In a study where participants were given objects, such as an apple, with two alternate forms for each, a crooked apple and a circular apple, researchers changed the colors of the alternate forms and asked if they could identify them. Most of the participants answered "unsure," suggesting that we use memory color when identifying an object. The research redefined memory color as a phenomenon when "a form's identity affects the phenomenal hue of that form." == Color effect on memorization == Memory color effect can be derived from the human instinct to memorize objects better. Comparing the effect of recognizing gray-scaled images and colored images, results showed that people were able to recall colored images 5% higher compared to gray-scaled images. An important factor was that higher level of contrast between the object and background color influences memory. In a specific study related to this, participants reported that colors were 5% to 10% easier to recognize compared to black and white. == Color constancy and memory color effect == Color constancy is the phenomenon where a surface to appear to be of the same color under a wide rage of illumination. A study tested two hypotheses with regards to color memory; the photoreceptor hypothesis and the surface reflectance hypothesis. The test color was surround either by various color patches forming a complex pattern or a uniform “grey” field at the same chromaticity as that of the illuminant. The test color was presented on a dark background for the control group. It was observed that complex surround results where in line with the surface-reflectance hypothesis and not the photoreceptor hypothesis, showing that the accuracy and precision of color memory are fundamentals to understanding the phenomenon of color constancy. == Significance to the evolution of trichromacy == While objects that possess canonical hues make up a small percentage of the objects which populate humans’ visual experience, the human visual system evolved in an environment populated with objects that possess canonical hues. This suggests that the memory color effect is related to the emergence of trichromacy because it has been argued that trichromacy evolved to optimize the ability to detect ripe fruits—objects that appear in canonical hues. == In perception research == In perception research, the memory color effect is cited as evidence for the opponent color theory, which states that four basic colors can be paired with its opponent color: red—green, blue—yellow. This explains why participants adjust the ripe banana color to a blueish tone to make its memory color yellow as gray. Researchers have also found empirical evidence that suggests memory color is recruited by the visual system to achieve color constancy. For example, participants had a lower percentage of color constancy when looking at a color incongruent scene, such as a purple banana, compared to a color diagnostical scene, a yellow banana. This suggests that color constancy is influenced by the color of objects that we are familiar with, which the memory color effect takes part.

    Read more →
  • Nuance Communications

    Nuance Communications

    Nuance Communications, Inc. was an American multinational computer software technology corporation, headquartered in Burlington, Massachusetts, that markets speech recognition and artificial intelligence software. Nuance merged with its competitor in the commercial large-scale speech application business, ScanSoft, in October 2005. ScanSoft was a Xerox spin-off that was bought in 1999 by Visioneer, a hardware and software scanner company, which adopted ScanSoft as the new merged company name. The original ScanSoft had its roots in Kurzweil Computer Products. In April 2021, Microsoft announced it would buy Nuance Communications. The deal is an all-cash transaction of $19.7 billion, including company debt, or $56 per share. The acquisition was completed in March 2022. == History == The Speech Technology and Research (STAR) Laboratory at SRI International began the journey that, in 1994, resulted in a spin-off company; Corona Corporation (later renamed to Nuance Communications ). Nuance Communications (NUAN) went public on the Nasdaq Stock Market in 1995. Nuance focused on commercializing advanced speech recognition technologies. Nuance was an early spinoff of SRI's Speech Technology and Research (STAR) Laboratory, a world leader in audio processing, speech and speaker analytics and spoken language research. The technology that served as the foundation of Nuance's speech recognition solution started at the STAR Lab and helped launch Nuance more than 20 years ago. In 1995, The SRI Language Modeling Toolkit (SRILM) was developed. This provides the tools to build and apply statistical language models (LMs), primarily for use in speech recognition, statistical tagging and segmentation, and machine translation. In terms of commercialization of natural automated speech recognition, SRI's natural language speech recognition software was the first to be deployed by a major corporation. In 1996, Charles Schwab & Co., Inc., used Nuance's speech recognition technology to allow customers to receive stock quotes over the telephone. One of the key features of the ‘Schwab Discount Brokerage system’, was the ability to recognize English words even when spoken by customers with accents. In 1997, Nuance Communications developed the first large scale commercial dialog system for United Parcel Services (UPS). UPS used the voice recognition platform to handle very large numbers of inquiries about package status. The company that would later merge with Nuance Communications started life as Visioneer, incorporated in 1992. In 1999, Visioneer acquired ScanSoft, Inc. (SSFT), and the combined company became known as ScanSoft. In September 2005, ScanSoft Inc. acquired and merged with Nuance Communications (NUAN), a natural language DOD-project spinoff from SRI International. The resulting company adopted the Nuance name. During the prior decade, the two companies competed in the commercial large-scale speech application business. === Data breach === Between 2014 and 2017, Nuance exposed over 45,000 patient records. == Solutions == Customer service virtual assistants Speech recognition — for people Speech recognition — for business Speech recognition — for physicians Accessibility Power PDF Managed Print Services Transcription === ScanSoft origins === In 1974, Raymond Kurzweil founded Kurzweil Computer Products, Inc. to develop the first omni-font optical character-recognition system – a computer program capable of recognizing text written in any normal font. In 1980, Kurzweil sold his company to Xerox. The company became known as Xerox Imaging Systems (XIS), and later ScanSoft. In March 1992, a new company called Visioneer, Inc. was founded to develop scanner hardware and software products, such as a sheetfed scanner called PaperMax and the document management software PaperPort. Visioneer eventually sold its hardware division to Primax Electronics, Ltd. in January 1999. Two months later, in March, Visioneer acquired ScanSoft from Xerox to form a new public company with ScanSoft as the new company-wide name. Prior to 2001, ScanSoft focused primarily on desktop imaging software such as TextBridge, PaperPort and OmniPage. Beginning with the December 2001 acquisition of Lernout & Hauspie assets, the company moved into the speech recognition business and began to compete with Nuance. Lernout & Hauspie had acquired speech recognition company Dragon Systems in June 2001, shortly before becoming bankrupt in October. Scansoft acquired speech recognition company SpeechWorks in 2003. === Partnership with Siri and Apple Inc. === In 2013, Nuance confirmed that its natural language processing algorithms supported Apple's Siri voice assistant. === Focus on health care === In 2019, Nuance spun off its automotive division as the company Cerence, allowing it to focus on health care applications. === Acquisition by Microsoft === On April 12, 2021, Microsoft announced that it would buy Nuance Communications for $19.7 billion, or $56 a share, a 22% increase over the previous closing price. Nuance's CEO, Mark Benjamin, stayed with the company. This was Microsoft's second-biggest acquisition up to that point, after its purchase of LinkedIn for $24 billion (~$30.7 billion in 2024) in 2016. Shortly after the deal, the Competition and Markets Authority, a UK regulatory body, stated it was looking into the deal on the basis of antitrust concerns. In December 2021, it was reported that the deal would be approved by the European Union. The acquisition was completed on March 4, 2022. In May 2023, Nuance announced an unspecified number of layoffs.

    Read more →
  • Butler in a Box

    Butler in a Box

    Butler in a Box was an early voice-controlled home automation device developed in 1983 by magician Gus Searcy and programmer Franz Kavan. The device allowed users to control various home electronics, such as lights and phones, using voice commands. It predated modern smart speakers and virtual assistants by several decades. == History == The idea for the Butler in a Box originated in 1983 when Searcy was asked by friends why he couldn't simply command lights to turn on and off if he could pull rabbits out of hats, given his background as a professional magician. Searcy partnered with former IBM programmer Kavan to develop the device, with their first prototype being named "Sidney". The Butler in a Box combined remote control technology with voice recognition to enable control of home devices. However, it faced challenges due to the technological limitations of the era and its high price point of nearly $1,500 (equivalent to around $3,700 in 2021). == Features and functionality == Users could activate the Butler in a Box by speaking a wake word, typically a traditional butler name, and the device would address the user as "boss". It was capable of performing tasks such as: Turning lights on and off, controlling individual zones if lights were connected to remote control modules Making and receiving phone calls Setting timers Pairing with sensors to function as a security alarm system However, the device required extensive voice training for each user, a time-consuming process compared to modern voice recognition. Additionally, settings and trained commands would be lost if power was out for over 3 hours due to the volatile memory technology used at the time. == Reception and legacy == While innovative for its time, the Butler in a Box did not achieve widespread commercial success due to its high price and the technical limitations of the 1980s. Nevertheless, it served as an important early step in the development of home automation and showcased the potential for voice-controlled technology to enhance accessibility and convenience in the home. Decades later, products like Amazon Alexa, Google Home, and Apple's Siri would make voice-controlled smart home devices commonplace and affordable, building on the groundwork laid by early attempts like the Butler in a Box.

    Read more →
  • Gas (app)

    Gas (app)

    Gas (sometimes stylized in all caps), formerly known as Melt as well as Crush, was an American anonymous social media app. Launched in August 2022, the app is oriented towards high schoolers. The app was developed by Nikita Bier, Isaiah Turner, and former Facebook engineer Dave Schatz. Gas was largely based upon the prior tbh app developed by co-founder Nikita Bier, along with Erik Hazzard, Kyle Zaragoza, and Nicolas Ducdodon in September 2017. tbh was acquired by Facebook inc. (now Meta Platforms) on October 16, 2017, and nearly a year later in July 2018 was dissolved, owing to low usage. Gas follows a similar purpose to tbh in being a social media app oriented towards high schoolers. In the app, users participate in anonymous polls regarding pre-written complimentary statements to their peers, such as "I'd say yes if (blank) asked me out on a date," "I think (blank) is the coolest kid in school," or "would make an ugly face and still look pretty." Winners of said polls receive a "flame." The name of the app is derived from this, with "gassing someone up" being Gen Z slang for complimenting someone. Users can pay a $6.99 subscription that enables "God Mode," which shows hints regarding who voted for them in a poll. Gas overtook TikTok and BeReal as the most downloaded app on the Apple App Store in October 2022 (the app is currently not available for Android). The app has over 5.1 million downloads as of early November 2022, over a million active users and 300 thousand daily downloads as of October 2022. Currently, the app is available in Canada and the majority of the United States. On January 17, 2023, Gas was acquired by Discord, however it would remain a standalone app and its developers became Discord staff members. On October 18, 2023, Discord announced that service for Gas would be permanently ending effective November 7, 2023, due to a steep decline in users. Effective November 7, the app became completely unusable. == Controversy regarding human-trafficking == Beginning in October 2022, rumors spread largely throughout TikTok and Snapchat alleged that the app was linked to human trafficking (in particular sex trafficking). According to Bier, the rumor originated with a single user review from China on October 5, and then was disseminated through TikTok accounts with "few to no US teen followers." Although largely dismissed as a hoax by experts, who cite how the app doesn't log user locations and general anonymity, the hoax became pervasive to the extent that various police departments, school systems, and local news outlets began issuing warnings regarding the app. For instance, on October 31, 2022, the police department of Piedmont, Oklahoma issued a warning to parents, encouraging them to check their children's phones, while on November 3, the Oklahoma Oktaha Public School system stated in a Facebook post that "Children are being kidnapped in other towns and this new app is thought to be the source of predators finding their location." (both statements have since been retracted by Police Chief Scott Singer and Superintendent Jerry Needham respectively). Additionally, local medial outlets such as KOCO in Oklahoma City ran stories making similar statements. The rumor had a negative impact on the app, with downloads plateauing for a two-week period in late October and with 3% of users in a single day reportedly uninstalling the app. Revenue and ratings have also reportedly dropped and the company's social media accounts have been bombarded with comments labeling them as sex-traffickers. Additionally, the four-person development team has reportedly been bombarded with various death threats as a result.

    Read more →
  • BBC Own It

    BBC Own It

    The BBC Own It app was a British information site designed to protect and support children using the Internet. The app was launched in 2017 and retired in 2022, though the website retired in 2024 and has since moved to BBC Teach. As part of the BBC's partnership with Internet Matters, the not-for-profit contributed to content on the BBC Own It website. == History == In 2016, The Royal Foundation of The Duke and Duchess of Cambridge established The Royal Foundation Taskforce on the Prevention of Cyberbullying. Work began in 2017 by the BBC to create an app about cyberbullying and online safety (later titled Own It) in response to a call for action from the Taskforce. In December 2017, the BBC launched Own It. In November 2018, work on the BBC Own It App was announced by Prince William. In September 2019, the BBC Own It App was launched into the AppStore and Google Play. In 2022, the BBC discontinued the app, although the website was still active, however in 2024, the website was discontinued, and now any links to the website now redirect to a BBC Teach page. == Awards == UXUK award for Best Education or Learning Experience (2019) Banff World Media Festival Rockies Award for Children & Youth Interactive Content (2020) CogX Award for Best Innovation In Natural Language Processing (2020)

    Read more →
  • DeepRoute.ai

    DeepRoute.ai

    DeepRoute.ai (Chinese: 元戎启行) is a Chinese autonomous driving company founded in 2019 and headquartered in Shenzhen, China. The company develops full-stack self-driving solutions including perception, decision-making, and control systems. == History == DeepRoute.ai was founded in February 2019 in Shenzhen, China, by Zhou Guang (周光), who serves as the company's CEO. In September 2019, the company collaborated with Dongfeng for a live-streamed autonomous driving demonstration. In October 2019, during the 7th Military World Games, DeepRoute.ai conducted Robotaxi demonstration operations. In November 2019, it obtained an intelligent connected vehicle road test permit for public roads in Shenzhen. In October 2020, DeepRoute.ai signed an "Autonomous Driving Leadership Project" with Dongfeng to build one of China's largest autonomous fleets. In August 2020, DeepRoute.ai announced its partnership with Cao Cao Mobility, a Geely-backed ride-hailing company, to test Robotaxis in Hangzhou for daily operations, planning to provide Robotaxis during the 2022 Asian Games. In September 2021, DeepRoute.ai secured US$300 million in a Series B funding round led by Alibaba. In December 2021, the company unveiled its DeepRoute-Driver 2.0, an L4-level autonomous driving solution comprising five solid-state lidar sensors, eight cameras, a proprietary computing system and an optional millimeter-wave radar. with a production cost of under US$10,000. In June 2022, it partnered with Deppon Express to provide autonomous light truck freight transfer services. In March 2023, the company launched its high-precision map-free intelligent driving solution, DeepRoute-Driver 3.0. In November 2024, Great Wall Motor announced a $100 million Series C funding round for Deeproute. With this, Deeproute has completed five rounds of financing, raising a cumulative total of over $500 million. Its shareholders include Fosun RZ Capital, Yunqi Partners, Alibaba, Vision Plus Capital, and Dongfeng, among others. In the same month, Deeproute.ai emphasised that they were in "deep cooperation" with Nvidia and spoke on being part of the first batch of companies in China to get a hold of Nvidia's newer Thor chip for cars which will be used in a new system released next year. This new system will help manage more complex driving scenarios through visual cues. == Products == === VLA Model === VLA Model is a Vision–language–action model designed for autonomous driving systems. It integrates visual perception, semantic understanding, and action decision-making into a unified framework, aiming to enhance the safety and adaptability of advanced driver-assistance systems (ADAS) in complex road environments. The model was officially launched on August 26, 2025, as the core of DeepRoute.ai's DeepRoute IO 2.0 platform. The VLA model is characterized by its "visual-language-action" architecture, which incorporates a chain-of-thought (CoT) reasoning capability inspired by large language models. This design is intended to address the "black box" limitations of traditional end-to-end autonomous driving systems by enabling the model to analyze information, infer causality, and make decisions in a more transparent and interpretable manner. === Appliance === The company has partnered with several automakers including Dongfeng Motor Corporation and Geely to develop and test autonomous vehicles.

    Read more →