AI Art That Looks Real

AI Art That Looks Real — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • World model (artificial intelligence)

    World model (artificial intelligence)

    A world model in artificial intelligence is a machine learning system that builds an internal representation of an environment. The model predicts how that environment changes over time in response to actions. Researchers design world models to help agents plan, reason, and act without constant real-world trial and error. World models differ from systems that merely classify or generate outputs. They simulate dynamics such as physics, object interactions, and causality. Early ideas date to the 1990s. Modern versions power robots, autonomous driving, and interactive video generation. == History == Jürgen Schmidhuber introduced the term world model in machine learning in 1990. He proposed recurrent neural networks that predict future states from observations and use those predictions to train agents. David Ha and Schmidhuber revived the concept in a 2018 paper. Their agents learned to drive virtual cars and play video games inside self-generated simulations. Yann LeCun advanced the idea in a 2022 position paper titled "A Path Towards Autonomous Machine Intelligence". He argued that intelligence requires predictive models of the world rather than pure pattern matching. LeCun proposed the joint embedding predictive architecture (JEPA) as a practical foundation. LeCun and collaborators developed several JEPA variants. V-JEPA 2 reached state-of-the-art performance on video understanding and physical reasoning at the time. It supports zero-shot robot control in unfamiliar environments. Introduced in March 2026, LeWorldModel trains stably end-to-end from raw pixels and uses two loss terms and avoids hand-crafted heuristics. LeCun founded Advanced Machine Intelligence Labs in 2026 to further develop world models. Google DeepMind introduced Genie in 2024. The model learned interactive environments from unlabeled internet videos. Genie 2 followed in late 2024 and added three-dimensional generation. The Genie series set benchmarks for general-purpose simulation. Genie 3 was introduced in August 2025. It produces photorealistic, real-time interactive worlds from text prompts which are displayed at 24 frames per second and explored in real time with text or image prompts. The model supports persistent three-dimensional worlds and real-time interaction. Waymo adopted Genie 3 in February 2026 and used it to create a specialized world model for autonomous driving simulation, called the Waymo World Model. It produces synchronized camera and lidar outputs and creates edge cases that real robotaxis rarely encounter. The edge cases were reported to be unusual by PCMag. General Intuition announced a $133.7 million seed round. World Labs raised $1 billion. AMI raised $1.03 billion. In April 2026, Alibaba announced Happy Oyster, its world model designed for real-time and “flowy” world model. It includes a directing mode for world building based on text and image prompts and a wandering mode for exploring the resulting world. It can generate 3-minute in-world video clips. Also in April, World Labs, co-founded by Li Fei Fei, unveiled Spark 2.0, an open-source 3D Gaussian splatting rendering engine that targets smartphone-class devices. In June 2026, Nvidia released Cosmos 3, a family of open-weight models. It combines previously independent physical reasoning, world simulation, and action generation. Cosmos 3 integrates can process and generate text, image, video, audio, and action sequences. The model employs a Mixture-of-Transformers" (MoT) approach. An autoregressive (AR) transformer handles reasoning and next-token prediction, while a diffusion transformer (DT) does multimodal generation. Encoders (ViT for vision, VAE for visual/audio, and domain-specific for actions) and generate a shared representation space using 3D multi-dimensional rotary position embedding (mRoPE) for spatial and temporal information. The family includes Cosmos3-Nano (16B parameters) for workstations; Cosmos3-Super (64B parameters) for research. == Architecture == World models process raw sensory data such as video frames or lidar scans. They compress this input into compact latent representations. The system then predicts future representations rather than pixel-by-pixel reconstructions. Many modern world models use joint embedding predictive architecture (JEPA). An encoder turns observations into embeddings. A predictor estimates one or a suite of embeddings from the current one and an action. In some cases a critic chooses one embedding as the best result. A regularizer keeps embeddings well-behaved. The model trains by minimizing prediction error in embedding space. This approach avoids the high cost of generating every detail. Some architectures add explicit components. A fast reactive path handles immediate responses. A slower deliberative path performs longer-horizon planning. Video prediction accuracy or robot success rates are key metrics, but do not always predict real-world performance. Generative world models such as Genie 3 combine these with a simulator. They accept text prompts or layouts and output consistent video, lidar, or three-dimensional scenes. World models often train with self-supervised learning. They use large unlabeled datasets of video or robot interactions. Self-supervised learning can speed learning. Reinforcement learning can fine-tune a model for specific tasks. == Applications == World models support robot learning. Agents train inside simulations and transfer skills to the physical world. This reduces the need for dangerous or expensive real-world trials. Autonomous vehicles use world models to test rare events. Waymo's system simulates tornadoes or unusual pedestrian behavior. Companies train planners without putting vehicles on public roads. Interactive entertainment benefits from world models. Genie 3 lets users generate playable environments from simple descriptions. Game studios prototype levels faster. Scientific simulation gains from these models. Researchers model physical systems or biological processes at scale. Planners in logistics or urban design test strategies inside accurate digital twins. == Comparison with large language models == Both world models and large language models (LLMs) use inferencing on their inputs to make predictions. LLMs operate on textual inputs. They predict the next token in text sequences. They excel at language-oriented tasks such as translation or summarization. However, they lack understanding of physics. World models operate on sensor inputs such as pixels. They predict state changes in that data in latent space. This design supports planning and causal reasoning. LLMs generate fluent text but often fail at consistent physical predictions. Their architecture employs transformers with refinements such as mixture of experts. World models divide an inferencing task into work performed by encoders, predictors, simulators, and other pieces. They typically handle multimodal inputs such as video, lidar, radar, and audio, guided by textual prompting. LLMs power chatbots and code assistants. World models drive embodied agents that act in dynamic environments, such as autonomous driving. The two may be combined in hybrid systems. For example, a LLM handles instructions, while a world model manages low-level control. World model proponents such as LeCun claim that because LLMs are trained only on text, they have no ability to predict anything beyond text, such as real-world events. == Benchmarks == World model benchmarks test physical understanding, long-term consistency, planning, and generalization from sensor data. Meta introduced three benchmarks for V-JEPA 2. IntPhys 2 measures a model's ability to detect physics violations. It presents pairs of videos that diverge when one breaks physical rules. Humans score near 100% accuracy. V-JEPA 2 achieves little better than random chance on many conditions. Minimal Video Pairs (MVPBench) tests physical understanding through multiple-choice questions based on short video clips. It probes object interactions and causality. Something-Something tests action recognition. Epic-Kitchens-100 tests human action anticipation. DeepMind benchmark: Interactive evaluation measures consistency over minutes of interaction, memory of off-screen objects, and response to user actions or text prompts. Waymo benchmark: Output generation quality: Metrics include realism, controllability (via text prompts), and usefulness for training planners in simulated worlds. However, pixel reconstruction error rate with episodic rewards often fails. Other: Epic-Kitchens-100 (often measured with Recall@5) Ego4D 50 Salads, Breakfast, etc. Potential benchmarks: Zero-shot transfer to robots Long-horizon planning Implausible prediction rate

    Read more →
  • Advanced automation functions

    Advanced automation functions

    In automation production technology the actions performed by an automated process are executed by a program of instructions which is run during a work cycle. To execute work cycle programs, an automated system should be available to execute these advanced functions. == Safety monitoring == If there is a need for workers in an automated system, a safety monitoring is required for the occupational safety and health of the workers. In a safety monitoring various steps can take place including a complete stop of the system, sounding an alarm or reducing the operating speed. Usually, limiting switches are sensors like temperature probes, heat and smoke detectors or pressure sensitive floor pads. == Maintenance and repair diagnostics == There are three modes of operations which are used in a cycle of maintenance and repair diagnostics: status monitoring, failure diagnostics and recommendation of the repair procedure. In the status monitoring mode, the current system status is displayed. The failure diagnostics mode takes place when a failure occurs. The system will then suggest an adequate repair procedure to a team of experts. == Error detection and recovery == The error detection mode is a step to determine if and when a failure occurs in automated system. The possible errors can be divided into three categories. random errors, systematic errors and aberrations. While in the error recovery mode, remedy actions take place for all detected errors.

    Read more →
  • Linguatec

    Linguatec

    The Linguatec Sprachtechnologien GmbH is a language technology provider, specialized in the field of machine translation, speech synthesis and speech recognition. Linguatec was founded in Munich in 1996 and its headquarters are in Pasing. Linguatec has won the European Information Society Technologies Prize three times. On their website, they are now using the online service Voice Reader Web, so that the information can be read out in every language by means of a text-to-speech function. == Core areas == Machine translation The different versions of Personal Translator (seven language pairs) can be used "for home use" or for professional business use in the company network. In addition to this, specialist dictionaries are offered to broaden standard vocabulary. Speech synthesis The Voice Reader text-to-speech program reads in twelve languages: German, British English, American English, French, Quebec French, Spanish, Mexican Spanish, Italian, Dutch, Portuguese, Czech, Chinese. Speech recognition Voice Pro is based on ViaVoice technology from IBM. There are special software programs for doctors and lawyers. == Patents == 2005 pending patent application for a newly developed hybrid technology that uses the intelligence of neural networks for machine translation. == Awards == 2004 European IT Prize for Beyond Babel 2004 test winner Stiftung Warentest – best voice recognition 1998 European IT Prize – applied voice recognition 1996 European IT Prize – automated translation == Studies == 2005 University of Regensburg: Voice Reader user test 2002 Fraunhofer Institute for Industrial Engineering and Organization IAO: user study on the efficiency of machine translation

    Read more →
  • Avid Symphony

    Avid Symphony

    Avid Symphony is non-linear editing software aimed at professionals in the film and television industry. It is available for Microsoft Windows PCs and Apple Macintosh platforms. Symphony is Avid's high end SD/HD finishing platform for long form work, such as documentary and episodic TV. Its interface is based on the same look and feature set as the Media Composer and Xpress systems, but contains the highest level of features and resolution including secondary color correction, uncompressed HD, and higher real-time performance. == Release history == Symphony is the software component of a tightly integrated package that includes specific hardware audio/video interfaces, storage, and the computer, also sold by Avid. Its release history is therefore tightly related to the release of new Avid interface hardware: Symphony was introduced to the market in 1998. It was based on Avid's Meridien hardware, supporting SD only, and was available first only for the PC and later for the Macintosh platforms. Its last release was 5.0.5 which supported Windows 2000 and Mac OS X v10.2. The next major upgrade was Symphony Nitris in 2005, with a redesigned software and integration with the Nitris DNA hardware (PCI-X). It supported 8 bit and 10 bit SD and HD resolutions in both compressed and uncompressed forms, the MXF format and DNxHD codec, and ran only on Windows PC platforms. Symphony Nitris DX, released in 2008, added support for a range of HD codecs, including HDV, XDCAM-HD, DVCPRO HD, and AVC-I, and brought back Mac OS support for OS X 10.5, as well as Windows Vista. Since the introduction of Symphony 6, it can be used in software-only mode (where a Nitris or Nitris DX BOB used to be required), and at the same time, like Media Composer, Symphony was opened up with "Open I/O", allowing users to have Symphony use their third party hardware from companies like AJA, Matrox, BlueFish, Blackmagic Design and MOTU. The last remaining features that differentiate it from Media Composer are Advanced Color Correction (channels, secondary color correction,), Relational Color Correction (corrections based on common clip name, tape name, program track) and Universal HD Mastering (only with Nitris DX hardware). The latter allows cross-conversions of 23.976p or 24p projects sequences to most any other format during Digital Cut. In 2013, Avid announced it would no longer offer Symphony a standalone product. Starting version 7, Symphony will be sold as an option to Media Composer. This optional package (sold at a premium) will contain all the traditional Symphony-only features to any Media Composer install. == Use in movies == The Celibacy, Director: Horacio Bocaranda Avid Media Composer 6 and Avid Symphony 6 Nitris DX American Hardcore, Director: Paul Rachman Avid Xpress Pro and Symphony Summercamp!, Director: Spike Lee Avid Xpress Pro and Symphony When the Levees Broke Avid Media Composer and Symphony Nitris Superman Returns Edited with Mac-based Film Composer XL, but HD screenings prepped with Symphony

    Read more →
  • GeoNetwork opensource

    GeoNetwork opensource

    The GeoNetwork opensource (GNOS) project is a free and open source (FOSS) cataloging application for spatially referenced resources. It is a catalog of location-oriented information. == Outline == It is a standardized and decentralized spatial information management environment designed to enable access to geo-referenced databases, cartographic products and related metadata from a variety of sources, enhancing the spatial information exchange and sharing between organizations and their audience, using the capacities of the internet. Using the Z39.50 protocol it both accesses remote catalogs and makes its data available to other catalog services. As of 2007, OGC Web Catalog Service are being implemented. Maps, including those derived from satellite imagery, are effective communicational tools and play an important role in the work of decision makers (e.g., sustainable development planners and humanitarian and emergency managers) in need of quick, reliable and up-to-date user-friendly cartographic products as a basis for action and to better plan and monitor their activities; GIS experts in need of exchanging consistent and updated geographical data; and spatial analysts in need of multidisciplinary data to perform preliminary geographical analysis and make reliable forecasts. == Deployment == The software has been deployed to various organizations, the first being FAO GeoNetwork and WFP VAM-SIE-GeoNetwork, both at their headquarters in Rome, Italy. Furthermore, the WHO, CGIAR, BRGM, ESA, FGDC and the Global Change Information and Research Centre (GCIRC) of China are working on GeoNetwork opensource implementations as their spatial information management capacity. It is used for several risk information systems, in particular in the Gambia. Several related tools are packaged with GeoNetwork, including GeoServer. GeoServer stores geographical data, while GeoNetwork catalogs collections of such data.

    Read more →
  • Georges Giralt PhD Award

    Georges Giralt PhD Award

    The Georges Giralt PhD Award is a European scientific prize for extraordinary contributions to robotics. It is awarded yearly at the European Robotics Forum by euRobotics AISBL, a non-profit organisation based in Brussels with the objective of turning robotics beneficial for Europe’s economy and society. Georges Giralt received his PhD in 1958, from Paul Sabatier University, in the domain of electrical machines, and soon afterwards became a pioneer in robotics, in Europe and worldwide. He was especially instrumental in bringing in scientific foundations and methodology when the domain was still young, and a loose coupling of mechanical and electrical engineering, adopting the early results of automatic control. The high reputation of the Georges Giralt PhD Award is based on the prominent role of the awarding institution euRobotics. With more than 250 member organisations, euRobotics represents the academic and industrial robotics community in Europe. Moreover, it provides the European robotics community with a legal entity to engage in a public/private partnership with the European Commission. The award is covered by various media. Entitled for participation in the Georges Giralt PhD Award are all robotics-related dissertations which have been successfully defended at a European university. The US-American counterpart is the Dick Volz Award. == Award winners == 2026: Antonio González Morgado 2025: Erfan Shahriari 2024: Manuel Keppler 2023: Antonio Andriella, Ribin Balachandran 2022: Antonio Loquercio, Michael Lutter 2021: Giuseppe Averta, Bernd Henze 2020: Cosimo Della Santina 2019: Grazioso Stanislao, Teodor Tomic 2018: Frank Bonnet, Daniel Leidner 2017: Johannes Englsberger 2016: Alexander Dietrich, Mark Müller 2015: Jörg Stückler 2014: Manuel Catalano, Fabien Expert, Rainer Jaekel 2013: Jens Kober 2012: Sami Haddadin 2011: Mario Pratts 2010: Ludovic Righetti 2009: Alejandro-Dizan Vasquez-Govea 2008: Cyrill Stachniss, Eduardo Rocon 2007: Pierre Lamon 2006: Martijn Wisse 2005: Juan Andrade Cetto 2004: Gilles Duchemin 2003: Ralf Koeppe 2002: Gianluca Antonelli, Jens-Steffen Gutmann

    Read more →
  • Turret lathe

    Turret lathe

    A turret lathe is a form of metalworking lathe that is used for repetitive production of duplicate parts, which by the nature of their cutting process are usually interchangeable. It evolved from earlier lathes with the addition of the turret, which is an indexable toolholder that allows multiple cutting operations to be performed, each with a different cutting tool, in easy, rapid succession, with no need for the operator to perform set-up tasks in between (such as installing or uninstalling tools) or to control the toolpath. The latter is due to the toolpath's being controlled by the machine, either in jig-like fashion, via the mechanical limits placed on it by the turret's slide and stops, or via digitally-directed servomechanisms for computer numerical control lathes. The name derives from the way early turrets took the general form of a flattened cylindrical block mounted to the lathe's cross-slide, capable of rotating about the vertical axis and with toolholders projecting out to all sides, and thus vaguely resembled a swiveling gun turret. Capstan lathe is the usual name in the UK and Commonwealth, though the two terms are also used in contrast: see below, Capstan versus turret. == History == Turret lathes became indispensable to the production of interchangeable parts and for mass production. The first turret lathe was built by Stephen Fitch in 1845 to manufacture screws for pistol percussion parts. In the mid-nineteenth century, the need for interchangeable parts for Colt revolvers enhanced the role of turret lathes in achieving this goal as part of the "American system" of manufacturing arms. Clock-making and bicycle manufacturing had similar requirements. Christopher Spencer invented the first fully automated turret lathe in 1873, which led to designs using cam action or hydraulic mechanisms. From the late-19th through mid-20th centuries, turret lathes, both manual and automatic (i.e., screw machines and chuckers), were one of the most important classes of machine tools for mass production. They were used extensively in the mass production for the war effort in World War II. The U.S. company Warner & Swasey was one of the premier brands in heavy turret lathes between the 1910s and 1960s; it became the world's largest manufacturer of such lathes by 1928. During World War II, it employed 7,000 people and produced half of the turret lathes manufactured in the United States. == Types == There are many variants of the turret lathe. They can be most generally classified by size (small, medium, or large); method of control (manual, automated mechanically, or automated via computer (numerical control (NC) or computer numerical control (CNC)); and bed orientation (horizontal or vertical). === Archetypical: horizontal, manual === In the late 1830s a "capstan lathe" with a turret was patented in Britain. The first American turret lathe was invented by Stephen Fitch in 1845. The archetypical turret lathe, and the first in order of historical appearance, is the horizontal-bed, manual turret lathe. The term "turret lathe" without further qualification is still understood to refer to this type. The formative decades for this class of machine were the 1840s through 1860s, when the basic idea of mounting an indexable turret on a bench lathe or engine lathe was born, developed, and disseminated from the originating shops to many other factories. Some important tool-builders in this development were Stephen Fitch; Gay, Silver & Co.; Elisha K. Root of Colt; J.D. Alvord of the Sharps Armory; Frederick W. Howe, Richard S. Lawrence, and Henry D. Stone of Robbins & Lawrence; J.R. Brown of Brown & Sharpe; and Francis A. Pratt of Pratt & Whitney. Various designers at these and other firms later made further refinements. === Semi-automatic === Sometimes machines similar to those above, but with power feeds and automatic turret-indexing at the end of the return stroke, are called "semi-automatic turret lathes". This nomenclature distinction is blurry and not consistently observed. The term "turret lathe" encompasses them all. During the 1860s, when semi-automatic turret lathes were developed, they were sometimes called "automatic". What we today would call "automatics", that is, fully automatic machines, had not been developed yet. During that era both manual and semi-automatic turret lathes were sometimes called "screw machines", although we today reserve that term for fully automatic machines. === Automatic === During the 1870s through 1890s, the mechanically automated "automatic" turret lathe was developed and disseminated. These machines can execute many part-cutting cycles without human intervention. Thus the duties of the operator, which were already greatly reduced by the manual turret lathe, were even further reduced, and productivity increased. These machines use cams to automate the sliding and indexing of the turret and the opening and closing of the chuck. Thus, they execute the part-cutting cycle somewhat analogously to the way in which an elaborate cuckoo clock performs an automated theater show. Small- to medium-sized automatic turret lathes are usually called "screw machines" or "automatic screw machines", while larger ones are usually called "automatic chucking lathes", "automatic chuckers", or "chuckers". Such machine tools of the "automatic" variety, which in the pre-computer era meant mechanically automated, had already reached a highly advanced state by World War I. === Computer numerical control === When World War II ended, the digital computer was poised to develop from a colossal laboratory curiosity into a practical technology that could begin to disseminate into business and industry. The advent of computer-based automation in machine tools via numerical control (NC) and then computer numerical control (CNC) displaced to a large extent, but not at all completely, the previously existing manual and mechanically automated machines. Numerically controlled turrets allow automated selection of tools on a turret. CNC lathes may be horizontal or vertical in orientation and mount six separate tools on one or more turrets. Such machine tools can work in two axes per turret, with up to six axes being feasible for complex work. === Vertical === Vertical turret lathes have the workpiece held vertically, which allows the headstock to sit on the floor and the faceplate to become a horizontal rotating table, analogous to a huge potter's wheel. This is useful for the handling of very large, heavy, short workpieces. Vertical lathes in general are also called "vertical boring mills" or often simply "boring mills"; therefore a vertical turret lathe is a vertical boring mill equipped with a turret. == Other variations == === Capstan versus turret === The term "capstan lathe" overlaps in sense with the term "turret lathe" to a large extent. In many times and places, it has been understood to be synonymous with "turret lathe". In other times and places it has been held in technical contradistinction to "turret lathe", with the difference being in whether the turret's slide is fixed to the bed (ram-type turret) or slides on the bed's ways (saddle-type turret). The difference in terminology is mostly a matter of United Kingdom and Commonwealth usage versus United States usage. === Flat === A subtype of horizontal turret lathe is the flat-turret lathe. Its turret is flat (and analogous to a rotary table), allowing the turret to pass beneath the part. Patented by James Hartness of Jones & Lamson, and first disseminated in the 1890s, it was developed to provide more rigidity via requiring less overhang in the tool setup, especially when the part is relatively long. === Hollow-hexagon === Hollow-hexagon turret lathes competed with flat-turret lathes by taking the conventional hexagon turret and making it hollow, allowing the part to pass into it during the cut, analogously to how the part would pass over the flat turret. In both cases, the main idea is to increase rigidity by allowing a relatively long part to be turned without the tool overhang that would be needed with a conventional turret, which is not flat or hollow. === Monitor lathe === The term "monitor lathe" formerly (1860s–1940s) referred to the class of small- to medium-sized manual turret lathes used on relatively small work. The name was inspired by the monitor-class warships, which the monitor lathe's turret resembled. Today, lathes of such appearance, such as the Hardinge DSM-59 and its many clones, are still common, but the name "monitor lathe" is no longer current in the industry. === Toolpost turrets and tailstock turrets === Turrets can be added to non-turret lathes (bench lathes, engine lathes, toolroom lathes, etc.) by mounting them on the toolpost, tailstock, or both. Often these turrets are not as large as a turret lathe's, and they usually do not offer the sliding and stopping that a turret lathe's turret does; but they do offer the ability to index through successive tool

    Read more →
  • Reverse correlation technique

    Reverse correlation technique

    The reverse correlation technique is a data driven study method used primarily in psychological and neurophysiological research. This method earned its name from its origins in neurophysiology, where cross-correlations between white noise stimuli and sparsely occurring neuronal spikes could be computed quicker when only computing it for segments preceding the spikes. The term has since been adopted in psychological experiments that usually do not analyze the temporal dimension, but also present noise to human participants. In contrast to the original meaning, the term is here thought to reflect that the standard psychological practice of presenting stimuli of defined categories to the participants is "reversed": Instead, the participant's mental representations of categories are estimated from interactions of the presented noise and the behavioral responses. It is used to create composite pictures of individual and/or group mental representations of various items (e.g. faces, bodies, and the self) that depict characteristics of said items (e.g. trustworthiness and self-body image). This technique is helpful when evaluating the mental representations of those with and without mental illnesses. == Terms == This technique utilizes spike-triggered average to explain what areas of signal and noise in an image are valuable for the given research question. Signal is information used to produce objects of value that help explain and connect the world around us. Noise is commonly referred to as unwanted signal that obscures the information that the signal is trying to present. Most importantly for reverse correlation studies, noise is randomly varying information. To determine the areas of importance using reverse correlation, noise is applied to a base image and then evaluated by observers. A base image is any image void of noise that relates to the research question. A base image that has noise superimposed on top is the stimuli that is presented to and evaluated by participants. Each time a new set of stimuli is presented to a participant, this is known as a trial. After a participant has responded to hundreds to thousands of trials, a researcher is ready to create a classification image. A classification image (abbreviated as "CI" in some studies) is a single image that represents the average noise patterns in the images selected by participants. A classification image can also be computed for groups by averaging the individuals’ classification images. These classification images are what researchers use to interpret the data and draw conclusions. As a whole, the reverse correlation method is a process that results in a composite image (from an individual or group) that can be used to estimate and interpret mental representations. == Basic study layout == The reverse correlation method is typically executed as an in-lab computer experiment. This method follows four broad steps. Each of the following steps are described in greater detail below. After creating a research question and determining that the reverse correlation method is the most suitable technique to answer the question, a researcher must (1) design randomly varying stimuli. After the stimuli have been prepared, a researcher should (2) collect data from participants who will see and respond to approximately 300 -1,000 trials. Each trial will either consist of one or two images (side by side) derived from the same base image with noise superimposed on top. Participant responses will depend on the chosen study design; if a researcher presents only one image at a time, participants rate the image on a 4pt scale, but when two images are shown, the participant is asked to choose which best aligns with the given category (e.g. choose the image that looks the most aggressive). Once all of the data is collected, the researcher will (3) compute classification images for each participant and using those images compute group classification images. Finally, with the classification images available, the researcher will (4) evaluate the images and draw conclusions about their results. === Step 1: making stimuli === When designing the stimuli for a reverse correlation study, the two primary factors that one should consider are (1) the base image and (2) the noise that will be used. While not all bases are images per se, the majority are and for this reason the base is typically referred to as a base image. The base image should represent whatever the research question is addressing. For example, if you are interested in peoples’ mental representations of Chinese people, it would not make sense to use a base image of a Spanish or Caucasian person. Again, if you are interested in the mental representations of male vocal patterns, it would make the most sense to use a base vocal pattern that has been produced by a male. Having a base is important because it provides a kind of anchor for participants to work from. When there is no base image, the number of trials that are required increases dramatically, thus making it harder to collect data. While there are studies that have excluded a base image, (e.g. the S study), for more elaborate and nuanced research questions, it is important to have a base image that is a fair representation of what participants are being asked to categorize. Photographs of faces are generally the most popular base image. Although the reverse correlation method is capable of investigating a wide variety of research questions, the most common application of the method is for evaluating faces on a single trait. Reverse correlation studies that address evaluations of the face are sometimes referred to as being a face space reverse correlation model (FSRCM). Thankfully, there are existing databases for face images of varying demographics and emotion that work well as base images. The reverse correlation method can also be used to help researchers identify what areas of an image (e.g. the areas on the face) have diagnostic value. In order to identify these areas of value, researchers start by minimizing the space a participant can pull information from. By imposing a “mask” on an image (e.g. blur an image while leaving random areas un-blurred), this reduces the information individuals might see, and forces them to focus on certain areas. Then, if/when participants are able to correctly identify an image with a trait repeatedly, we can draw conclusions about what areas have diagnostic value. While faces and visual stimuli are the most popular, this is not the only stimuli that can be used in a reverse correlation study. This method was originally designed for auditory stimuli which allows researchers to investigate how perceivers interpret auditory information and create trait based attributions to different sound patterns. For example, by segmenting a vocal recording of a single word (total sound time 426 ms) into six segments (71 ms each), and varying each segment's pitch using Gaussian distributions, researchers were able to uncover what vocal patterns people associated with certain traits. Specifically, this study investigated how listeners rated sound clips of the word “really” as sounding more interrogative (i.e. like the more common reverse correlation studies this study had participants listen to two sound clips per trial, choose which fit the category the best, and then created an average of the pitch contours). Beyond face and auditory perception, research utilizing the reverse correlation method has expanded to investigate how individuals see three-dimensional objects in images with noise (but no signal). After selecting your base image, regardless of what the image is, it is helpful to apply a Gaussian blur to smooth noise in the image. While noise will be applied later, it is helpful to reduce existing noise in the photo before applying your chosen noise. There are three primary choices when it comes to noise: white noise, sine-wave noise, and Gabor noise. The latter two of these constrain the configurations that the noise can have, and because of this white noise is usually the most commonly used. Regardless of the type of noise that is chosen, it is crucial that the noise randomly varies. === Step 2: data collection === Once the stimuli for the study has been developed, the researcher must make a few decisions before actually collecting the data. The researcher must come to a conclusion on how many stimuli will be presented at a time and how many trials the participants will see. In terms of stimuli presentation, a researcher can choose from either a 2-Image Forced Choice (2IFC) or a 4-Alternative Forced Choice (4AFC). The 2IFC presents two images at once (side by side) and requires participants to choose between the two on a specified category (e.g. which image looks the most like a male). Typically the noise from the left image is the mathematical inverse of the noise from the right image. This method was developed to better answer questions that could n

    Read more →
  • Seq2seq

    Seq2seq

    Seq2seq is a family of machine learning approaches used for natural language processing. Originally developed by Lê Viết Quốc, a Vietnamese computer scientist and a machine learning pioneer at Google Brain, this framework has become foundational in many modern AI systems. Applications include language translation, image captioning, conversational models, speech recognition, and text summarization. Seq2seq uses sequence transformation: it turns one sequence into another sequence. == History == One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode. seq2seq is an approach to machine translation (or more generally, sequence transduction) with roots in information theory, where communication is understood as an encode-transmit-decode process, and machine translation can be studied as a special case of communication. This viewpoint was elaborated, for example, in the noisy channel model of machine translation. In practice, seq2seq maps an input sequence into a real-numerical vector by using a neural network (the encoder), and then maps it back to an output sequence using another neural network (the decoder). The idea of encoder-decoder sequence transduction had been developed in the early 2010s. The papers most commonly cited as the originators that produced seq2seq are two papers from 2014. In the seq2seq as proposed by them, both the encoder and the decoder were LSTMs. This had the "bottleneck" problem, since the encoding vector has a fixed size, so for long input sequences, information would tend to be lost, as they are difficult to fit into the fixed-length encoding vector. The attention mechanism, proposed in 2014, resolved the bottleneck problem. They called their model RNNsearch, as it "emulates searching through a source sentence during decoding a translation". A problem with seq2seq models at this point was that recurrent neural networks are difficult to parallelize. The 2017 publication of Transformers resolved the problem by replacing the encoding RNN with self-attention Transformer blocks ("encoder blocks"), and the decoding RNN with cross-attention causally-masked Transformer blocks ("decoder blocks"). === Priority dispute === One of the papers cited as the originator for seq2seq is (Sutskever et al 2014), published at Google Brain while they were on Google's machine translation project. The research allowed Google to overhaul Google Translate into Google Neural Machine Translation in 2016. Tomáš Mikolov claims to have developed the idea (before joining Google Brain) of using a "neural language model on pairs of sentences... and then [generating] translation after seeing the first sentence"—which he equates with seq2seq machine translation, and to have mentioned the idea to Ilya Sutskever and Quoc Le (while at Google Brain), who failed to acknowledge him in their paper. Mikolov had worked on RNNLM (using RNN for language modelling) for his PhD thesis, and is more notable for developing word2vec. == Architecture == The main reference for this section is. === Encoder === The encoder is responsible for processing the input sequence and capturing its essential information, which is stored as the hidden state of the network and, in a model with attention mechanism, a context vector. The context vector is the weighted sum of the input hidden states and is generated for every time instance in the output sequences. === Decoder === The decoder takes the context vector and hidden states from the encoder and generates the final output sequence. The decoder operates in an autoregressive manner, producing one element of the output sequence at a time. At each step, it considers the previously generated elements, the context vector, and the input sequence information to make predictions for the next element in the output sequence. Specifically, in a model with attention mechanism, the context vector and the hidden state are concatenated together to form an attention hidden vector, which is used as an input for the decoder. The seq2seq method developed in the early 2010s uses two neural networks: an encoder network converts an input sentence into numerical vectors, and a decoder network converts those vectors to sentences in the target language. The Attention mechanism was grafted onto this structure in 2014 and is shown below. Later it was refined into the encoder-decoder Transformer architecture of 2017. === Training vs prediction === There is a subtle difference between training and prediction. During training time, both the input and the output sequences are known. During prediction time, only the input sequence is known, and the output sequence must be decoded by the network itself. Specifically, consider an input sequence x 1 : n {\displaystyle x_{1:n}} and output sequence y 1 : m {\displaystyle y_{1:m}} . The encoder would process the input x 1 : n {\displaystyle x_{1:n}} step by step. After that, the decoder would take the output from the encoder, as well as the as input, and produce a prediction y ^ 1 {\displaystyle {\hat {y}}_{1}} . Now, the question is: what should be input to the decoder in the next step? A standard method for training is "teacher forcing". In teacher forcing, no matter what is output by the decoder, the next input to the decoder is always the reference. That is, even if y ^ 1 ≠ y 1 {\displaystyle {\hat {y}}_{1}\neq y_{1}} , the next input to the decoder is still y 1 {\displaystyle y_{1}} , and so on. During prediction time, the "teacher" y 1 : m {\displaystyle y_{1:m}} would be unavailable. Therefore, the input to the decoder must be y ^ 1 {\displaystyle {\hat {y}}_{1}} , then y ^ 2 {\displaystyle {\hat {y}}_{2}} , and so on. It is found that if a model is trained purely by teacher forcing, its performance would degrade during prediction time, since generation based on the model's own output is different from generation based on the teacher's output. This is called exposure bias or a train/test distribution shift. A 2015 paper recommends that, during training, randomly switch between teacher forcing and no teacher forcing. === Attention for seq2seq === The attention mechanism is an enhancement introduced by Bahdanau et al. in 2014 to address limitations in the basic Seq2Seq architecture where a longer input sequence results in the hidden state output of the encoder becoming irrelevant for the decoder. It enables the model to selectively focus on different parts of the input sequence during the decoding process. At each decoder step, an alignment model calculates the attention score using the current decoder state and all of the attention hidden vectors as input. An alignment model is another neural network model that is trained jointly with the seq2seq model used to calculate how well an input, represented by the hidden state, matches with the previous output, represented by attention hidden state. A softmax function is then applied to the attention score to get the attention weight. In some models, the encoder states are directly fed into an activation function, removing the need for alignment model. An activation function receives one decoder state and one encoder state and returns a scalar value of their relevance. Consider the seq2seq language English-to-French translation task. To be concrete, let us consider the translation of "the zone of international control ", which should translate to "la zone de contrôle international ". Here, we use the special token as a control character to delimit the end of input for both the encoder and the decoder. An input sequence of text x 0 , x 1 , … {\displaystyle x_{0},x_{1},\dots } is processed by a neural network (which can be an LSTM, a Transformer encoder, or some other network) into a sequence of real-valued vectors h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , where h {\displaystyle h} stands for "hidden vector". After the encoder has finished processing, the decoder starts operating over the hidden vectors, to produce an output sequence y 0 , y 1 , … {\displaystyle y_{0},y_{1},\dots } , autoregressively. That is, it always takes as input both the hidden vectors produced by the encoder, and what the decoder itself has produced before, to produce the next output word: ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , "") → "la" ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la") → "la zone" ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la zone") → "la zone de" ... ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la zone de contrôle international") → "la zone de contrôle international " Here, we use the special token as a control character to delimit the start of input for the decoder. The decoding terminates as soon as "" appears in the decoder output. ==

    Read more →
  • DeepRoute.ai

    DeepRoute.ai

    DeepRoute.ai (Chinese: 元戎启行) is a Chinese autonomous driving company founded in 2019 and headquartered in Shenzhen, China. The company develops full-stack self-driving solutions including perception, decision-making, and control systems. == History == DeepRoute.ai was founded in February 2019 in Shenzhen, China, by Zhou Guang (周光), who serves as the company's CEO. In September 2019, the company collaborated with Dongfeng for a live-streamed autonomous driving demonstration. In October 2019, during the 7th Military World Games, DeepRoute.ai conducted Robotaxi demonstration operations. In November 2019, it obtained an intelligent connected vehicle road test permit for public roads in Shenzhen. In October 2020, DeepRoute.ai signed an "Autonomous Driving Leadership Project" with Dongfeng to build one of China's largest autonomous fleets. In August 2020, DeepRoute.ai announced its partnership with Cao Cao Mobility, a Geely-backed ride-hailing company, to test Robotaxis in Hangzhou for daily operations, planning to provide Robotaxis during the 2022 Asian Games. In September 2021, DeepRoute.ai secured US$300 million in a Series B funding round led by Alibaba. In December 2021, the company unveiled its DeepRoute-Driver 2.0, an L4-level autonomous driving solution comprising five solid-state lidar sensors, eight cameras, a proprietary computing system and an optional millimeter-wave radar. with a production cost of under US$10,000. In June 2022, it partnered with Deppon Express to provide autonomous light truck freight transfer services. In March 2023, the company launched its high-precision map-free intelligent driving solution, DeepRoute-Driver 3.0. In November 2024, Great Wall Motor announced a $100 million Series C funding round for Deeproute. With this, Deeproute has completed five rounds of financing, raising a cumulative total of over $500 million. Its shareholders include Fosun RZ Capital, Yunqi Partners, Alibaba, Vision Plus Capital, and Dongfeng, among others. In the same month, Deeproute.ai emphasised that they were in "deep cooperation" with Nvidia and spoke on being part of the first batch of companies in China to get a hold of Nvidia's newer Thor chip for cars which will be used in a new system released next year. This new system will help manage more complex driving scenarios through visual cues. == Products == === VLA Model === VLA Model is a Vision–language–action model designed for autonomous driving systems. It integrates visual perception, semantic understanding, and action decision-making into a unified framework, aiming to enhance the safety and adaptability of advanced driver-assistance systems (ADAS) in complex road environments. The model was officially launched on August 26, 2025, as the core of DeepRoute.ai's DeepRoute IO 2.0 platform. The VLA model is characterized by its "visual-language-action" architecture, which incorporates a chain-of-thought (CoT) reasoning capability inspired by large language models. This design is intended to address the "black box" limitations of traditional end-to-end autonomous driving systems by enabling the model to analyze information, infer causality, and make decisions in a more transparent and interpretable manner. === Appliance === The company has partnered with several automakers including Dongfeng Motor Corporation and Geely to develop and test autonomous vehicles.

    Read more →
  • Automatic meter reading

    Automatic meter reading

    Automatic meter reading (AMR) is the technology of automatically collecting consumption, diagnostic, and status data from water meter or energy metering devices (gas, electric) and transferring that data to a central database for billing, troubleshooting, and analyzing. This technology mainly saves utility providers the expense of periodic trips to each physical location to read a meter. Another advantage is that billing can be based on near real-time consumption rather than on estimates based on past or predicted consumption. This timely information coupled with analysis can help both utility providers and customers better control the use and production of electric energy, gas usage, or water consumption. AMR technologies include handheld, mobile and network technologies based on telephony platforms (wired and wireless), radio frequency (RF), or powerline transmission. == Technologies == === Touch technology === With touch-based AMR, a meter reader carries a handheld computer or data collection device with a wand or probe. The device automatically collects the readings from a meter by touching or placing the read probe close to a reading coil enclosed in the touchpad. When a button is pressed, the probe sends an interrogate signal to the touch module to collect the meter reading. The software in the device matches the serial number to one in the route database, and saves the meter reading for later download to a billing or data collection computer. Since the meter reader still has to go to the site of the meter, this is sometimes referred to as "on-site" AMR. Another form of contact reader uses a standardized infrared port to transmit data. Protocols are standardized between manufacturers by such documents as ANSI C12.18 or IEC 61107. === AMR hosting === AMR hosting is a back-office solution which allows a user to track their electricity, water, or gas consumption over the Internet. All data is collected in near real-time, and is stored in a database by data acquisition software. The user can view the data via a web application, and can analyze the data using various online analysis tools such as charting load profiles, analyzing tariff components, and verify their utility bill. === Radio frequency network === Radio frequency based AMR can take many forms. The more common ones are handheld, mobile, satellite and fixed network solutions. There are both two-way RF systems and one-way RF systems in use that use both licensed and unlicensed RF bands. In a two-way or "wake up" system, a radio signal is normally sent to an AMR meter's unique serial number, instructing its transceiver to power-up and transmit its data. The meter transceiver and the reading transceiver both send and receive radio signals. In a one-way "bubble-up" or continuous broadcast type system, the meter transmits continuously and data is sent every few seconds. This means the reading device can be a receiver only, and the meter a transmitter only. Data travels only from the meter transmitter to the reading receiver. There are also hybrid systems that combine one-way and two-way techniques, using one-way communication for reading and two-way communication for programming functions. RF-based meter reading usually eliminates the need for the meter reader to enter the property or home, or to locate and open an underground meter pit. The utility saves money by increased speed of reading, has less liability from entering private property, and has fewer missed readings from being unable to access the meter. The technology based on RF is not readily accepted everywhere. In several Asian countries, the technology faces a barrier of regulations in place pertaining to use of the radio frequency of any radiated power. For example, in India the radio frequency which is generally in ISM band is not free to use even for low power radio of 10 mW. The majority of manufacturers of electricity meters have radio frequency devices in the frequency band of 433/868 MHz for large scale deployment in European countries. The frequency band of 2.4 GHz can be now used in India for outdoor as well as indoor applications, but few manufacturers have shown products within this frequency band. Initiatives in radio frequency AMR in such countries are being taken up with regulators wherever the cost of licensing outweighs the benefits of AMR. ==== Handheld ==== In handheld AMR, a meter reader carries a handheld computer with a built-in or attached receiver/transceiver (radio frequency or touch) to collect meter readings from an AMR capable meter. This is sometimes referred to as "walk-by" meter reading since the meter reader walks by the locations where meters are installed as they go through their meter reading route. Handheld computers may also be used to manually enter readings without the use of AMR technology as an alternate but this will not support exhaustive data which can be accurately read using the meter reading electronically. ==== Mobile ==== Mobile or "drive-by" meter reading is where a reading device is installed in a vehicle. The meter reader drives the vehicle while the reading device automatically collects the meter readings. Often, for mobile meter reading, the reading equipment includes navigational and mapping features provided by GPS and mapping software. With mobile meter reading, the reader does not normally have to read the meters in any particular route order, but just drives the service area until all meters are read. Components often consist of a laptop or proprietary computer, software, RF receiver/transceiver, and external vehicle antennas. ==== Satellite ==== Transmitters for data collection satellites can be installed in the field next to existing meters. The satellite AMR devices communicate with the meter for readings, and then sends those readings over a fixed or mobile satellite network. This network requires a clear view to the sky for the satellite transmitter/receiver, but eliminates the need to install fixed towers or send out field technicians, thereby being particularly suited for areas with low geographic meter density. ==== RF technologies commonly used for AMR ==== Narrow Band (single fixed radio frequency) Spread spectrum Direct-sequence spread spectrum (DSSS) Frequency-hopping spread spectrum (FHSS) There are also meters using AMR with RF technologies such as cellular phone data systems, Zigbee, Bluetooth, Wavenis and others. Some systems operate with U.S. Federal Communications Commission (FCC) licensed frequencies and others under FCC Part 15, which allows use of unlicensed radio frequencies. ==== Wi-Fi ==== WiSmart is a versatile platform which can be used by a variety of electrical home appliances in order to provide wireless TCP/IP communication using the 802.11 b/g protocol. Devices such as the Smart Thermostat permit a utility to lower a home's power consumption to help manage power demand. The city of Corpus Christi became one of the first cities in the United States to implement citywide Wi-Fi, which had been free until May 31, 2007, mainly to facilitate AMR after a meter reader was attacked by a dog. Today many meters are designed to transmit using Wi-Fi, even if a Wi-Fi network is not available, and they are read using a drive-by local Wi-Fi hand held receiver. The meters installed in Corpus Christi are not directly Wi-Fi enabled, but rather transmit narrow-band burst telemetry on the 460 MHz band. This narrow-band signal has much greater range than Wi-Fi, so the number of receivers required for the project are far fewer. Special receiver stations then decode the narrow-band signals and resend the data via Wi-Fi. Most of the automated utility meters installed in the Corpus Christi area are battery powered. Wi-Fi technology is unsuitable for long-term battery-powered operation. === Power line communication === PLC is a method where electronic data is transmitted over power lines back to the substation, then relayed to a central computer in the utility's main office. This would be considered a type of fixed network system—the network being the distribution network which the utility has built and maintains to deliver electric power. Such systems are primarily used for electric meter reading. Some providers have interfaced gas and water meters to feed into a PLC type system. == Brief history == In 1972, Theodore George "Ted" Paraskevakos, while working with Boeing in Huntsville, Alabama, developed a sensor monitoring system which used digital transmission for security, fire and medical alarm systems as well as meter reading capabilities for all utilities. This technology was a spin-off of the automatic telephone line identification system, now known as caller ID. In 1974, Paraskevakos was awarded a U.S. patent for this technology. In 1977, he launched Metretek, Inc., which developed and produced the first fully automated, commercially available remote meter reading and load management system. Since this system was developed pre-Internet, Metret

    Read more →
  • The Triple Revolution

    The Triple Revolution

    "The Triple Revolution" was an open memorandum sent to U.S. President Lyndon B. Johnson and other government figures on March 22, 1964. It concerned three megatrends of the time: increasing use of automation, the nuclear arms race, and advancements in human rights. Drafted under the auspices of the Center for the Study of Democratic Institutions, it was signed by an array of noted social activists, professors, and technologists who identified themselves as the Ad Hoc Committee on the Triple Revolution. The chief initiator of the proposal was W. H. "Ping" Ferry, at that time a vice-president of CSDI, basing it in large part on the ideas of the futurist Robert Theobald. == Overview == The statement identified three revolutions underway in the world: the cybernation revolution of increasing automation; the weaponry revolution of mutually assured destruction; and the human rights revolution. It discussed primarily the cybernation revolution. The committee claimed that machines would usher in "a system of almost unlimited productive capacity" while continually reducing the number of manual laborers needed, and increasing the skill needed to work, thereby producing increasing levels of unemployment. It proposed that the government should ease this transformation through large-scale public works, low-cost housing, public transit, electrical power development, income redistribution, union representation for the unemployed, and government restraint on technology deployment. == Legacy == Martin Luther King Jr.'s final Sunday sermon, delivered six days before his April 1968 assassination, explicitly references the thesis of "The Triple Revolution": There can be no gainsaying of the fact that a great revolution is taking place in the world today. In a sense it is a triple revolution: that is, a technological revolution, with the impact of automation and cybernation; then there is a revolution in weaponry, with the emergence of atomic and nuclear weapons of warfare; then there is a human rights revolution, with the freedom explosion that is taking place all over the world. Yes, we do live in a period where changes are taking place. And there is still the voice crying through the vista of time saying, "Behold, I make all things new; former things are passed away." In Harlan Ellison's 1967 anthology Dangerous Visions, Philip José Farmer's story "Riders of the Purple Wage" uses the Triple Revolution document as the premise of a future society, in which the "purple wage" of the title is a guaranteed income dole on which most of the population lives. At the 1968 World Science Fiction Convention in San Francisco, Farmer delivered a lengthy Guest of Honor speech in which he called for the founding of a grassroots activist organization called REAP which would work for implementation of the Ad Hoc Committee's recommendations. Looking back on the proposal in his 2008 book, Daniel Bell wrote: "the cybernetic revolution quickly proved to be illusory. There were no spectacular jumps in productivity. ... Cybernation had proved to be one more instance of the penchant for overdramatizing a momentary innovation and blowing it up far out of proportion to its actuality. ... The image of a completely automated production economy—with an endless capacity to turn out goods—was simply a social-science fiction of the early 1960s. Paradoxically, the vision of Utopia was suddenly replaced by the spectre of Doomsday. In place of the early-sixties theme of endless plenty, the picture by the end of the decade was one of a fragile planet of limited resources whose finite stocks were being rapidly depleted, and whose wastes from soaring industrial production were polluting the air and waters." In his 2015 book Rise of the Robots, Martin Ford claims The Triple Revolution's predictions of steady decline in future employment were not wrong, but rather premature. He cites "Seven Deadly Trends" that began in the 1970s-1980s and by the mid-2010s appeared set to continue: Stagnation in real wages Decline in labor's share of national income in many countries (breakdown of Bowley's law), while corporate profits increased Declining labor force participation Diminishing job creation, lengthening jobless recoveries, and soaring long-term unemployment Rising inequality Declining incomes, and underemployment for recent college graduates Polarization and part-time jobs (middle-class jobs are disappearing, to be replaced by a small number of high-paying jobs and large number of low-paying jobs) According to Ford, the 1960s were part of what in retrospect seems like a golden age for labor in the United States, when productivity and wages rose together in near lockstep, and unemployment was low. But after about 1980, wages began stagnating while productivity continued to rise. Labor's share of the economic output began to decline. Ford describes the role that automation and information technology play in these trends, and how new technologies including narrow AI threaten to destroy jobs faster than displaced workers can be retrained for new jobs, before automation takes the new jobs as well. This includes many job categories, such as in transportation, that were never threatened by automation before. According to a 2013 study, about 47% of US jobs are susceptible to automation. == Signatories ==

    Read more →
  • Tango (platform)

    Tango (platform)

    Tango (named Project Tango while in testing) was an augmented reality computing platform, developed and authored by the Advanced Technology and Projects (ATAP), a skunkworks division of Google. It used computer vision to enable mobile devices, such as smartphones and tablets, to detect their position relative to the world around them without using GPS or other external signals. This allowed application developers to create user experiences that include indoor navigation, 3D mapping, physical space measurement, environmental recognition, augmented reality, and windows into a virtual world. The first product to emerge from ATAP, Tango was developed by a team led by computer scientist Johnny Lee, a core contributor to Microsoft's Kinect. In an interview in June 2015, Lee said, "We're developing the hardware and software technologies to help everything and everyone understand precisely where they are, anywhere." Google produced two devices to demonstrate the Tango technology: the Peanut phone and the Yellowstone 7-inch tablet. More than 3,000 of these devices had been sold as of June 2015, chiefly to researchers and software developers interested in building applications for the platform. In the summer of 2015, Qualcomm and Intel both announced that they were developing Tango reference devices as models for device manufacturers who use their mobile chipsets. At CES, in January 2016, Google announced a partnership with Lenovo to release a consumer smartphone during the summer of 2016 to feature Tango technology marketed at consumers, noting a less than $500 price-point and a small form factor below 6.5 inches. At the same time, both companies also announced an application incubator to get applications developed to be on the device on launch. On 15 December 2017, Google announced that they would be ending support for Tango on March 1, 2018, in favor of ARCore. == Overview == Tango was different from other contemporary 3D-sensing computer vision products, in that it was designed to run on a standalone mobile phone or tablet and was chiefly concerned with determining the device's position and orientation within the environment. The software worked by integrating three types of functionality: Motion-tracking: using visual features of the environment, in combination with accelerometer and gyroscope data, to closely track the device's movements in space Area learning: storing environment data in a map that can be re-used later, shared with other Tango devices, and enhanced with metadata such as notes, instructions, or points of interest Depth perception: detecting distances, sizes, and surfaces in the environment Together, these generate data about the device in "six degrees of freedom" (3 axes of orientation plus 3 axes of position) and detailed three-dimensional information about the environment. Project Tango was also the first project to graduate from Google X in 2012 Applications on mobile devices use Tango's C and Java APIs to access this data in real time. In addition, an API was also provided for integrating Tango with the Unity game engine; this enabled the conversion or creation of games that allow the user to interact and navigate in the game space by moving and rotating a Tango device in real space. These APIs were documented on the Google developer website. == Applications == Tango enabled apps to track a device's position and orientation within a detailed 3D environment, and to recognize known environments. This allowed the creations of applications such as in-store navigation, visual measurement and mapping utilities, presentation and design tools, and a variety of immersive games. At Augmented World Expo 2015, Johnny Lee demonstrated a construction game that builds a virtual structure in real space, an AR showroom app that allows users to view a full-size virtual automobile and customize its features, a hybrid Nerf gun with mounted Tango screen for dodging and shooting AR monsters superimposed on reality, and a multiplayer VR app that lets multiple players converse in a virtual space where their avatar movements match their real-life movements. Tango apps are distributed through Play. Google has encouraged the development of more apps with hackathons, an app contest, and promotional discounts on the development tablet. == Devices == As a platform for software developers and a model for device manufacturers, Google created two Tango devices. === The Peanut phone === "Peanut" was the first production Tango device, released in the first quarter of 2014. It was a small Android phone with a Qualcomm MSM8974 quad-core processor and additional special hardware including a fisheye motion camera, "RGB-IR" camera for color image and infrared depth detection, and Movidius Vision processing units. A high-performance accelerometer and gyroscope were added after testing several competing models in the MARS lab at the University of Minnesota. Several hundred Peanut devices were distributed to early-access partners including university researchers in computer vision and robotics, as well as application developers and technology startups. Google stopped supporting the Peanut device in September 2015, as by then the Tango software stack had evolved beyond the versions of Android that run on the device. === The Yellowstone tablet === "Yellowstone" was a 7-inch tablet with full Tango functionality, released in June 2014, and sold as the Project Tango Tablet Development Kit. It featured a 2.3 GHz quad-core Nvidia Tegra K1 processor, 128GB flash memory, 1920x1200-pixel touchscreen, 4MP color camera, fisheye-lens (motion-tracking) camera, an IR projector with RGB-IR camera for integrated depth sensing, and 4G LTE connectivity. As of May 27, 2017, the Tango tablet is considered officially unsupported by Google. ==== Testing by NASA ==== In May 2014, two Peanut phones were delivered to the International Space Station to be part of a NASA project to develop autonomous robots that navigate in a variety of environments, including outer space. The soccer-ball-sized, 18-sided polyhedral SPHERES robots were developed at the NASA Ames Research Center, adjacent to the Google campus in Mountain View, California. Andres Martinez, SPHERES manager at NASA, said "We are researching how effective [Tango's] vision-based navigation abilities are for performing localization and navigation of a mobile free flyer on ISS. === Intel RealSense smartphone === Announced at Intel's Developer Forum in August 2015, and offered to public through a Developer Kit since January 2016. It incorporated a RealSense ZR300 camera which had optical features required for Tango, such as the fisheye camera. === Lenovo Phab 2 Pro === Lenovo Phab 2 Pro was the first commercial smartphone with the Tango Technology, the device was announced at the beginning of 2016, launched in August, and available for purchase in the US in November. The Phab 2 Pro had a 6.4 inch screen, a Snapdragon 652 processor, and 64 GB of internal storage, with a rear facing 16 Megapixels camera and 8 MP front camera. === Asus Zenfone AR === Asus Zenfone AR, announced at CES 2017, was the second commercial smartphone with the Tango Technology. It ran Tango AR & Daydream VR on Snapdragon 821, with 6GB or 8GB of RAM and 128 or 256GB of internal memory depending on the configuration.

    Read more →
  • Auto-defrost

    Auto-defrost

    Auto-defrost, automatic defrost or self-defrosting is a technique which regularly defrosts the evaporator in a refrigerator or freezer. Appliances using this technique are often called frost free, frostless, or no-frost. == Mechanism == The defrost mechanism in a refrigerator heats the cooling element (evaporator coil) for a short period of time and melts the frost that has formed on it. The resulting water drains through a duct at the back of the unit. Defrosting is controlled by an electric or electronic timer. For every 6, 8, 10, 12 or 24 hours of compressor operation, it turns on a defrost heater for 15 minutes to half an hour. The defrost heater, having a typical power rating of 350W to 600W, is often mounted just below the evaporator in top and bottom-freezer models. It can also be located below and in the middle of the evaporator in side-by-side models. It may be protected from short circuits by means of fusible links. In older refrigerators, the timer runs continuously. In newer designs, the timer only runs while the compressor runs, so the longer the refrigerator door is closed, the less time the heater will run for and the more energy is saved. A defrost thermostat opens the heater circuit when the evaporator temperature rises above a preset temperature, 40°F (5°C) or more, thereby preventing excessive heating of the freezer compartment. The defrost timer is such that either the compressor or the defrost heater is on, but not both at the same time. Inside the freezer, air is circulated by means of one or more fans. In a typical design cold air from the freezer compartment is ducted to the fresh food compartment and circulated back into the freezer compartment. Air circulation helps sublimate any ice or frost that may form on frozen items in the freezer compartment. While defrosting, this fan is stopped to prevent heated-up air from reaching the food compartment. Instead of the normal cooling elements being embedded in the freezer liner, auto-defrost elements are behind or beneath the liner. This allows them to be heated for short periods of time to dispose of frost, without heating the contents of the freezer. Alternatively, some systems use the hot gas in the condenser to defrost the evaporator. This is done by means of a circuit that is cross-linked by a three-way valve. The hot gas quickly heats up the evaporator and defrosts it. This system is primarily used in commercial applications such as ice-cream displays. == Application == While this technique was originally applied to the refrigerator compartment, it was later used for freezer compartment as well. A combined refrigerator/freezer which applies self-defrosting to the refrigerator compartment only is usually called "partial frost free" or semi-automatic defrost (some brands call these "Auto Defrost" while Frigidaire referred to their semi-automatic models as "Cycla-Matic," Kelvinator often named these models as "Cyclic Defrost" ). These refrigerators usually have a pan underneath where water from the melted frost in the refrigerator section evaporates. Freezers with automatic defrosting and combined refrigerator/freezer units which also apply self defrosting to their freezer compartment are called "frost free". The latter usually feature an air connection between the two compartments with the air passage to the refrigerator compartment regulated by a damper. By this means, a controlled portion of the air coming from the freezer reaches the refrigerator. Some older models have no air circulation between their freezer and refrigerator sections. Instead, they use an independent cooling system (for example: an evaporator coil with a defrost heater and a circulating fan in the freezer and a cold-plate or open-coil evaporator in the refrigerator. "Frost-Free" refrigerator/freezer units usually use a heating element to defrost their evaporators, a pan to collect and evaporate water from the frost that melts from the cold plate and/or evaporator coil, a timer which turns off the compressor and turns on the defrost element usually from once to 4 times a day for periods usually ranging from 15 to 30 minutes, a defrost limiter thermostat that turns off the heating element before the temperature rises too much while the timer is still in its defrost phase. Some models also feature a drain heater to prevent ice from blocking the drain. Other early types of refrigerators also use hot gas defrost instead of electric heaters. These reverse the evaporator and condenser sides for the defrost cycle. Some newer refrigerator/freezer models have a computer that monitors how many times each door is opened and uses this data to control defrost scheduling thereby reducing power use. == Advantages == No need to manually defrost the frost buildup, therefore power consumption will not increase with time. Food packaging is easier to see. Most frozen food will not stick together. Smells are limited, especially in total frost-free appliances because the air always circulates. Better temperature management. == Disadvantages == The system can be more expensive to run when usage is high and if the fan continues or starts to run when the door is opened. A thermal cutout safety device is required to prevent overheating of the heating element. Increased electrical and mechanical complexity compared to a basic upright freezer or chest freezer, making it more prone to component failure. The temperature of the freezer contents rises during the defrosting cycles, especially if there is a light load in the freezer. This can cause "freezer burn" on articles placed in the freezer, from partially defrosting, then re-freezing On hot, humid days condensation will sometimes form around the refrigerator doors. Defrosting may not be completed by the time the defrost timer cycles back to normal operation (especially in hot, humid conditions with frequent door openings), leaving ice/frost on the evaporator coils. This condition can lead to "icing" which will interfere with the operation of the refrigerator. In laboratories, self-defrosting freezers must not be used to store certain delicate reagents such as enzymes, because the temperature cycling can degrade them. In addition, water can evaporate out of containers that do not have a very tight seal, altering the concentration of the reagents. Self-defrosting freezers should never be used to store flammable chemicals.

    Read more →
  • Image warping

    Image warping

    Image warping is the process of digitally manipulating an image such that any shapes portrayed in the image have been significantly distorted. Warping may be used for correcting image distortion as well as for creative purposes (e.g., morphing). The same techniques are equally applicable to video. While an image can be transformed in various ways, pure warping means that points are mapped to points without changing the colors. This can be based mathematically on any function from (part of) the plane to the plane. If the function is injective the original can be reconstructed. If the function is a bijection any image can be inversely transformed. Some methods are: Images may be distorted through simulation of optical aberrations. Images may be viewed as if they had been projected onto a curved or mirrored surface. (This is often seen in ray traced images.) Images can be partitioned into image polygons and each polygon distorted. Images can be distorted using morphing. The most obvious approach to transforming a digital image is the forward mapping. This applies the transform directly to the source image, typically generating unevenly-spaced points that will then be interpolated to generate the required regularly-spaced pixels. However, for injective transforms reverse mapping is also available. This applies the inverse transform to the target pixels to find the unevenly-spaced locations in the source image that contribute to them. Estimating them from source image pixels will require interpolation of the source image. To work out what kind of warping has taken place between consecutive images, one can use optical flow estimation techniques. == Image warping toolbox == ImWIP is an open-source, image warping tool for modeling deformation and motion in digital images, which contains differentiable image warping operators, together with their exact adjoints and derivatives.

    Read more →