AI Art Video

AI Art Video — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Brownout (software engineering)

    Brownout (software engineering)

    Brownout in software engineering is a technique that involves disabling certain features of an application. == Description == Brownout is used to increase the robustness of an application to computing capacity shortage. If too many users are simultaneously accessing an application hosted online, the underlying computing infrastructure may become overloaded, rendering the application unresponsive. Users are likely to abandon the application and switch to competing alternatives, hence incurring long-term revenue loss. To better deal with such a situation, the application can be given brownout capabilities: The application will disable certain features – e.g., an online shop will no longer display recommendations of related products – to avoid overload. Although reducing features generally has a negative impact on the short-term revenue of the application owner, long-term revenue loss can be avoided. The technique is inspired by brownouts in power grids, which consists in reducing the power grid's voltage in case electricity demand exceeds production. Some consumers, such as incandescent light bulbs, will dim – hence originating the term – and draw less power, thus helping match demand with production. Similarly, a brownout application helps match its computing capacity requirements to what is available on the target infrastructure. Brownout complements elasticity. The former can help the application withstand short-term capacity shortage, but does so without changing the capacity available to the application. In contrast, elasticity consists of adding (or removing) capacity to the application, preferably in advance, so as to avoid capacity shortage altogether. The two techniques can be combined; e.g., brownout is triggered when the number of users increases unexpectedly until elasticity can be triggered, the latter usually requiring minutes to show an effect. Brownout is relatively non-intrusive for the developer, for example, it can be implemented as an advice in aspect-oriented programming. However, surrounding components, such as load-balancers, need to be made brownout-aware to distinguish between cases where an application is running normally and cases where the application maintains a low response time by triggering brownout. == Usage in phased deprecation == A related use of the brownout concept in software engineering is the deliberate introduction of temporary outages to a system, API or feature that is being phased out. This is sometimes also called a "scream test" when it is used to discover unknown dependents of a system or API. The intention is to allow detection of downstream consumers of an API or service who may otherwise have missed deprecation announcements or to uncover hidden side-effects of the deprecation that may have been overlooked. The intention is that developers of dependent systems will notice their own system failures caused by the upstream brownout. Such brownouts are typically pre-announced scheduled outages or probabilistic in nature (such as artificially failing a percentage of requests). As a brownout is only a temporary or partial outage, it provides downstream consumers of an API or service time to remove any discovered dependencies on the deprecated API before it is fully retired. For consumers that have already prepared for the deprecation, a brownout provides valuable testing that the final removal of the service won't cause any unexpected problems.

    Read more →
  • Perceptual robotics

    Perceptual robotics

    Perceptual robotics is an interdisciplinary science linking Robotics and Neuroscience. It investigates biologically motivated robot control strategies, concentrating on perceptual rather than cognitive processes and thereby sides with J. J. Gibson's view against the Poverty of the stimulus theory. As a working definition, the following quote from Chapter 64 by H. Bülthoff, C. Wallraven and M. Giese from The Springer Handbook of Robotics, edited by Bruno Siciliano and Oussama Khatib, published by Springer in 2007, could be used: In the following we will apply the term Perceptual Robotics to signify the design of robots based on principles that are derived from human perception on all three levels in the sense of Marr. This includes a realization in terms of specific neural circuits as well as the transfer of more abstract biologically-inspired strategies for the solution of relevant computational problems.

    Read more →
  • Nuance Communications

    Nuance Communications

    Nuance Communications, Inc. was an American multinational computer software technology corporation, headquartered in Burlington, Massachusetts, that markets speech recognition and artificial intelligence software. Nuance merged with its competitor in the commercial large-scale speech application business, ScanSoft, in October 2005. ScanSoft was a Xerox spin-off that was bought in 1999 by Visioneer, a hardware and software scanner company, which adopted ScanSoft as the new merged company name. The original ScanSoft had its roots in Kurzweil Computer Products. In April 2021, Microsoft announced it would buy Nuance Communications. The deal is an all-cash transaction of $19.7 billion, including company debt, or $56 per share. The acquisition was completed in March 2022. == History == The Speech Technology and Research (STAR) Laboratory at SRI International began the journey that, in 1994, resulted in a spin-off company; Corona Corporation (later renamed to Nuance Communications ). Nuance Communications (NUAN) went public on the Nasdaq Stock Market in 1995. Nuance focused on commercializing advanced speech recognition technologies. Nuance was an early spinoff of SRI's Speech Technology and Research (STAR) Laboratory, a world leader in audio processing, speech and speaker analytics and spoken language research. The technology that served as the foundation of Nuance's speech recognition solution started at the STAR Lab and helped launch Nuance more than 20 years ago. In 1995, The SRI Language Modeling Toolkit (SRILM) was developed. This provides the tools to build and apply statistical language models (LMs), primarily for use in speech recognition, statistical tagging and segmentation, and machine translation. In terms of commercialization of natural automated speech recognition, SRI's natural language speech recognition software was the first to be deployed by a major corporation. In 1996, Charles Schwab & Co., Inc., used Nuance's speech recognition technology to allow customers to receive stock quotes over the telephone. One of the key features of the ‘Schwab Discount Brokerage system’, was the ability to recognize English words even when spoken by customers with accents. In 1997, Nuance Communications developed the first large scale commercial dialog system for United Parcel Services (UPS). UPS used the voice recognition platform to handle very large numbers of inquiries about package status. The company that would later merge with Nuance Communications started life as Visioneer, incorporated in 1992. In 1999, Visioneer acquired ScanSoft, Inc. (SSFT), and the combined company became known as ScanSoft. In September 2005, ScanSoft Inc. acquired and merged with Nuance Communications (NUAN), a natural language DOD-project spinoff from SRI International. The resulting company adopted the Nuance name. During the prior decade, the two companies competed in the commercial large-scale speech application business. === Data breach === Between 2014 and 2017, Nuance exposed over 45,000 patient records. == Solutions == Customer service virtual assistants Speech recognition — for people Speech recognition — for business Speech recognition — for physicians Accessibility Power PDF Managed Print Services Transcription === ScanSoft origins === In 1974, Raymond Kurzweil founded Kurzweil Computer Products, Inc. to develop the first omni-font optical character-recognition system – a computer program capable of recognizing text written in any normal font. In 1980, Kurzweil sold his company to Xerox. The company became known as Xerox Imaging Systems (XIS), and later ScanSoft. In March 1992, a new company called Visioneer, Inc. was founded to develop scanner hardware and software products, such as a sheetfed scanner called PaperMax and the document management software PaperPort. Visioneer eventually sold its hardware division to Primax Electronics, Ltd. in January 1999. Two months later, in March, Visioneer acquired ScanSoft from Xerox to form a new public company with ScanSoft as the new company-wide name. Prior to 2001, ScanSoft focused primarily on desktop imaging software such as TextBridge, PaperPort and OmniPage. Beginning with the December 2001 acquisition of Lernout & Hauspie assets, the company moved into the speech recognition business and began to compete with Nuance. Lernout & Hauspie had acquired speech recognition company Dragon Systems in June 2001, shortly before becoming bankrupt in October. Scansoft acquired speech recognition company SpeechWorks in 2003. === Partnership with Siri and Apple Inc. === In 2013, Nuance confirmed that its natural language processing algorithms supported Apple's Siri voice assistant. === Focus on health care === In 2019, Nuance spun off its automotive division as the company Cerence, allowing it to focus on health care applications. === Acquisition by Microsoft === On April 12, 2021, Microsoft announced that it would buy Nuance Communications for $19.7 billion, or $56 a share, a 22% increase over the previous closing price. Nuance's CEO, Mark Benjamin, stayed with the company. This was Microsoft's second-biggest acquisition up to that point, after its purchase of LinkedIn for $24 billion (~$30.7 billion in 2024) in 2016. Shortly after the deal, the Competition and Markets Authority, a UK regulatory body, stated it was looking into the deal on the basis of antitrust concerns. In December 2021, it was reported that the deal would be approved by the European Union. The acquisition was completed on March 4, 2022. In May 2023, Nuance announced an unspecified number of layoffs.

    Read more →
  • Linguatec

    Linguatec

    The Linguatec Sprachtechnologien GmbH is a language technology provider, specialized in the field of machine translation, speech synthesis and speech recognition. Linguatec was founded in Munich in 1996 and its headquarters are in Pasing. Linguatec has won the European Information Society Technologies Prize three times. On their website, they are now using the online service Voice Reader Web, so that the information can be read out in every language by means of a text-to-speech function. == Core areas == Machine translation The different versions of Personal Translator (seven language pairs) can be used "for home use" or for professional business use in the company network. In addition to this, specialist dictionaries are offered to broaden standard vocabulary. Speech synthesis The Voice Reader text-to-speech program reads in twelve languages: German, British English, American English, French, Quebec French, Spanish, Mexican Spanish, Italian, Dutch, Portuguese, Czech, Chinese. Speech recognition Voice Pro is based on ViaVoice technology from IBM. There are special software programs for doctors and lawyers. == Patents == 2005 pending patent application for a newly developed hybrid technology that uses the intelligence of neural networks for machine translation. == Awards == 2004 European IT Prize for Beyond Babel 2004 test winner Stiftung Warentest – best voice recognition 1998 European IT Prize – applied voice recognition 1996 European IT Prize – automated translation == Studies == 2005 University of Regensburg: Voice Reader user test 2002 Fraunhofer Institute for Industrial Engineering and Organization IAO: user study on the efficiency of machine translation

    Read more →
  • TU Me

    TU Me

    TU (formerly TU Me) is a digital platform developed by Telefónica and operated through its subsidiary Telefónica Innovación Digital. Initially launched in 2012 as a messaging app under the name TU Me, the brand was later revived in 2024 to designate a new suite of digital products focused on privacy, cybersecurity, and digital identity. == TU Me (2012–2014) == TU Me was a free mobile application released by Telefónica in May 2012. It allowed users to make voice calls, send texts, share photos and locations, and store conversation history in the cloud. The app was available for iOS and Android platforms, positioned as an alternative to services like WhatsApp and Viber. Despite early interest, TU Me was discontinued a few years later and removed from major app stores. Telefónica did not continue development of this version beyond its initial release cycle. == TU (2024–present) == In January 2024, Telefónica relaunched the brand TU through its technology subsidiary Telefónica Innovación Digital. Unlike its predecessor, the new TU is not a messaging app but a digital product platform offering solutions in cybersecurity, identity management, and cryptographic technology. The project includes a range of services built with technologies such as artificial intelligence, blockchain, and post-quantum cryptography. It operates independently from Movistar and targets both individual users and businesses. Notable products include: Latch: a digital access control system for securing user accounts. VerifAI: an AI-based tool for detecting manipulated media (images, audio, video). Metashield: software to identify and remove hidden metadata in documents. Wallet: a digital wallet for managing crypto-assets. Quantum Drop: encrypted file transfer system using post-quantum technology. Quantum Encryption: a security tool for IoT and private networks. Gallery: a blockchain-based digital art marketplace.

    Read more →
  • Fyuse

    Fyuse

    Fyuse is a spatial photography app which lets users capture and share interactive 3D images. By tilting or swiping one's smartphone, one can view such "fyuses" from various angles — as if one were walking around an object or subject. The app blends photography and video to create an interactive medium and was first published for iOS in April 2014. The Android version was released at the end of 2014. == The app == Fyuse lets users capture panoramas, selfies, and full 360° views of objects and allows one to view captured moments from different angles. It has its own personal gallery, social network and standalone web integration. With the app, Fyusion also created a social networking platform similar to Instagram. Fyuses can be shared, commented on, liked and re-shared to one's followers (called Echoes). One can build a network of followers and with engagement tracking, one can see how many times an image has been interacted with The images can also be saved for private, offline view, or shared to other social networks, like Facebook or Twitter, or embedded on a website where the images can be interacted with by desktop users via dragging the mouse. Furthermore, in the compass tab other fyuses can be discovered using the app's system of tags and categories. One's Fyuse feed is prepopulated with top users, and one can follow people to see when they post a new fyuse. The app will also find one's friends if one signs up with Facebook or connects it with one's Twitter account. To create a fyuse one moves around a person or object with one's phone's camera in one direction or moving/tilting one's phone around while holding one's finger on the screen. By combining photography and video the app allows one to capture moments that one may not have otherwise been able to capture by recording not one moment in time but stitched together little moments. According to Fyusion CEO Radu Rusu, a photo freezes a moment in time, while a video captures moments in a linear timeline — both still flat, when viewed. A fyuse image captures a moment in space, where one can not only see one side of something, but also around it. When it is done rendering, fyuses can also be edited – one can trim the fyuse for length and edit the brightness, contrast, exposure, saturation and sharpness. One can also add a vignette and apply a filters, with options to adjust their intensity. After editing, one can write a description, add hashtags, and tag parts of the fyuse before one can (voluntarily) publish and share it. Version 1.0 has been described as "alpha prototype" and version 2.0 was released on 17 December 2014. Version 3.0 introduced 3D tagging by which users can layer 3D graphic that animate accordingly with each interaction to add some context to the content. Version 4.0 was released on December 21, 2016 for iOS. Since January 2016 (v3.2) the app allows the export of fyuses as Live Photos. The app has also been described as a more sophisticated version of 3D stickers and flip images. == Applications == The app has many applications for e-commerce such as for fashion designers who want to showcase a garment from every angle, or real estate listings and Airbnb-type sites that want to make their rental properties seem as enticing as possible. The app can also be used for interactive art, 360° panoramas and selfies. == History == San Francisco-based Fyusion Inc.'s three founders — Radu B. Rusu, CTO Stefan Holzer, and VP of Engineering Stephen Miller — worked together at Willow Garage, the robotics research lab started by early Google employee Scott Hassan in the area of "personal robotics" — Hassan decided to turn the lab into more of an incubator, suggesting that the members spin off their technologies into consumer-facing enterprises. Rusu first set out with an open-source 3D perception software startup called Open Perception. Fyusion was officially founded in 2013, and soon after Rusu and his cofounders patented the technology for spatial photography. The company closed a seed funding round at the end of May, raising $3.35 million from investors, including an angel investment from Sun Microsystems cofounder Andreas Bechtolsheim. In 2014 the Fyuse team consisted of 13 employees, mostly engineers and designers, recruited from around the globe. In March 2015 the team displayed their app at Katy Perry's premiere for the movie "Prismatic World Tour on Epix" where Perry also took Fyuse for a test run. == Augmented reality == In September 2016 Fyusion unveiled its platform for creating augmented reality content using ones smartphone. It takes the images from ones smartphone and converts them into 3D holographic images, which one can then view on an AR headset. According to Rusu "by making it easy for people to capture their surroundings on any mobile device, [Fyusion is] revolutionizing the way that people view the world around them" and also states that for "AR to be successful, anyone should be able to create content for it" opposed to the current "small number of content creators and an even smaller number of hardware players". According to him "the applications of [Fyusion's] technology for consumers and businesses are incredibly limitless". The platform uses the company's patented 3D spatio-temporal platform that uses advanced sensor fusion, machine learning and computer vision algorithms and part of the platform is built into the Fyuse app. Before committing to releasing a separate consumer product the company intends to wait until the HoloLens device becomes available to the public. Until then any Fyuse representation created using Fyuse is AR ready and will be able to be shown in HoloLens in the future. == Fyuse - Point of No Return == Fyuse - Point of No Return is a science fiction short advert for Fyuse 3.0 in which Fyuse's digital medium is extrapolated into the future. In the film a woman uses a mini scanning-drone to 3D scan a tree with Fyuse and later recreate it as an augmented reality object at another place.

    Read more →
  • Georges Giralt PhD Award

    Georges Giralt PhD Award

    The Georges Giralt PhD Award is a European scientific prize for extraordinary contributions to robotics. It is awarded yearly at the European Robotics Forum by euRobotics AISBL, a non-profit organisation based in Brussels with the objective of turning robotics beneficial for Europe’s economy and society. Georges Giralt received his PhD in 1958, from Paul Sabatier University, in the domain of electrical machines, and soon afterwards became a pioneer in robotics, in Europe and worldwide. He was especially instrumental in bringing in scientific foundations and methodology when the domain was still young, and a loose coupling of mechanical and electrical engineering, adopting the early results of automatic control. The high reputation of the Georges Giralt PhD Award is based on the prominent role of the awarding institution euRobotics. With more than 250 member organisations, euRobotics represents the academic and industrial robotics community in Europe. Moreover, it provides the European robotics community with a legal entity to engage in a public/private partnership with the European Commission. The award is covered by various media. Entitled for participation in the Georges Giralt PhD Award are all robotics-related dissertations which have been successfully defended at a European university. The US-American counterpart is the Dick Volz Award. == Award winners == 2026: Antonio González Morgado 2025: Erfan Shahriari 2024: Manuel Keppler 2023: Antonio Andriella, Ribin Balachandran 2022: Antonio Loquercio, Michael Lutter 2021: Giuseppe Averta, Bernd Henze 2020: Cosimo Della Santina 2019: Grazioso Stanislao, Teodor Tomic 2018: Frank Bonnet, Daniel Leidner 2017: Johannes Englsberger 2016: Alexander Dietrich, Mark Müller 2015: Jörg Stückler 2014: Manuel Catalano, Fabien Expert, Rainer Jaekel 2013: Jens Kober 2012: Sami Haddadin 2011: Mario Pratts 2010: Ludovic Righetti 2009: Alejandro-Dizan Vasquez-Govea 2008: Cyrill Stachniss, Eduardo Rocon 2007: Pierre Lamon 2006: Martijn Wisse 2005: Juan Andrade Cetto 2004: Gilles Duchemin 2003: Ralf Koeppe 2002: Gianluca Antonelli, Jens-Steffen Gutmann

    Read more →
  • EffectsLab Pro

    EffectsLab Pro

    EffectsLab Pro is a discontinued visual effects software product developed by FXhome. It has since been superseded by the FXhome HitFilm range. The company also produced a limited functionality version, EffectsLab Lite, containing just the Particle engine. A more extensive product, VisionLab Studio, combined the functionality of EffectsLab Pro and the company's CompositeLab Pro product with enhancements to both. == Effects Engines == The effects are generated by the program's effect engines: The Neon Light engine allows light beams to be drawn onto the video, allowing the generation of lightsaber-like weapons, neon lighting, fantasy glow effects and laser blasts. The Particle engine is used for particle effects, such as smoke, fire, explosions, and weather effects. The Muzzle Flash engine is designed for creating and animating muzzle flashes such as machine gun firing, tank blasts, etc. It's possible to rotate the created muzzle flash in 3D, making it the only engine with 3D use. The Optics engine is designed for creating artificial lens flares and light sources. It is useful for enhancing other light-based effects, and mimicking the distinctive flashes of light that accompany Star Wars' lightsaber battles. The Laser engine (introduced in EffectsLab Pro in late 2007) is designed as a simplified method of creating laser weapon effects, including the ability to add simulated perspective to the effect. == Presets == EffectsLab Pro allows the user to save the effects using presets. Since all effects are generated from settings in the different engines, it is fairly easy to generate an XML style description of the effect. It is also possible to share presets on FXhome's website.

    Read more →
  • Afghan Girls Robotics Team

    Afghan Girls Robotics Team

    The Afghan Girls Robotics Team, also known as the Afghan Dreamers, is an all-girl robotics team from Herat, Afghanistan, founded through the Digital Citizen Fund (DCF) in 2017 by Roya Mahboob and Alireza Mehraban. It is made up of girls between ages 12 and 18 and their mentors. Several members of the team were relocated to Qatar and Mexico by humanitarian and tech entrepreneur Sarah Porter following the fall of Kabul in August 2021. A documentary film featuring members of the team, titled Afghan Dreamers, was released by MTV Documentary Films in 2023. == Origins == The Afghan Girls Robotics Team was co-founded in 2017 by Roya Mahboob, who is their coach, mentor and sponsor, and founder of the Digital Citizen Fund (DCF), which is the parent organization for the team. Dean Kamen was planning a 2017 competition in the United States and had recruited Mahboob to form a team from Afghanistan. Out of 150 girls, 12 were selected for the first team. Before parts were sent by Kamen, they trained in the basement of the home of Mahboob's parents, with scrap metal and without safety equipment under the guidance of their coach, Mahboob's brother Alireza Mehraban, who is also a co-founder of the team. == 2017 and 2018 == In 2017, six members of the Afghan Girls Robotics Team traveled to the United States to participate in the international FIRST Global Challenge robotics competition. Their visas were rejected twice after they made two journeys from Herat to Kabul through Taliban-controlled areas, before officials in the United States government intervened to allow them to enter the United States. Customs officials also detained their robotics kits, which left them two weeks to construct their robot, unlike some teams that had more time. They were awarded a Silver medal for Courageous Achievement. One week after they returned home from the competition, the father of team captain Fatemah Qaderyan, Mohammad Asif Qaderyan, was killed in a suicide bombing. After their United States visas expired, the team participated in competitions in Estonia and Istanbul. Three of the 12 members participated in the 2017 Entrepreneurial Challenge at the Robotex festival in Estonia, and won the competition for their solar-powered robot designed to assist farmers. In 2018, the team trained in Canada, continued to travel in the United States for months and participate in competitions. == 2019 == The Afghan Girls Robotics team had aspirations to develop a science and technology school for girls in Afghanistan. Roya Mahboob interfaced with the School of Engineering and Applied Sciences (SEAS), the School of Architecture, and the Whitney and Betty MacMillan Center for International and Area Studies Yale University to design the infrastructure for what they named The Dreamer Institute. == 2020 == In March 2020, the governor of Herat at the time, in response to the COVID-19 pandemic in Afghanistan and a scarcity of ventilators, sought help with the design of low-cost ventilators, and the Afghan Girls Robotics Team was one of six teams contacted by the government. Using a design from Massachusetts Institute of Technology and with guidance from MIT engineers and Douglas Chin, a surgeon in California, the team developed a prototype with Toyota Corolla parts and a chain drive from a Honda motorcycle. UNICEF also supported the team with the acquisition of necessary parts during the three months they spent building the prototype that was completed in July 2020. Their design costs around $500 compared to $50,000 for a ventilator. In December 2020, Minister of Industry and Commerce Nizar Ahmad Ghoryani donated funding and obtained land for a factory to produce the ventilators. Under the direction of their mentor Roya Mahboob, the Afghan Dreamers also designed a UVC Robot for sanitization, and a Spray Robot for disinfection, both of which were approved by the Ministry of Health for production. == 2021 == In early August 2021, Somaya Faruqi, former captain of the team, was quoted by Public Radio International about the future of Afghanistan, stating, "We don’t support any group over another but for us what’s important is that we be able to continue our work. Women in Afghanistan have made a lot of progress over the past two decades and this progress must be respected." On August 17, 2021, the Afghan Girls Robotics Team and their coaches were reported to be attempting to evacuate, but unable to obtain a flight out of Afghanistan, and a lawyer appealed to Canada for assistance regarding the evacuation of the team members. As of August 19, 2021, nine members of the team and their coaches had evacuated to Qatar. The founder of the team, Roya Mahboob, and DCF board member, Elizabeth Schaeffer Brown, were previously in contact with the Qatari government to assist the team members in their evacuation from Afghanistan. By August 25, 2021, some members arrived in Mexico. Saghar, a team member who evacuated to Mexico, said, "We wanted to continue the path that we started to continue to go for our achievements and to go for having our dreams through reality. So that's why we decided to leave Afghanistan and go for somewhere safe" in an interview with The Associated Press. The members who have left Afghanistan participated in an online robotics competition in September and plan to continue their education. A documentary film titled Afghan Dreamers, produced by Beth Murphy and directed by David Greenwald, was in post-production when the team began to evacuate. == 2022 == The Afghan Dreamers were involved in a training program at the Texas A&M University at Qatar’s STEM Hub. == 2023 == The Afghan Girls Robotics Team had a booth at the 5th UN Conference on the Least Developed Countries, where they displayed some of the robots the team had constructed. == Afghan Dreamers documentary == The Afghan Dreamers documentary from MTV Documentary Films premiered in May 2023 on Paramount+. The film was directed by David Greenwald and produced by David Cowan and Beth Murphy. In a review for Screen Daily, Wendy Ide wrote, "This film, with its likeable cast of girl nerds and positive message, should enjoy a warm reception on the festival circuit, and will be of particular interest to events seeking to showcase women's stories from around the world. It also serves as a timely cautionary tale – a case study on just how quickly the rights and the opportunities of women can be curtailed, at the behest of the men in power." == Honors and awards == 2017 Silver medal for Courageous Achievement at the FIRST Global Challenge, science and technology 2017 Benefiting Humanity in AI Award at World Summit AI 2017 Winner, Entrepreneurship Challenge at Robotex in Estonia 2018 Permission to Dream Award, Raw Film Festival 2018 Conrad Innovation Challenge, Raw Film Festival 2018 Rookie All Star – District Championship, Canada 2018 Asia Game Changer Award Honoree 2019 Inspiring in Engineering Award – FIRST Detroit World Championship 2019 Asia Game Changer Award of California 2019 Safety Award – FIRST Global, Dubai 2021 Forbes 30 Under 30 Asia 2022 World Championships, Genoa, Switzerland

    Read more →
  • Wispr

    Wispr

    Wispr AI is a software company founded in 2021 by Tanay Kothari and Sahaj Garg that develops voice-based interfaces for computers and other devices. The company’s main product, Wispr Flow, is an AI-powered speech-to-text application available on macOS, Windows and iOS. == History == Wispr was founded in 2021 with the goal of building a non-invasive wearable device that would allow users to control smartphones without touch input. The device was intended to translate neurological signals into actions and to enable silent text entry by mouthing words, drawing on techniques similar to brain–computer interfaces. Early funding was directed toward this hardware-focused effort. After around three years of development, Wispr concluded that contemporary AI systems were not sufficient for the requirements of the wearable device. The company shifted its focus to Flow voice dictation software, the software layer originally built for the wearable, and in 2024 released a macOS application based on this platform. == Wispr Flow == Wispr Flow (often referred to as Flow) is a speech-to-text application for macOS, Windows and iOS. It provides real-time dictation and transcription in more than 100 languages and can operate across applications, including email clients, messaging platforms and chatbots. In June 2025 Wispr released an iOS version that functions as a third-party keyboard, allowing voice input in any app. == Technology == Wispr Flow is based on automatic speech recognition (ASR) and other AI models. The system adapts to individual users over time, learning their vocabulary and preferred style with the aim of reducing manual editing. Flow operates through configurable “Flow Sessions”, defined as time windows during which the app has access to the microphone; users can set session timeouts or disable automatic time limits. == Users and Adoption == Wispr initially targeted users such as venture capitalists, entrepreneurs and executives who process large volumes of text and often work in private or flexible environments. The user base later expanded via platforms such as Product Hunt to students, software developers, writers, lawyers and consultants. Flow has also been adopted by users with conditions such as ADHD, dyslexia, paralysis and carpal tunnel syndrome. About 40% of users are in the United States, 30% in Europe and the remaining 30% in other regions. More than 30% of users come from non-technical backgrounds. Flow supports 104 languages, with approximately 40% of dictations in English and 60% in other languages, including Spanish, French, German, Dutch, Hindi and Mandarin. Wispr has reported monthly user growth above 50%, a six-month active-user retention rate of about 80%, a payment rate around 19%, and revenue of approximately US$3.8 million between July 2024 and July 2025. == Development == Wispr has announced plans for an Android application and maintains waiting lists for Android, Linux and web versions of Flow. The company is developing shared-context features for teams so that the software can recognize common terminology within organizations and has stated that it aims to evolve Flow into a broader AI assistant for tasks such as messaging, note-taking and reminders. Wispr has also reported working with unnamed AI hardware partners on interaction layers for future devices. == Funding == In 2025 Wispr raised US$30 million in a Series A funding round led by Menlo Ventures, with participation from NEA, 8VC and several individual investors, including Evan Sharp and Henry Ward. Earlier investors include Neo, MVP Ventures and AIX Ventures. In November of that same year, the company raised a US$25 million Series A extension led by Notable Capital, with participation from Flight Fund, bringing its total funding to US$81 million. Wispr competes with other AI-based dictation and voice-input tools, including Aqua, Talktastic, Superwhisper and Betterdication.

    Read more →
  • Haskins Laboratories

    Haskins Laboratories

    Haskins Laboratories, Inc. is an independent research laboratory, founded in 1935 and located in New Haven, Connecticut since 1970. Many current Haskins researchers are affiliated with Yale University's Child Study Center and/or the University of Connecticut. Haskins is a multidisciplinary and international community of researchers who conduct basic research on spoken and written language and global literacy. A guiding perspective of their research has been to view speech and language as emerging from biological processes, including those of adaptation, response to stimuli, and conspecific interaction. Haskins Laboratories has a long history of technological and theoretical innovation, from creating systems of rules for speech synthesis and development of an early working prototype of a reading machine for the blind to developing the landmark concept of phonemic awareness as the critical preparation for learning to read an alphabetic writing system. == Research tools and facilities == Haskins Laboratories is equipped, in-house, with a comprehensive suite of tools and capabilities to advance its mission of research into language and literacy. As of 2014, these included: Anechoic chamber Electroencephalography BioSemi 264 electrode, 24 bit Active Two System EGI 128 electrode, Geodesic EEG System 300 Electromagnetic articulography (EMMA) Carstens AG501 NDI WAVE Eye tracking: HL is equipped with 3 SR Research eye-trackers. 2 Model Eyelink 1000 systems. 1 Model Eyelink 1000plus system. Magnetic resonance imaging: Haskins has access to MRI scanners through agreements with the University of Connecticut and the Yale School of Medicine. On-site, HL has a Linux computer cluster dedicated to analysis of MRI data. Motion capture: HL is equipped with a Vicon motion capture system with one Basler high-speed digital camera, six Vicon MX T-20 cameras and a Vicon MX Giganet for synching camera data and connecting cameras to the data capture computer. Near infrared spectroscopy: HL has a TechEn CW6 8x8 system (four emitters; eight detectors). Ultrasound sonogram == History == Many researchers have contributed to scientific breakthroughs at Haskins Laboratories since its founding. All of them are indebted to the pioneering work and leadership of Caryl Parker Haskins, Franklin S. Cooper, Alvin Liberman, Seymour Hutner and Luigi Provasoli. The history presented here focuses on the research program of the division of Haskins Laboratories that, since the 1940s, has been most well known for its work in the areas of speech, language, and reading. === 1930s === Caryl Haskins and Franklin S. Cooper established Haskins Laboratories in 1935. It was originally affiliated with Harvard University, MIT, and Union College in Schenectady, NY. Caryl Haskins conducted research in microbiology, radiation physics, and other fields in Cambridge, MA and Schenectady. In 1939 Haskins Laboratories moved its center to New York City. Seymour Hutner joined the staff to set up a research program in microbiology, genetics, and nutrition. The descendant of the division led by Hutner program eventually became a department of Pace University in New York. The two identically named organizations are no longer formally affiliated. === 1940s === The U. S. Office of Scientific Research and Development, under Vannevar Bush asked Haskins Laboratories to evaluate and develop technologies for assisting blinded World War II veterans. Experimental psychologist Alvin Liberman joined Haskins Laboratories to assist in developing a "sound alphabet" to represent the letters in a text for use in a reading machine for the blind. Luigi Provasoli joined Haskins Laboratories to set up a research program in marine biology. The program in marine biology moved to Yale University in 1970 and disbanded with Provasoli's retirement in 1978. === 1950s === Franklin S. Cooper invented the pattern playback, a machine that converts pictures of the acoustic patterns of speech back into sound. With this device, Alvin Liberman, Cooper, and Pierre Delattre (and later joined by Katherine Safford Harris, Leigh Lisker, Arthur Abramson, and others), discovered the acoustic cues for the perception of phonetic segments (consonants and vowels). Liberman and colleagues proposed a motor theory of speech perception to resolve the acoustic complexity: they hypothesized that we perceive speech by tapping into a biological specialization, a speech module, that contains knowledge of the acoustic consequences of articulation. Liberman, aided by Frances Ingemann and others, organized the results of the work on speech cues into a groundbreaking set of rules for speech synthesis by the Pattern Playback. === 1960s === Franklin S. Cooper and Katherine Safford Harris, working with Peter MacNeilage, were the first researchers in the U.S. to use electromyographic techniques, pioneered at the University of Tokyo, to study the neuromuscular organization of speech. Leigh Lisker and Arthur Abramson looked for simplification at the level of articulatory action in the voicing of certain contrasting consonants. They showed that many acoustic properties of voicing contrasts arise from variations in voice onset time, the relative phasing of the onset of vocal cord vibration and the end of a consonant. Their work has been widely replicated and elaborated, here and abroad, over the following decades. Donald Shankweiler and Michael Studdert-Kennedy used a dichotic listening technique (presenting different nonsense syllables simultaneously to opposite ears) to demonstrate the dissociation of phonetic (speech) and auditory (nonspeech) perception by finding that phonetic structure devoid of meaning is an integral part of language, typically processed in the left cerebral hemisphere. Liberman, Cooper, Shankweiler, and Studdert-Kennedy summarized and interpreted fifteen years of research in "Perception of the Speech Code", still among the most cited papers in the speech literature. It set the agenda for many years of research at Haskins and elsewhere by describing speech as a code in which speakers overlap (or coarticulate) segments to form syllables. Researchers at Haskins connected their first computer to a speech synthesizer designed by Haskins Laboratories' engineers. Ignatius Mattingly, with British collaborators, John N. Holmes and J.N. Shearme, adapted the Pattern playback rules to write the first computer program for synthesizing continuous speech from a phonetically spelled input. A further step toward a reading machine for the blind combined Mattingly's program with an automatic look-up procedure for converting alphabetic text into strings of phonetic symbols. === 1970s === In 1970, Haskins Laboratories moved to New Haven, Connecticut, and entered into affiliation agreements with Yale University and the University of Connecticut; Haskins remains fully independent of both Yale and UConn, administratively and financially. The lab's original location in New Haven, at 270 Crown Street (from 1970 to 2005), was leased from Yale University. Isabelle Liberman, Donald Shankweiler, and Alvin Liberman teamed up with Ignatius Mattingly to study the relationship between speech perception and reading, a topic implicit in Haskins Laboratories' research program since its inception. They developed the concept of phonemic awareness, the knowledge that would-be readers must be aware of the phonemic structure of their language in order to be able to read. Leonard Katz related the work to contemporary cognitive theory and provided expertise in experimental design and data analysis. Under the broad rubric of the "alphabetic principle", this is the core of the lab's present program of reading pedagogy. Patrick Nye joined Haskins Laboratories to lead a team working on the reading machine for the blind. The project culminated when the addition of an optical character recognizer allowed investigators to assemble the first automatic text-to-speech reading machine. By the end of the decade this technology had advanced to the point where commercial concerns assumed the task of designing and manufacturing reading machines for the blind. In 1973, Franklin S. Cooper was selected to form a panel of six experts charged with investigating the famous 18-minute gap in the White House office tapes of President Richard Nixon related to the Watergate scandal. Building on earlier work, Philip Rubin developed the sinewave synthesis program, which was then used by Robert Remez, Rubin, and colleagues to show that listeners can perceive continuous speech without traditional speech cues from a pattern of sinewaves that track the changing resonances of the vocal tract. This paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space. Philip Rubin and colleagues developed Paul Mermelstein's anatomically simplified vocal tract model, originally worked on at Bell Laboratories, into the first articulatory synthesizer that can be controlled in a phy

    Read more →
  • Butler in a Box

    Butler in a Box

    Butler in a Box was an early voice-controlled home automation device developed in 1983 by magician Gus Searcy and programmer Franz Kavan. The device allowed users to control various home electronics, such as lights and phones, using voice commands. It predated modern smart speakers and virtual assistants by several decades. == History == The idea for the Butler in a Box originated in 1983 when Searcy was asked by friends why he couldn't simply command lights to turn on and off if he could pull rabbits out of hats, given his background as a professional magician. Searcy partnered with former IBM programmer Kavan to develop the device, with their first prototype being named "Sidney". The Butler in a Box combined remote control technology with voice recognition to enable control of home devices. However, it faced challenges due to the technological limitations of the era and its high price point of nearly $1,500 (equivalent to around $3,700 in 2021). == Features and functionality == Users could activate the Butler in a Box by speaking a wake word, typically a traditional butler name, and the device would address the user as "boss". It was capable of performing tasks such as: Turning lights on and off, controlling individual zones if lights were connected to remote control modules Making and receiving phone calls Setting timers Pairing with sensors to function as a security alarm system However, the device required extensive voice training for each user, a time-consuming process compared to modern voice recognition. Additionally, settings and trained commands would be lost if power was out for over 3 hours due to the volatile memory technology used at the time. == Reception and legacy == While innovative for its time, the Butler in a Box did not achieve widespread commercial success due to its high price and the technical limitations of the 1980s. Nevertheless, it served as an important early step in the development of home automation and showcased the potential for voice-controlled technology to enhance accessibility and convenience in the home. Decades later, products like Amazon Alexa, Google Home, and Apple's Siri would make voice-controlled smart home devices commonplace and affordable, building on the groundwork laid by early attempts like the Butler in a Box.

    Read more →
  • Application-release automation

    Application-release automation

    Application-release automation (ARA) refers to the process of packaging and deploying an application or update of an application from development, across various environments, and ultimately to production. ARA solutions must combine the capabilities of deployment automation, environment management and modeling, and release coordination. == Relationship with DevOps == ARA tools help cultivate DevOps best practices by providing a combination of automation, environment modeling and workflow-management capabilities. These practices help teams deliver software rapidly, reliably and responsibly. ARA tools achieve a key DevOps goal of implementing continuous delivery with a large quantity of releases quickly. == Relationship with deployment == ARA is more than just software-deployment automation – it deploys applications using structured release-automation techniques that allow for an increase in visibility for the whole team. It combines workload automation and release-management tools as they relate to release packages, as well as movement through different environments within the DevOps pipeline. ARA tools help regulate deployments, how environments are created and deployed, and how and when releases are deployed. == ARA Solutions == All ARA solutions must include capabilities in automation, environment modeling, and release coordination. Additionally, the solution must provide this functionality without reliance on other tools.

    Read more →
  • Network Abstraction Layer

    Network Abstraction Layer

    The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non conversational" (storage, broadcast, or streaming) applications. NAL has achieved a significant improvement in application flexibility relative to prior video coding standards. == Introduction == An increasing number of services and growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements. The H.264/AVC and HEVC standards are designed for technical solutions including areas like broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.) interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, multimedia messaging services, etc. Moreover, new applications may be deployed over existing and future networks. This raises the question about how to handle this variety of applications and networks. To address this need for flexibility and customizability, the design covers a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media. The NAL is designed in order to provide "network friendliness" to enable simple and effective customization of the use of VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as: RTP/IP for any kind of real-time wire-line and wireless Internet services. File formats, e.g., ISO MP4 for storage and MMS. H.32X for wireline and wireless conversational services. MPEG-2 systems for broadcasting services, etc. The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte stream, and packet formats uses of NAL units, parameter sets, and access units. A short description of these concepts is given below. == NAL units == The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream. == NAL Units in Byte-Stream Format Use == Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit. A small amount of additional data (one byte per video picture) is also added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired. == NAL Units in Packet-Transport System Use == In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes. == VCL and Non-VCL NAL Units == NAL units are classified into VCL and non-VCL NAL units. VCL NAL units contain the data that represents the values of the samples in the video pictures. Non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures). == Parameter Sets == A parameter set contains shared configuration data that is carried in non-VCL NAL units. Parameter sets are typically reused when decoding many coded pictures within a video sequence. Each VCL NAL unit references a picture parameter set (PPS), which in turn references a sequence parameter set (SPS). There are two types of parameter sets: Sequence parameter set (SPS), which specifies mostly constant configuration such as resolution, bit depth, or chroma format. (For a concrete implementation, see FFmpeg's SPS struct.) Picture parameter set (PPS), which applies on top of an SPS, and specifies configuration such as QP offsets. (For a concrete implementation, see FFmpeg's PPS struct.) The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss. In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself. == Access Units == A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture. The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present. Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to

    Read more →
  • Fully probabilistic design

    Fully probabilistic design

    Decision making (DM) can be seen as a purposeful choice of action sequences. It also covers control, a purposeful choice of input sequences. As a rule, it runs under randomness, uncertainty and incomplete knowledge. A range of prescriptive theories have been proposed how to make optimal decisions under these conditions. They optimise sequence of decision rules, mappings of the available knowledge on possible actions. This sequence is called strategy or policy. Among various theories, Bayesian DM is broadly accepted axiomatically based theory that solves the design of optimal decision strategy. It describes random, uncertain or incompletely known quantities as random variables, i.e. by their joint probability expressing belief in their possible values. The strategy that minimises expected loss (or equivalently maximises expected reward) expressing decision-maker's goals is then taken as the optimal strategy. While the probabilistic description of beliefs is uniquely and deductively driven by rules for joint probabilities, the composition and decomposition of the loss function have no such universally applicable formal machinery. Fully probabilistic design (of decision strategies or control, FPD) removes the mentioned drawback and expresses also the DM goals of by the "ideal" probability, which assigns high (small) values to desired (undesired) behaviours of the closed DM loop formed by the influenced world part and by the used strategy. FPD has axiomatic basis and has Bayesian DM as its restricted subpart. FPD has a range of theoretical consequences , and, importantly, has been successfully used to quite diverse application domains.

    Read more →