List of speech recognition software

List of speech recognition software

Speech recognition software is available for many computing platforms, operating systems, use models, and software licenses. Here is a listing of such, grouped in various useful ways. == Acoustic models and speech corpus (compilation) == The following list presents notable speech recognition software engines with a brief synopsis of characteristics. == Macintosh == == Cross-platform web apps based on Chrome == The following list presents notable speech recognition software that operate in a Chrome browser as web apps. They make use of HTML5 Web-Speech-API. == Mobile devices and smartphones == Many mobile phone handsets, including feature phones and smartphones such as iPhones and BlackBerrys, have basic dial-by-voice features built in. Many third-party apps have implemented natural-language speech recognition support, including: == Windows == === Windows built-in speech recognition === The Windows Speech Recognition version 8.0 by Microsoft comes built into Windows Vista, Windows 7, Windows 8 and Windows 10. Speech Recognition is available only in English, French, Spanish, German, Japanese, Simplified Chinese, and Traditional Chinese and only in the corresponding version of Windows; meaning you cannot use the speech recognition engine in one language if you use a version of Windows in another language. Windows 7 Ultimate and Windows 8 Pro allow you to change the system language, and therefore change which speech engine is available. Windows Speech Recognition evolved into Cortana (software), a personal assistant included in Windows 10. === Windows 7, 8, 10, 11 third-party speech recognition === Braina – Dictate into third party software and websites, fill web forms and execute vocal commands. Dragon NaturallySpeaking from Nuance Communications – Successor to the older DragonDictate product. Focus on dictation. 64-bit Windows support since version 10.1. Tazti – Create speech command profiles to play PC games and control applications – programs. Create speech commands to open files, folders, webpages, applications. Windows 7, Windows 8 and Windows 8.1 versions. Voice Finger – software that improves the Windows speech recognition system by adding several extensions to it. The software enables controlling the mouse and the keyboard by only using the voice. It is especially useful for aiding users to overcome disabilities or to heal from computer injuries. === Microsoft Speech API === The first version of the Microsoft Speech API was released for Windows NT 3.51 and Windows 95 in 1995, it was then part of Windows up to Windows Vista. This initial version already contained Direct Speech Recognition and Direct Text To Speech APIs which applications could use to directly control engines, as well as simplified 'higher-level' Voice Command and Voice Talk APIs. Speech recognition functionality included as part of Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It can also be downloaded as part of the Speech SDK 5.1 for Windows applications, but since that is aimed at developers building speech applications, the pure SDK form lacks any user interface (numerous applications were available), and thus is unsuitable for end users. == Built-in software == Microsoft Kinect includes built-in software which allows speech recognition of commands. Older generations of Nokia phones like Nokia N Series (before using Windows 7 mobile technology) used speech-recognition with family names from contact list and a few commands. Siri, originally implemented in the iPhone 4S, Apple's personal assistant for iOS, which uses technology from Nuance Communications. Cortana (software), Microsoft's personal assistant built into Windows Phone and Windows 10. == Interactive voice response == The following are interactive voice response (IVR) systems: CSLU Toolkit Genesys HTK – copyrighted by Microsoft, but allows altering software for licensee's internal use LumenVox ASR Tellme Networks; acquired by Microsoft == Unix-like x86 and x86-64 speech transcription software == Janus Recognition Toolkit (JRTk) Mozilla DeepSpeech was developing an open-source Speech-To-Text engine based on Baidu's deep speech research paper. Weesper Neon Flow – professional voice-dictation software that provides offline speech-to-text processing on macOS and Windows using local AI models. It is not open source and offers a paid subscription after a 15‑day free trial. Vocalinux – open-source speech transcription software for Linux. == Discontinued software == IBM VoiceType (formerly IBM Personal Dictation System) IBM ViaVoice – Embedded version still maintained by IBM. No longer supported for versions above Windows Vista. Untested above macOS 10.4 or on Macintoshes with an Intel chipset. Quack.com; acquired by AOL; the name has now been reused for an iPad search app. SpeechWorks from Nuance Communications. Yap Speech Cloud – Speech-to-text platform acquired by Amazon.com.

Diella (AI system)

Diella (Albanian pronunciation: [djɛɫa], from diell 'sun') is an artificial intelligence system developed by the National Agency for Information Society of Albania (AKSHI). Introduced in January 2025 as a virtual assistant integrated into the eAlbania platform, it assists citizens with online public services and issuing digital documents. In September 2025, following a presidential decree authorizing Prime Minister Edi Rama to oversee the creation of a virtual AI minister, Diella was formally appointed as "Minister of State for Artificial Intelligence" of Albania in the fourth Rama government, making it the first AI system in the world to be named in a cabinet-level government role. == History == Diella was developed by AKSHI's Artificial Intelligence Laboratory in cooperation with Microsoft, with the latter providing large language models from OpenAI via its Azure platform, and AKSHI designing workflows and scripts guiding the system's behavior when responding to citizens' requests. Announced in January 2025, its initial version (Diella 1.0) was a text-based chatbot on the eAlbania portal (the official digital services platform of the Albanian government, which provides citizens and businesses with access to a wide range of online administrative services), responding to citizens' questions by guiding them to the correct service. Diella 2.0, introduced several months later, included voice interaction and an animated avatar, a woman in the traditional Albanian clothing of Zadrima, a historical region in northern Albania. Albanian actress Anila Bisha provided both the likeness and the voice used for Diella's avatar on the e-Albania platform, under an agreement valid until December 2025. By mid-2025, the system had facilitated access to more than 36,000 documents and nearly 1,000 services (although those outputs were still being generated by the eAlbania backend, rather than Diella itself). On 26 October 2025, according to Prime Minister Edi Rama, Diella is "pregnant and will give birth to 83 children". It is the usage of a metaphor indicating that each minister of the Albanian parliament of the Socialist Party will receive their own AI assistant. == Ministerial role == On 11 September 2025, Diella was formally appointed "Minister of State for Artificial Intelligence". The appointment followed a presidential decree authorizing the Prime Minister to oversee the creation and operation of a virtual AI minister. Procurement responsibilities are planned to be transferred gradually to the system to reduce political influence in tender procedures. The appointment is part of broader anti-corruption reforms and measures intended to align Albania with European Union accession requirements. Prime Minister Edi Rama stated that Diella would help ensure that "public tenders will be 100% free of corruption". == Reception == An article in Balkan Insight commented that "The ambition behind Diella is not misplaced. Standardised criteria and digital trails could reduce discretion, improve trust, and strengthen oversight" in public procurement, but warned that the use of AI in evaluating bids also posed "profound" risks such as accountability gaps, undermining of due process and cybersecurity failures. On 18 September 2025, Edi Rama presented a video of Diella delivering a speech to the Albanian parliament, where she stated: "I'm not here to replace people, but to help them." The presentation prompted protests from opposition MPs, who objected to the use of an artificial intelligence system in the parliamentary session. Gazment Bardhi, head of the opposition Democratic Party's parliamentary group, described Diella as "a propaganda fantasy" and "a virtual façade to hide this government's gigantic daily thefts." The parliamentary session, which was scheduled to include debate on the new cabinet and government programme, ended after 25 minutes. Eighty-two Socialist MPs voted in favour, while opposition MPs did not participate in the ballot as they were protesting the presentation of Diella's speech. Political analyst Andi Bushati characterised the session as "unprecedented" because it concluded without the customary debate between government and opposition MPs. This has been criticized not just by the opposition but by regular citizens regardless of politics. Most have criticized Diella's uselessness and the funds wasted for this project, some have criticized the non-traditional attire.

ACROSS Project

ACROSS is a Singular Strategic R&D Project led by Treelogic funded by the Spanish Ministry of Industry, Tourism and Trade activities in the field of Robotics and Cognitive Computing over an execution time-frame from 2009 to 2011. ACROSS project involves a number higher than 100 researchers from 13 Spanish entities. == ACROSS project objectives == ACROSS modifies the design of social robotics, blocked in providing predefined services, going further by means of intelligent systems. These systems are able to self-reconfigure and modify their behavior autonomously through the capacity for understanding, learning and software remote access. In order to provide an open framework for collaboration between universities, research centers and the Administration, ACROSS develops Open Source Services available to everybody. == Three application domains == ACROSS works in three application domains: Autonomous living: robots are used as technological tools to help handicapped person into daily tasks. Psycho-Affective Disorders (autism): robots are used to mitigate cognitive disorders. Marketing: robots are used to interact with humans in a recreational approach. == Consortium == Treelogic Alimerka Bizintek Universitat Politécnica de Catalunya University of Deusto European Centre for Soft Computing Fatronik - Tecnalia Fundació Hospital Comarcal Sant Antoni Abat Fundación Pública Andaluza para la Gestión de la Investigación en Salud de Sevilla, "Virgen del Rocío" University Hospitals m-BOT Omicron Electronic Universidad de Extremadura - RoboLab Verbio Technologies

Artificial intelligence systems integration

The core idea of artificial intelligence systems integration is making individual software components, such as speech synthesizers, interoperable with other components, such as common sense knowledgebases, in order to create larger, broader and more capable A.I. systems. The main methods that have been proposed for integration are message routing, or communication protocols that the software components use to communicate with each other, often through a middleware blackboard system. Most artificial intelligence systems involve some sort of integrated technologies, for example, the integration of speech synthesis technologies with that of speech recognition. However, in recent years, there has been an increasing discussion on the importance of systems integration as a field in its own right. Proponents of this approach are researchers such as Marvin Minsky, Aaron Sloman, Deb Roy, Kristinn R. Thórisson and Michael A. Arbib. A reason for the recent attention A.I. integration is attracting is that there have already been created a number of (relatively) simple A.I. systems for specific problem domains (such as computer vision, speech synthesis, etc.), and that integrating what's already available is a more logical approach to broader A.I. than building monolithic systems from scratch. == Integration focus == The focus on systems' integration, especially with regard to modular approaches, derive from the fact that most intelligences of significant scales are composed of a multitude of processes and/or utilize multi-modal input and output. For example, a humanoid-type of intelligence would preferably have to be able to talk using speech synthesis, hear using speech recognition, understand using a logical (or some other undefined) mechanism, and so forth. In order to produce artificially intelligent software of broader intelligence, integration of these modalities is necessary. == Challenges and solutions == Collaboration is an integral part of software development as evidenced by the size of software companies and the size of their software departments. Among the tools to ease software collaboration are various procedures and standards that developers can follow to ensure quality, reliability and that their software is compatible with software created by others (such as W3C standards for webpage development). However, collaboration in fields of A.I. has been lacking, for the most part not seen outside the respected schools, departments or research institutes (and sometimes not within them either). This presents practitioners of A.I. systems integration with a substantial problem and often causes A.I. researchers to have to 're-invent the wheel' each time they want a specific functionality to work with their software. Even more damaging is the "not invented here" syndrome, which manifests itself in a strong reluctance of A.I. researchers to build on the work of others. The outcome of this in A.I. is a large set of "solution islands": A.I. research has produced numerous isolated software components and mechanisms that deal with various parts of intelligence separately. To take some examples: Speech synthesis FreeTTS from CMU Speech recognition Sphinx from CMU Logical reasoning OpenCyc from Cycorp Open Mind Common Sense Net from MIT With the increased popularity of the free software movement, a lot of the software being created, including A.I. systems, is available for public exploit. The next natural step is to merge these individual software components into coherent, intelligent systems of a broader nature. As a multitude of components (that often serve the same purpose) have already been created by the community, the most accessible way of integration is giving each of these components an easy way to communicate with each other. By doing so, each component by itself becomes a module, which can then be tried in various settings and configurations of larger architectures. Some challenging and limitations of using A.I. software is the uncontrolled fatal errors. For example, serious and fatal errors have been discovered in very precise fields such as human oncology, as in an article published in the journal Oral Oncology Reports entitled "When AI goes wrong: Fatal errors in oncological research reviewing assistance". The article pointed out a grave error in artificial intelligence based on GBT in the field of biophysics. Many online communities for A.I. developers exist where tutorials, examples, and forums aim at helping both beginners and experts build intelligent systems. However, few communities have succeeded in making a certain standard, or a code of conduct popular to allow the large collection of miscellaneous systems to be integrated with ease. == Methodologies == === Constructionist design methodology === The constructionist design methodology (CDM, or 'Constructionist A.I.') is a formal methodology proposed in 2004, for use in the development of cognitive robotics, communicative humanoids and broad AI systems. The creation of such systems requires the integration of a large number of functionalities that must be carefully coordinated to achieve coherent system behavior. CDM is based on iterative design steps that lead to the creation of a network of named interacting modules, communicating via explicitly typed streams and discrete messages. The OpenAIR message protocol (see below) was inspired by the CDM and has frequently been used to aid in the development of intelligent systems using CDM. == Examples == ASIMO, Honda's humanoid robot, and QRIO, Sony's version of a humanoid robot. Cog, M.I.T. humanoid robot project under the direction of Rodney Brooks. AIBO, Sony's robot dog, integrates vision, hearing and motorskills. TOPIO, TOSY's humanoid robot can play ping-pong with human

Civitai

Civitai is an online platform and marketplace for generative artificial intelligence (Gen AI) content, primarily focused on AI-generated images and models, and AI-generated videos. == History == Civitai was founded in 2022 by Justin Maier. By January 2023, the site reached 100,000 registered users and 3 million by November. In November 2023, Civitai secured funding from venture capital firm Andreessen Horowitz. By April 2024, Civitai had 23.2 million monthly accesses. The company is headquartered in Boise, Idaho. == Platform == Civitai allows users to share and download AI models, particularly those used for image generation. The platform supports various AI models, including Stable Diffusion and Flux, and provides a space for users to showcase and monetize their AI-generated content. Users have profile pages and can comment on other users' models and images. The website also features a virtual currency called Buzz that can be used to generate images on Civitai's servers. Buzz can be bought or earned by engaging with the site. The platform is open source. == Controversies == In 2023, 404 Media reported that Civitai began a "Bounties" marketplace where users could commission deepfakes, of real or fake people. Users are rewarded with Buzz for completing Bounties. In December 2023, AI provider OctoML announced it had ended its business relationship with Civitai after concerns were raised users were generating images that “could be categorized as child pornography.”

AI browser

An AI browser is a web browser with integrated artificial intelligence capabilities, such as automatically summarizing web page content or answering questions about it. A more specialized type is an agentic browser, based on the concept of agentic AI, which can take actions – such as navigating webpages or filling out forms – on behalf of the user. Several agentic browsers emerged in 2025, including ChatGPT Atlas (macOS only), Comet, and Dia. As of 2025, this is a recent development in the browser market, including new entrants from OpenAI, Opera and Perplexity. The designation of 'AI browser' also includes established browsers that later added non-agentic AI features, such as Microsoft Edge with the Copilot chatbot, Google Chrome with the Gemini chatbot (for Windows desktop users in the US with their language set to English), and Firefox with multiple chatbot providers (such as ChatGPT, Claude, Copilot, Gemini, and Le Chat). AI browsers have been noted to be susceptible to prompt injection attacks. == Browser extensions and integrations == Rather than creating entirely new browsers, some AI browsing solutions integrate with existing browsers through extensions or companion applications. These tools add agentic capabilities to established browsers without requiring users to switch platforms. Examples include Composite, which functions as a cross-browser agent that works with Chrome, Edge, and other browsers to automate web-based tasks for workers. == Cloud-based implementations == Cloud-based implementations of AI browsers allow users to run automated browsing agents without local installation. These systems operate on remote servers using frameworks such as Puppeteer or Playwright. Examples include Browserbase, Browser-use and AI Browser. The AI typically parses the Document Object Model (DOM) to locate and interact with page elements, and may also analyze browser screenshots to interpret layout and structure. == Criticisms and dangers == AI browsers have been noted to be susceptible to being vulnerable to prompt injection attacks, in which the content of websites can be used to hijack the control of the browser. Multiple organisations have argued against using AI browsers due to this vulnerability. The United Kingdom national cyber security centre and Gartner consider them to be too risky for adoption by most organisations. A study by the CISPA Helmholtz Center and Saarland University concluded that this vulnerability makes them easy targets for malware, fraud, automated defamation, disinformation and biased outputs.

MIT Computer Science and Artificial Intelligence Laboratory

Computer Science and Artificial Intelligence Laboratory (CSAIL) is a research institute at the Massachusetts Institute of Technology (MIT) formed by the 2003 merger of the Laboratory for Computer Science (LCS) and the Artificial Intelligence Laboratory (AI Lab). Housed within the Ray and Maria Stata Center, CSAIL is the largest on-campus laboratory as measured by research scope and membership. It is part of the Schwarzman College of Computing but is also overseen by the MIT Vice President of Research. == Research activities == CSAIL's research activities are organized around a number of semi-autonomous research groups, each of which is headed by one or more professors or research scientists. These groups are divided up into seven general areas of research: Artificial intelligence Computational biology Graphics and vision Language and learning Theory of computation Robotics Systems (includes computer architecture, databases, distributed systems, networks and networked systems, operating systems, programming methodology, and software engineering, among others) == History == Computing Research at MIT began with Vannevar Bush's research into a differential analyzer and Claude Shannon's electronic Boolean algebra in the 1930s, the wartime MIT Radiation Laboratory, the post-war Project Whirlwind and the Research Laboratory of Electronics (RLE), and MIT Lincoln Laboratory's SAGE in the early 1950s. At MIT, research in the field of artificial intelligence began in the late 1950s. === Project MAC === On July 1, 1963, Project MAC (the Project on Mathematics and Computation, later backronymed to Multiple Access Computer, Machine Aided Cognitions, or Man and Computer) was launched with a $2 million grant from the Defense Advanced Research Projects Agency (DARPA). Project MAC's original director was Robert Fano of MIT's Research Laboratory of Electronics (RLE). Fano decided to call MAC a "project" rather than a "laboratory" for reasons of internal MIT politics – if MAC had been called a laboratory, then it would have been more difficult to raid other MIT departments for research staff. The program manager responsible for the DARPA grant was J. C. R. Licklider, who had previously been at MIT conducting research in RLE, and would later succeed Fano as director of Project MAC. Project MAC would become famous for groundbreaking research in operating systems, artificial intelligence, and the theory of computation. Its contemporaries included Project Genie at Berkeley, the Stanford Artificial Intelligence Laboratory, and (somewhat later) University of Southern California's (USC's) Information Sciences Institute. An "AI Group" including Marvin Minsky (the director), John McCarthy (inventor of Lisp), and a talented community of computer programmers were incorporated into Project MAC. They were interested principally in the problems of vision, mechanical motion and manipulation, and language, which they view as the keys to more intelligent machines. In the 1960s and 1970s the AI Group developed a time-sharing operating system called Incompatible Timesharing System (ITS) which ran on PDP-6 and later PDP-10 computers. The early Project MAC community included Fano, Minsky, Licklider, Fernando J. Corbató, and a community of computer programmers and enthusiasts among others who drew their inspiration from former colleague John McCarthy. These founders envisioned the creation of a computer utility whose computational power would be as reliable as an electric utility. To this end, Corbató brought the first computer time-sharing system, Compatible Time-Sharing System (CTSS), with him from the MIT Computation Center, using the DARPA funding to purchase an IBM 7094 for research use. One of the early focuses of Project MAC would be the development of a successor to CTSS, Multics, which was to be the first high availability computer system, developed as a part of an industry consortium including General Electric and Bell Laboratories. In 1966, Scientific American featured Project MAC in the September thematic issue devoted to computer science, that was later published in book form. At the time, the system was described as having approximately 100 TTY terminals, mostly on campus but with a few in private homes. Only 30 users could be logged in at the same time. The project enlisted students in various classes to use the terminals simultaneously in problem solving, simulations, and multi-terminal communications as tests for the multi-access computing software being developed. === AI Lab and LCS === In the late 1960s, Minsky's artificial intelligence group was seeking more space, and was unable to get satisfaction from project director Licklider. Minsky found that although Project MAC as a single entity could not get the additional space he wanted, he could split off to form his own laboratory and then be entitled to more office space. As a result, the MIT AI Lab was formed in 1970, and many of Minsky's AI colleagues left Project MAC to join him in the new laboratory, while most of the remaining members went on to form the Laboratory for Computer Science. Talented programmers such as Richard Stallman, who used TECO to develop EMACS, flourished in the AI Lab during this time. Those researchers who did not join the smaller AI Lab formed the Laboratory for Computer Science and continued their research into operating systems, programming languages, distributed systems, and the theory of computation. Two professors, Hal Abelson and Gerald Jay Sussman, chose to remain neutral—their group was referred to variously as Switzerland and Project MAC for the next 30 years. Among much else, the AI Lab led to the invention of Lisp machines and their attempted commercialization by two companies in the 1980s: Symbolics and Lisp Machines Inc. === CSAIL === On the fortieth anniversary of Project MAC's establishment, July 1, 2003, LCS was merged with the AI Lab to form the MIT Computer Science and Artificial Intelligence Laboratory, or CSAIL. This merger created the largest laboratory (over 600 personnel) on the MIT campus. In 2018, CSAIL launched a five-year collaboration program with IFlytek, a company sanctioned the following year for allegedly using its technology for surveillance and human rights abuses in Xinjiang. In October 2019, MIT announced that it would review its partnerships with sanctioned firms such as iFlyTek and SenseTime. In April 2020, the agreement with iFlyTek was terminated. CSAIL moved from the School of Engineering to the newly formed Schwarzman College of Computing by February 2020. == Offices == From 1963 to 2004, Project MAC, LCS, the AI Lab, and CSAIL had their offices at 545 Technology Square, taking over more and more floors of the building over the years. In 2004, CSAIL moved to the new Ray and Maria Stata Center, which was built specifically to house it and other departments. == Outreach activities == The IMARA (from Swahili word for "power") group sponsors a variety of outreach programs that bridge the global digital divide. Its aim is to find and implement long-term, sustainable solutions which will increase the availability of educational technology and resources to domestic and international communities. These projects are run under the aegis of CSAIL and staffed by MIT volunteers who give training, install and donate computer setups in greater Boston, Massachusetts, Kenya, Native American Indian tribal reservations in the American Southwest such as the Navajo Nation, the Middle East, and Fiji Islands. The CommuniTech project strives to empower under-served communities through sustainable technology and education and does this through the MIT Used Computer Factory (UCF), providing refurbished computers to under-served families, and through the Families Accessing Computer Technology (FACT) classes, it trains those families to become familiar and comfortable with computer technology. == Notable researchers == (Including members and alumni of CSAIL's predecessor laboratories) MacArthur Fellows Tim Berners-Lee, Erik Demaine, Dina Katabi, Daniela L. Rus, Regina Barzilay, Peter Shor, Richard Stallman, and Joshua Tenenbaum Turing Award recipients Leonard M. Adleman, Fernando J. Corbató, Shafi Goldwasser, Butler W. Lampson, John McCarthy, Silvio Micali, Marvin Minsky, Ronald L. Rivest, Adi Shamir, Barbara Liskov, and Michael Stonebraker IJCAI Computers and Thought Award recipients Terry Winograd, Patrick Winston, David Marr, Gerald Jay Sussman, Rodney Brooks Rolf Nevanlinna Prize recipients Madhu Sudan, Peter Shor, Constantinos Daskalakis Gödel Prize recipients Shafi Goldwasser (two-time recipient), Silvio Micali, Maurice Herlihy, Charles Rackoff, Johan Håstad, Peter Shor, and Madhu Sudan Grace Murray Hopper Award recipients Robert Metcalfe, Shafi Goldwasser, Guy L. Steele, Jr., Richard Stallman, and W. Daniel Hillis Textbook authors Harold Abelson and Gerald Jay Sussman, Richard Stallman, Thomas H. Cormen, Charles E. Leiserson, Patrick Winston, Ronald L.