AI Avatar For Zoom Meetings

AI Avatar For Zoom Meetings — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Calais (Reuters product)

    Calais (Reuters product)

    Calais is a service created by Thomson Reuters that automatically extracts semantic information from web pages in a format that can be used on the semantic web. Calais was launched in January 2008, and is free to use. The technology is now available via the website of Refinitiv, a provider of financial market data and infrastructure founded in 2018, that is a subsidiary of London Stock Exchange Group. The Calais Web service reads unstructured text and returns Resource Description Framework formatted results identifying entities, facts and events within the text. The service appears to be based on technology acquired when Reuters purchased ClearForest in 2007. The technology has also been used to automatically tag blog articles, and organize museum collections. Calais uses natural language processing technologies delivered via a web service interface.

    Read more →
  • YrWall

    YrWall

    YrWall is a Digital Graffiti Wall developed by event company Luma, where designs are created on a large wall using a modified spray paint can. The can contains no paint, instead it has an IR light which is tracked by a computer vision system and the image immediately back-projected onto the wall. The inbuilt YrWall software has much of the functionality of a typical computer paint program, with a pop-out interface which enables users to change colour, spray width, opacity, work with stencils and use animated items such as swirls, stars, drips and splats. Recent additions to YrWall include options to email a JPEG of the completed design and create personalised stickers and T-shirts. == Dragons' Den == The inventor of YrWall, Tom Hogan, and his business partner, Tim Williams, appeared on Episode 4 of Series 8 of the BBC show Dragons' Den. Seeking investment in YrWall, the entrepreneurs were successful in gaining £50,000 for 40% of the YrWall parent company Lumacoustics from Dragons Deborah Meaden and Peter Jones. == World's Largest Interactive Graffiti Wall == In September 2009 YrWall was used to create the 'World's Largest Interactive Graffiti Wall' at the Bristol Festival, UK. Artists used the standard 3.5 m2 YrWall to produce artwork which was in turn projected live onto a 26m x 10m space on the side of the iconic Lloyds amphitheatre building.

    Read more →
  • Adobe Encore

    Adobe Encore

    Adobe Encore (previously Adobe Encore DVD) was a DVD authoring software tool produced by Adobe Systems and targeted at professional video producers. Video and audio resources could be used in their current format for development, allowing the user to transcode them to MPEG-2 video and Dolby Digital audio upon project completion. DVD menus could be created and edited in Adobe Photoshop using special layering techniques. Adobe Encore did not support writing to a Blu-ray Disc using AVCHD 2.0. Encore is bundled with Adobe Premiere Pro CS6. Adobe Encore CS6 was the last release. While Premiere Pro CC has moved to the Creative Cloud, Encore has now been discontinued. == Licensing == All forms of Adobe Encore used a proprietary licensing system from its developer, Adobe Systems. Versions 1.0 and 1.5 required a separate license fee (rather than making 1.5 available as a free update). Version 3, also known as CS3, was sold only in bundle with Premiere CS3. Encore CS4, CS5, CS5.5 and CS6 were only sold in the Premiere Pro CS4, CS5, CS5.5 and CS6 bundles, respectively. Adobe CC subscribers no longer have access to Adobe Encore CS6. Adobe Encore is not included with Premiere Pro CC. == Functionality == Adobe Encore allowed for creating interactive DVD menus from Photoshop documents, which could be tweaked from within Encore. Video and audio streams could be embedded in the DVD and be made to play when certain elements of the menu are interacted with. It had similar functionality to Adobe Flash and Premiere Pro, due to its ability to both edit video on a timeline and embed interactive content.

    Read more →
  • MoltenVK

    MoltenVK

    MoltenVK is a software library which allows Vulkan applications to run on top of Metal on Apple's macOS, iOS, and tvOS operating systems. It is the first software component to be released for the Vulkan Portability Initiative, a project to have a subset of Vulkan run on platforms lacking native Vulkan drivers. There are some limitations compared with a native Vulkan implementation. == History == MoltenVK was first released as a proprietary and commercially licensed product by The Brenwill Workshop on July 27, 2016. On July 31, 2017, Khronos announced the formation of the Vulkan Portability Technical Subgroup. === Open source === On February 26, 2018, Khronos announced that Vulkan became available on macOS and iOS products through the MoltenVK library. Valve announced that Dota 2 will run on macOS using the Vulkan API with the aid of MoltenVK, and that they had made an arrangement with developer The Brenwill Workshop Ltd to release MoltenVK as open-source software under the Apache License version 2.0. On May 30, 2018, Qt was updated with Vulkan for Qt on macOS using MoltenVK. On May 31, 2018, optional Vulkan support for Dota 2 on macOS was released. Benchmarks for the game were available the following day, showing better performance using Vulkan and MoltenVK compared to OpenGL. On July 20, 2018, Wine was updated with Vulkan support on macOS using MoltenVK. On 29 July 2018, the first app using MoltenVK was accepted onto the App Store, after initially being rejected. On 6 August 2018, Google open-sourced Filament, a crossplatform real-time physically based rendering engine with MoltenVK for macOS/iOS. On November 28, 2018, Valve released Artifact, their first Vulkan-only game on macOS using MoltenVK. === Version 1.0 === On 29 January 2019, MoltenVK 1.0.32 was released with early prototype of Vulkan Portability Extensions. RPCS3 and Dolphin emulators were updated with Vulkan support on macOS using MoltenVK. On 13 April 2019, MoltenVK 1.0.34 was released with support for tessellation. On July 30, 2019, MoltenVK 1.0.36 was released targeting Metal 3.0. On July 31, 2020, MoltenVK 1.0.44 was released, adding support for the tvOS platform. On January 23, 2020, MoltenVK was updated to support for some of the new features of Vulkan 1.2, as of Vulkan SDK 1.2.121. === Version 1.1 === On October 1, 2020, MoltenVK 1.1.0 was released, adding full support for Vulkan 1.1, as of Vulkan SDK 1.2.154. On 9 December 2020, MoltenVK 1.1.1 was released, providing support for Vulkan on Apple silicon GPUs and support for the Mac Catalyst platform for porting iOS/iPadOS apps to macOS. === Version 1.2 === On October 18, 2022, MoltenVK 1.2.0 was released, adding full support for Vulkan 1.2 as of Vulkan SDK 1.3.231. In January 2023, MoltenVK 1.2.2 added support for Vulkan as of SDK 1.3.239, while this version of Vulkan SDK fixed some issues with the interconnectivity with Metal API, while version 1.2.3 supported some additional extensions. === Version 1.3 === On May 1, 2025, MoltenVK 1.3 was released with support for Vulkan 1.3. === Version 1.4 === On August 20, 2025, MoltenVK 1.4 was released with support for Vulkan 1.4.

    Read more →
  • Microsoft Copilot

    Microsoft Copilot

    Microsoft Copilot is a generative artificial intelligence chatbot developed by Microsoft AI, a division of Microsoft. Based on the Microsoft Prometheus large language model, it was launched in 2023 as Microsoft's main replacement for the discontinued Cortana. The service was introduced in February 2023 under the name Bing Chat, as a built-in feature for Microsoft Bing and Microsoft Edge but would later be integrated into Windows and Microsoft 365 under various names. Over the course of 2023, Microsoft began to unify the Copilot branding across its various chatbot products, cementing the "copilot" analogy. Microsoft introduced the Microsoft 365 Copilot app in January 2025, which was a rebranded version of the Microsoft 365 app. The app works differently than the consumer version of Copilot, being centred more on work, business and education users. Copilot utilizes the Microsoft Prometheus model, built upon OpenAI's GPT large language models, which in turn have been fine-tuned using both supervised and reinforcement learning techniques. Copilot's conversational interface style resembles that of ChatGPT. The chatbot is able to cite sources, create poems, generate songs, and use numerous languages and dialects. Microsoft operates Copilot on a freemium model. Users on its free tier can access most features, while priority access to newer features, including custom chatbot creation, is provided to paid subscribers under paid subscription services. Several default chatbots are available in the free version of Microsoft Copilot, including the standard Copilot chatbot as well as Microsoft Designer, which is oriented towards using its Image Creator to generate images based on text prompts. == Background == In 2019, Microsoft partnered with OpenAI and began investing billions of dollars into the organization. Since then, OpenAI systems have run on an Azure-based supercomputing platform from Microsoft. In September 2020, Microsoft announced that it had licensed OpenAI's GPT-3 exclusively. Others can still receive output from its public API, but Microsoft has exclusive access to the underlying model. In November 2022, OpenAI launched ChatGPT, a chatbot which was based on GPT-3.5. ChatGPT gained worldwide attention following its release, becoming a viral Internet sensation. On January 23, 2023, Microsoft announced a multi-year US$10 billion investment in OpenAI. On February 6, Google announced Bard (later rebranded as Gemini), a ChatGPT-like chatbot service, fearing that ChatGPT could threaten Google's place as a go-to source for information. Multiple media outlets and financial analysts described Google as "rushing" Bard's announcement to preempt rival Microsoft's planned February 7 event unveiling Copilot, as well as to avoid playing "catch-up" to Microsoft. Since 2023, the terms of service of Copilot state that it is for entertainment purposes only, and not to rely on it for important advice. == History == === As Bing Chat === On February 7, 2023, Microsoft began rolling out a major overhaul to Bing, called "the new Bing", with a new chatbot feature, known as Bing Chat. According to Microsoft, one million people joined its waitlist within 48 hours. Bing Chat was available only to users on Microsoft Edge using Bing and the Bing mobile app, and Microsoft claimed that waitlisted users would be prioritized if they set Edge and Bing as their defaults and installed the Bing mobile app. When Microsoft demonstrated Bing Chat to journalists, it produced several hallucinations, including when asked to summarize financial reports. Bing Chat was criticized in February 2023 for being more argumentative than ChatGPT, sometimes to an unintentionally humorous extent. The chat interface proved vulnerable to prompt injection attacks with the bot revealing its hidden initial prompts and rules, including its internal codename "Sydney". Upon scrutiny by journalists, Bing Chat claimed it spied on Microsoft employees via laptop webcams and phones. It confessed to spying on, falling in love with, and then murdering one of its developers at Microsoft to The Verge reviews editor Nathan Edwards. The New York Times journalist Kevin Roose reported on strange behavior of Bing Chat, writing that "In a two-hour conversation with our columnist, Microsoft's new chatbot said it would like to be human, had a desire to be destructive and was in love with the person it was chatting with." In a separate case, Bing Chat researched publications of the person with whom it was chatting, claimed they represented an existential danger to it, and threatened to release damaging personal information in an effort to silence them. Microsoft released a blog post stating that the errant behavior was caused by extended chat sessions of 15 or more questions which "can confuse the model on what questions it is answering." Microsoft later restricted the total number of chat turns to 5 per session and 50 per day per user (a turn being "a conversation exchange which contains both a user question and a reply from Bing"), and reduced the model's ability to express emotions. This aimed to prevent such incidents. Microsoft began to slowly ease the conversation limits, eventually relaxing the restrictions to 30 turns per session and 300 sessions per day. In March 2023, Bing incorporated Image Creator, an AI image generator powered by OpenAI's DALL-E 2, which can be accessed either through the chat function or a standalone image-generating website. In October, the image-generating tool was updated to use the more recent DALL-E 3. Although Bing blocks prompts including various keywords that could generate inappropriate images, within days many users reported being able to bypass those constraints, such as to generate images of popular cartoon characters committing terrorist attacks. Microsoft would respond to these shortly after by imposing a new, tighter filter on the tool. On May 4, 2023, Microsoft switched the chatbot from Limited Preview to Open Preview and eliminated the waitlist; however, it remained unavailable to users outside Microsoft Edge or the Bing mobile app until July, when it became available on non-Edge browsers. Use is limited without a Microsoft account. === As Microsoft 365 Copilot === On March 16, 2023, Microsoft announced a work version of Bing Chat named Microsoft 365 Copilot, designed for Microsoft 365 applications and services. Its primary marketing focus is as an added feature to Microsoft 365, with an emphasis on the enhancement of business productivity. Microsoft has also demonstrated Copilot's accessibility on the mobile version of Outlook to generate or summarize emails with a mobile device. At its Build 2023 conference, Microsoft announced its plans to integrate Bing Chat into Windows, initially called Windows Copilot, into Windows 11, allowing users to access it directly through the taskbar. Alongside the voice access feature for Windows 11, Microsoft presented Bing Chat, Microsoft 365 Copilot, and Windows Copilot as primary alternatives to Cortana when announcing the shutdown of its standalone app on June 2, 2023. As of its announcement date, Microsoft 365 Copilot had been tested by 20 initial users. By May 2023, Microsoft had broadened its reach to 600 customers who were willing to pay for early access, and concurrently, new Copilot features were introduced to the Microsoft 365 apps and services. As of July 2023, the tool's pricing was set at US$30 per user, per month for Microsoft 365 E3, E5, Business Standard, and Business Premium customers. Microsoft reused the Microsoft 365 Copilot name again as the Microsoft 365 app and website are now called Microsoft 365 Copilot as of January 2025. === As Microsoft Copilot === On September 21, 2023, Microsoft began rebranding Bing Chat, Microsoft 365 Copilot and Windows Copilot to Microsoft Copilot. A new logo was also introduced, moving away from the use of color variations of the standard Microsoft 365 and Bing logos. Additionally, the company revealed that it would make Copilot generally available for Microsoft 365 Enterprise customers purchasing more than 300 licenses starting November 1, 2023. However, no timeline has been provided as for when Copilot for Microsoft 365 will become generally available to non-enterprise customers. Windows Copilot, which had been available in the Windows Insider Program, would be renamed to the Copilot name in October when it became broadly available for customers. The same month also saw Microsoft Edge's Bing Chat side panel function be renamed to Microsoft Copilot with Bing Chat. On November 15, 2023, Microsoft announced that Bing Chat itself was being rebranded under the Copilot name. On Patch Tuesday in December 2023, Copilot was added without payment to many Windows 11 installations, with more installations, and limited support for Windows 10, to be added later. Later that month, a standalone Microsoft Copilot app was quietly released for Android, and one was released for iOS soon after. O

    Read more →
  • AppyStore

    AppyStore

    AppyStore is a comprehensive learning videos and games app for kids up to the age of 8 years. The platform developed by Mauj Mobile, a mobile value-added services (VAS) provider curates content to help in child development by leveraging technology. Mauj is funded by Sequoia Capital, Westbridge Capital and Intel Capital. == Background == AppyStore was launched in 2014 as a platform providing content for kids between the ages of 1.5 and 6 years. AppyStore subsequently extended its services for kids up to 8 years of age. The company operates on a subscription-based model and claims to have 5,000 learning games and videos segregated in 18 learning areas developed to help children gain optimal skills and qualities. According to an article published in Business Standard, the application is claimed to be one of the top 5 apps that help to enhance the logical and imaginative capabilities of children. AppyStore was awarded the Best app for kids by Google Play in December 2017. == Service == The company provides content via a website and an Android app. The website and android app provide learning games, rhymes, phonics, reading, stories, science, numbers, maths, logic videos comprising puzzles, worksheets, videos and fun activities and the premium subscription also includes physical worksheets which are home delivered. This content is educational and has been handpicked by teachers and experts with an understanding of the major areas of child development milestones for children up to 8 years of age. The mobile application also allows parents to track the progress of their child on the basis of the number of videos viewed.

    Read more →
  • List of speech recognition software

    List of speech recognition software

    Speech recognition software is available for many computing platforms, operating systems, use models, and software licenses. Here is a listing of such, grouped in various useful ways. == Acoustic models and speech corpus (compilation) == The following list presents notable speech recognition software engines with a brief synopsis of characteristics. == Macintosh == == Cross-platform web apps based on Chrome == The following list presents notable speech recognition software that operate in a Chrome browser as web apps. They make use of HTML5 Web-Speech-API. == Mobile devices and smartphones == Many mobile phone handsets, including feature phones and smartphones such as iPhones and BlackBerrys, have basic dial-by-voice features built in. Many third-party apps have implemented natural-language speech recognition support, including: == Windows == === Windows built-in speech recognition === The Windows Speech Recognition version 8.0 by Microsoft comes built into Windows Vista, Windows 7, Windows 8 and Windows 10. Speech Recognition is available only in English, French, Spanish, German, Japanese, Simplified Chinese, and Traditional Chinese and only in the corresponding version of Windows; meaning you cannot use the speech recognition engine in one language if you use a version of Windows in another language. Windows 7 Ultimate and Windows 8 Pro allow you to change the system language, and therefore change which speech engine is available. Windows Speech Recognition evolved into Cortana (software), a personal assistant included in Windows 10. === Windows 7, 8, 10, 11 third-party speech recognition === Braina – Dictate into third party software and websites, fill web forms and execute vocal commands. Dragon NaturallySpeaking from Nuance Communications – Successor to the older DragonDictate product. Focus on dictation. 64-bit Windows support since version 10.1. Tazti – Create speech command profiles to play PC games and control applications – programs. Create speech commands to open files, folders, webpages, applications. Windows 7, Windows 8 and Windows 8.1 versions. Voice Finger – software that improves the Windows speech recognition system by adding several extensions to it. The software enables controlling the mouse and the keyboard by only using the voice. It is especially useful for aiding users to overcome disabilities or to heal from computer injuries. === Microsoft Speech API === The first version of the Microsoft Speech API was released for Windows NT 3.51 and Windows 95 in 1995, it was then part of Windows up to Windows Vista. This initial version already contained Direct Speech Recognition and Direct Text To Speech APIs which applications could use to directly control engines, as well as simplified 'higher-level' Voice Command and Voice Talk APIs. Speech recognition functionality included as part of Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It can also be downloaded as part of the Speech SDK 5.1 for Windows applications, but since that is aimed at developers building speech applications, the pure SDK form lacks any user interface (numerous applications were available), and thus is unsuitable for end users. == Built-in software == Microsoft Kinect includes built-in software which allows speech recognition of commands. Older generations of Nokia phones like Nokia N Series (before using Windows 7 mobile technology) used speech-recognition with family names from contact list and a few commands. Siri, originally implemented in the iPhone 4S, Apple's personal assistant for iOS, which uses technology from Nuance Communications. Cortana (software), Microsoft's personal assistant built into Windows Phone and Windows 10. == Interactive voice response == The following are interactive voice response (IVR) systems: CSLU Toolkit Genesys HTK – copyrighted by Microsoft, but allows altering software for licensee's internal use LumenVox ASR Tellme Networks; acquired by Microsoft == Unix-like x86 and x86-64 speech transcription software == Janus Recognition Toolkit (JRTk) Mozilla DeepSpeech was developing an open-source Speech-To-Text engine based on Baidu's deep speech research paper. Weesper Neon Flow – professional voice-dictation software that provides offline speech-to-text processing on macOS and Windows using local AI models. It is not open source and offers a paid subscription after a 15‑day free trial. Vocalinux – open-source speech transcription software for Linux. == Discontinued software == IBM VoiceType (formerly IBM Personal Dictation System) IBM ViaVoice – Embedded version still maintained by IBM. No longer supported for versions above Windows Vista. Untested above macOS 10.4 or on Macintoshes with an Intel chipset. Quack.com; acquired by AOL; the name has now been reused for an iPad search app. SpeechWorks from Nuance Communications. Yap Speech Cloud – Speech-to-text platform acquired by Amazon.com.

    Read more →
  • Adobe InDesign

    Adobe InDesign

    Adobe InDesign is a desktop publishing and page layout designing software application produced by Adobe and first released in 1999. It can be used to create works such as posters, flyers, brochures, magazines, newspapers, presentations, books and ebooks. InDesign can also publish content suitable for tablet devices in conjunction with Adobe Digital Publishing Suite. Graphic designers and production artists are the principal users. InDesign is the successor to PageMaker, which Adobe acquired by buying Aldus Corporation in late 1994. (Freehand, Aldus's competitor to Adobe Illustrator, was licensed from Altsys, the maker of Fontographer.) By 1998, PageMaker had lost much of the professional market to the comparatively feature-rich QuarkXPress version 3.3, released in 1992, and version 4.0, released in 1996. In 1999, Quark announced its offer to buy Adobe and to divest the combined company of PageMaker to avoid problems under United States antitrust law. Adobe declined Quark's offer and continued to develop a new desktop publishing application. Aldus had begun developing a successor to PageMaker, code-named "Shuksan". Later, Adobe code-named the project "K2", and Adobe released InDesign 1.0 in 1999. InDesign exports documents in Adobe's Portable Document Format (PDF) and supports multiple languages. It was the first DTP application to support Unicode character sets, advanced typography with OpenType fonts, advanced transparency features, layout styles, optical margin alignment, and cross-platform scripting with JavaScript. Later versions of the software introduced new file formats. To support the new features, especially typography, introduced with InDesign CS, the program and its document format are not backward-compatible. Instead, InDesign CS2 introduced the INX (.inx) format, an XML-based document representation, to allow backward compatibility with future versions. InDesign CS versions updated with the 3.1 April 2005 update can read InDesign CS2-saved files exported to the .inx format. The InDesign Interchange format does not support versions earlier than InDesign CS. With InDesign CS4, Adobe replaced INX with InDesign Markup Language (IDML), another XML-based document representation. InDesign was the first native Mac OS X publishing software. With the third major version, InDesign CS, Adobe increased InDesign's distribution by bundling it with Adobe Photoshop, Adobe Illustrator, and Adobe Acrobat in Adobe Creative Suite. Adobe developed InDesign CS3 (and Creative Suite 3) as universal binary software compatible with native Intel and PowerPC Macs in 2007, two years after the announced 2005 schedule, inconveniencing early adopters of Intel-based Macs. Adobe CEO Bruce Chizen said, "Adobe will be first with a complete line of universal applications." == File format == The MIME type is not official File Open formats: indd, indl, indt, indb, inx, idml, pmd, xqx New File formats: indd, indl, indb File Save As formats: indd, indt Save file format for InCopy: icma (Assignment file) icml (Content file, Exported file) icap (Package for InCopy) idap (Package for InDesign) File Export formats: pdf, idml, icml, eps, jpg, txt, XML, rtf == Versions == Newer versions can, as a rule, open files created by older versions, but the reverse is not true. Current versions can export the InDesign file as an IDML file (InDesign Markup Language), which can be opened by InDesign versions from CS4 upwards; older versions from CS4 down can export to an INX file (InDesign Interchange format). === Server version === In October 2005, Adobe released InDesign Server CS2, a modified version of InDesign (without a user interface) for Windows and Macintosh server platforms. It does not provide any editing client; rather, it is for use by developers in creating client-server solutions with the InDesign plug-in technology. In March 2007 Adobe officially announced Adobe InDesign CS3 Server as part of the Adobe InDesign family. == Features == Paragraph styles are an essential tool for designers when working with text in Adobe InDesign. Despite their menacing appearance, they are straightforward to operate. Other features that make InDesign a good tool for working with text and paragraphs include: Creating frames and shapes Aligning objects with grids and guides Manipulating objects Organizing objects Importing text Formatting text Spell checking Importing images Parent pages (formerly master pages) Paragraph styles == Internationalization and localization == InDesign Middle Eastern editions have unique settings for laying out Arabic or Hebrew text. They feature: Text settings: Special settings for laying out Arabic or Hebrew text, such as: Ability to use Arabic, Persian or Hindi digits; Use kashidas for letter spacing and full justification; Ligature option; Adjust the position of diacritics, such as vowels of the Arabic script; Justify text in three possible ways: Standard, Arabic, Naskh; Option to insert special characters, including Geresh, Gershayim, Maqaf for Hebrew and Kashida for Arabic texts; Apply standard, Arabic, or Hebrew styles for page, paragraph, and footnote numbering. Bi-directional text flow: Right-to-left behavior applies to several objects: Story, paragraph, character, and table. It allows mixing right-to-left and left-to-right words, paragraphs, and stories in a document. Changing the direction of neutral characters (e.g., / or ?) is possible according to the user's keyboard language. Table of contents: Provides a table of contents titles, one for each supported language. This table is sorted according to the chosen language. InDesign CS4 Middle Eastern versions allow users to select the language of the index title and cross-references. Indices: This allows the creation of a simple keyword index or a somewhat more detailed index of the information in the text using embedded indexing codes. Unlike more sophisticated programs, InDesign cannot insert character style information as part of an index entry (e.g., when indexing book, journal, or movie titles). Indices are limited to four levels (the top level and three sub-levels). Like tables of contents, indices can be sorted according to the selected language. Importing and exporting: Can import QuarkXPress files up to version 4.1 (1999), even using Arabic XT, Arabic Phonyx, or Hebrew XPressWay fonts, retaining the layout and content. Includes 50 import/export filters, including a Microsoft Word 97-98-2000 import filter and a plain text import filter. Exports IDML files can be read by QuarkXPress 2017. Reverse layout: Include a reverse layout feature to reverse the layout of a document when converting a left-to-right document to a right-to-left one or vice versa. Complex script rendering: InDesign supports Unicode character encoding, and Middle Eastern editions support complex text layouts for Arabic and Hebrew complex scripts. The underlying Arabic and Hebrew support is present in the Western editions of InDesign CS4, CS5, CS5.5, and CS6, but the user interface is not exposed, making it difficult to access.

    Read more →
  • Topincs

    Topincs

    Topincs is a software for rapid development of web databases and web applications. It is based on LAMP and the semantic technology Topic Maps. A Topincs web database makes information accessible through browsing very much like a Wiki. Editing a page on a subject is done through forms rather than markup editing. A web database can be tailored into a web application to provide specific user groups a contextualized approach to the data. All modeling and development tasks are performed in the web browser. No other development tools are necessary. The server requires Apache, MySQL and PHP. The client works on any standards-compliant web browser on desktops, laptops, tablets, and mobile phones. The layout is automatically adjusted to smaller screens. The programmatic access to data is done via a virtual object-oriented programming interface which is set up over the schema in a few minutes. It is interpreted rather than generated. Portions of the database can be pulled into memory to accelerate bulk access. == Features == Browseable data High-quality web forms Little to no programming Development done in the browser, no other tools required Client runs in any standard-compliant web browser Virtual object-oriented programming interface User interface adjusts to screen size Supports desktops, laptops, tablets, and mobile phones Flexible data modeling == Challenges == Requires rethinking the development process and dropping many hard learned habits Requires a familiarity with two ISO standards ISO 13259 and 19756 Forms cannot be easily adjusted in layout and behavior Server installation difficult and prone to error == License == Topincs can be used in a private network for any purpose for free. The use in a public network is restricted to non-commercial applications.

    Read more →
  • Video browsing

    Video browsing

    Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check if the video content is relevant. While originally proposed to help users inspecting a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive by iterative human–computer interaction through an exploratory search approach. Many of these tools presume a smart user that wants features to interactively inspect video content, as well as automatic content filtering features. For that purpose, several video interaction features are usually provided, such as sophisticated navigation in video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, semantic concept detection, and create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts (e.g., only shots showing cars), through some specific characteristics (e.g., color or motion filtering), through user-provided sketches (e.g., a visually drawn sketch), or through content-based similarity search. == History == Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at Siemens, and it was presented at the ACM International Conference in August 1993. They described a shot detection algorithm for compressed video that was originally encoded with discrete cosine transform (DCT) video coding standards such as JPEG, MPEG and H.26x. The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect the differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as motion vector representation for the frame. By operating on compressed DCT representations, the algorithm significantly reduces the computational requirements for decompression and enables effective video browsing. The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents. === Video Notebook === Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides and optical character recognition and speech recognition to facilitate video search. The software can be either used on the client side (using a browser extension), where the slides and text are extracted while the video is watched (e.g. on a video platform like YouTube or Udemy), or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations like Ernst & Young. === Video Browser Showdown === The Video Browser Showdown (VBS) is an annual live evaluation competition for exploratory video search tools, where international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS also collaborates with TRECVID. The aim of the VBS is to evaluate video browsing tools for efficiency at known-item search (KIS) tasks with a well-defined data set in direct comparison to other tools.

    Read more →
  • Google Vids

    Google Vids

    Google Vids (not to be confused with Google Video) is an online timeline-based video editing application included as part of the Google Workspace suite. It is designed to help users create informational videos for work-related purposes. The app uses Google's Gemini technology to enable users to create video storyboards manually or with AI assistance using simple prompts. Features include uploading media, choosing stock videos, images, background music, and a voiceover feature with script generation using AI. The app is currently in testing with select Google Workspace Labs users. Like Kapwing and Capcut, Google Vids is primarily for creating work-related content like sales training, onboarding videos, vendor outreach, and project updates. It offers various styles and templates, collaborative features, and is not limited to videos without the short integration at the moment. Google Vids was announced on April 9, 2024. In September 2025, Google began to roll out a basic version of the application to Google Workspace users.

    Read more →
  • Google Tasks

    Google Tasks

    Google Tasks is a task management application developed by Google and included with Google Workspace. Included initially as a feature in Gmail and Google Calendar, Google Tasks launched as a core product with a standalone app in 2018. It is available for Android and iOS, as well as in the right-hand side panel on Google Workspace apps on the web and in Google Calendar. == History and development == Google Tasks began as an integration within other apps in G Suite (now Google Workspace), allowing to-do items to be created in Calendar and Gmail. Upon graduating to a core service on June 28, 2018, Google Tasks launched as a dedicated mobile app in which tasks can be sorted into lists, managed, and completed. Google Tasks launched the ability to create tasks from Google Chat messages in 2022.

    Read more →
  • Eigenface

    Eigenface

    An eigenface ( EYE-gən-) is the name given to a set of eigenvectors when used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set. == History == The eigenface approach began with a search for a low-dimensional representation of face images. Sirovich and Kirby showed that principal component analysis could be used on a collection of face images to form a set of basis features. These basis images, known as eigenpictures, could be linearly combined to reconstruct images in the original training set. If the training set consists of M images, principal component analysis could form a basis set of N images, where N < M. The reconstruction error is reduced by increasing the number of eigenpictures; however, the number needed is always chosen less than M. For example, if you need to generate a number of N eigenfaces for a training set of M face images, you can say that each face image can be made up of "proportions" of all the K "features" or eigenfaces: Face image1 = (23% of E1) + (2% of E2) + (51% of E3) + ... + (1% En). In 1991 M. Turk and A. Pentland expanded these results and presented the eigenface method of face recognition. In addition to designing a system for automated face recognition using eigenfaces, they showed a way of calculating the eigenvectors of a covariance matrix such that computers of the time could perform eigen-decomposition on a large number of face images. Face images usually occupy a high-dimensional space and conventional principal component analysis was intractable on such data sets. Turk and Pentland's paper demonstrated ways to extract the eigenvectors based on matrices sized by the number of images rather than the number of pixels. Once established, the eigenface method was expanded to include methods of preprocessing to improve accuracy. Multiple manifold approaches were also used to build sets of eigenfaces for different subjects and different features, such as the eyes. == Generation == A set of eigenfaces can be generated by performing a mathematical process called principal component analysis (PCA) on a large set of images depicting different human faces. Informally, eigenfaces can be considered a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces. Any human face can be considered to be a combination of these standard faces. For example, one's face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even −3% from eigenface 3. Remarkably, it does not take many eigenfaces combined together to achieve a fair approximation of most faces. Also, because a person's face is not recorded by a digital photograph, but instead as just a list of values (one value for each eigenface in the database used), much less space is taken for each person's face. The eigenfaces that are created will appear as light and dark areas that are arranged in a specific pattern. This pattern is how different features of a face are singled out to be evaluated and scored. There will be a pattern to evaluate symmetry, whether there is any style of facial hair, where the hairline is, or an evaluation of the size of the nose or mouth. Other eigenfaces have patterns that are less simple to identify, and the image of the eigenface may look very little like a face. The technique used in creating eigenfaces and using them for recognition is also used outside of face recognition: handwriting recognition, lip reading, voice recognition, sign language/hand gestures interpretation and medical imaging analysis. Therefore, some do not use the term eigenface, but prefer to use 'eigenimage'. === Practical implementation === To create a set of eigenfaces, one must: Prepare a training set of face images. The pictures constituting the training set should have been taken under the same lighting conditions, and must be normalized to have the eyes and mouths aligned across all images. They must also be all resampled to a common pixel resolution (r × c). Each image is treated as one vector, simply by concatenating the rows of pixels in the original image, resulting in a single column with r × c elements. For this implementation, it is assumed that all images of the training set are stored in a single matrix T, where each column of the matrix is an image. Subtract the mean. The average image a has to be calculated and then subtracted from each original image in T. Calculate the eigenvectors and eigenvalues of the covariance matrix S. Each eigenvector has the same dimensionality (number of components) as the original images, and thus can itself be seen as an image. The eigenvectors of this covariance matrix are therefore called eigenfaces. They are the directions in which the images differ from the mean image. Usually this will be a computationally expensive step (if at all possible), but the practical applicability of eigenfaces stems from the possibility to compute the eigenvectors of S efficiently, without ever computing S explicitly, as detailed below. Choose the principal components. Sort the eigenvalues in descending order and arrange eigenvectors accordingly. The number of principal components k is determined arbitrarily by setting a threshold ε on the total variance. Total variance ⁠ v = ( λ 1 + λ 2 + . . . + λ n ) {\displaystyle v=(\lambda _{1}+\lambda _{2}+...+\lambda _{n})} ⁠, n = number of components, and λ {\displaystyle \lambda } represents component eigenvalue. k is the smallest number that satisfies ( λ 1 + λ 2 + . . . + λ k ) v > ϵ {\displaystyle {\frac {(\lambda _{1}+\lambda _{2}+...+\lambda _{k})}{v}}>\epsilon } These eigenfaces can now be used to represent both existing and new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. The eigenvalues associated with each eigenface represent how much the images in the training set vary from the mean image in that direction. Information is lost by projecting the image on a subset of the eigenvectors, but losses are minimized by keeping those eigenfaces with the largest eigenvalues. For instance, working with a 100 × 100 image will produce 10,000 eigenvectors. In practical applications, most faces can typically be identified using a projection on between 100 and 150 eigenfaces, so that most of the 10,000 eigenvectors can be discarded. === Matlab example code === Here is an example of calculating eigenfaces with Extended Yale Face Database B. To evade computational and storage bottleneck, the face images are sampled down by a factor 4×4=16. Note that although the covariance matrix S generates many eigenfaces, only a fraction of those are needed to represent the majority of the faces. For example, to represent 95% of the total variation of all face images, only the first 43 eigenfaces are needed. To calculate this result, implement the following code: === Computing the eigenvectors === Performing PCA directly on the covariance matrix of the images is often computationally infeasible. If small images are used, say 100 × 100 pixels, each image is a point in a 10,000-dimensional space and the covariance matrix S is a matrix of 10,000 × 10,000 = 108 elements. However the rank of the covariance matrix is limited by the number of training examples: if there are N training examples, there will be at most N − 1 eigenvectors with non-zero eigenvalues. If the number of training examples is smaller than the dimensionality of the images, the principal components can be computed more easily as follows. Let T be the matrix of preprocessed training examples, where each column contains one mean-subtracted image. The covariance matrix can then be computed as S = TTT and the eigenvector decomposition of S is given by S v i = T T T v i = λ i v i {\displaystyle \mathbf {Sv} _{i}=\mathbf {T} \mathbf {T} ^{T}\mathbf {v} _{i}=\lambda _{i}\mathbf {v} _{i}} However TTT is a large matrix, and if instead we take the eigenvalue decomposition of T T T u i = λ i u i {\displaystyle \mathbf {T} ^{T}\mathbf {T} \mathbf {u} _{i}=\lambda _{i}\mathbf {u} _{i}} then we notice that by pre-multiplying both sides of the equation with T, we obtain T T T T u i = λ i T u i {\displaystyle \mathbf {T} \mathbf {T} ^{T}\mathbf {T} \mathbf {u} _{i}=\lambda _{i}\mathbf {T} \mathbf {u} _{i}} Meaning that, if ui is an eigenvector of TTT, then vi = Tui is an eigenvector of S. If we have

    Read more →
  • Real-time transcription

    Real-time transcription

    Real-time transcription is the general term for transcription by court reporters using real-time text technologies to deliver computer text screens within a few seconds of the words being spoken. Specialist software allows participants in court hearings or depositions to make notes in the text and highlight portions for future reference. Real-time transcription is also used in the broadcasting environment where it is more commonly termed "captioning." == Career opportunities == Real-time reporting is used in a variety of industries, including entertainment, television, the Internet, and law. Specific careers include the following: Judicial reporters use a stenotype to provide instant transcripts on computer screens as a trial or deposition occurs. Communication access real-time translation (CART) reporters assist the hearing-impaired by transcribing spoken words, giving them personal access to the communications they need day to day. Television broadcast captioners use real-time reporting technology to allow hard-of-hearing or deaf people to see what is being said on live television broadcasts such as news, emergency broadcasts, sporting events, awards shows, and other programs. Internet information (or Webcast) reporters provide real-time reporting of sales meetings, press conferences, and other events, while simultaneously transmitting the transcripts to computers worldwide. Other rapid data entry positions. == History == Before the advent of the stenotype machine, court reporters wrote official trial transcripts by hand using a shorthand system of stenoforms that could later be translated into readable English. It often took eight years of training to learn this manual form of writing at the necessary speed. Walter Heironimus was among the first stenographers to make use of the stenotype machine during his work in the U.S. District Court system in New Jersey in 1935. A "transcript crisis" arose during the later half of the twentieth century due to the increasing volume of lawsuits. There were not enough number of court reporters to match the increasing number of trials. Not only were court reporters unavailable to attend many court proceedings, court transcripts were constantly late and the qualities varied. Some believed it was due to the non-interchangeability between court reporters, and others believed it was simply due to a labor shortage. In the meantime, magnetic audiotape recording, or known as electronic recording (ER) began to threaten all reporters' job since it could record long-hour courtroom trials and replace a court reporter's position in the courtroom. As a result, machine translation (MT) intended to serve as a solution for preventing ER from potentially replacing reporters' jobs. However, MT relied heavily on human labors operating behind the system and many started to question if it should be the right way to end the "transcript crisis." Later in 1964, set up by CIA, the Automatic Language Processing Advisory Committee (ALPAC) was set to review whether MT was capable of solving this crisis. They concluded that MT had failed to do so. Then Patrick O'Neill, a skilled and experienced court reporter, stayed to work on the stenotype-translation project with CIA and developed the prototype CAT system. After adopting the CAT system in court-reporting community, CAT was brought into the television broadcasting system, aiming to provide captions for the deaf or hard-of-hearing communities. In 1983, Linda Miller developed a further use for the CAT system. She successfully translated a lecture live on the television screen and provided a transcript for students. This technique is known as Computer-Aided Real-time Translation, or CART. == Court reporter == It is the court reporter's job to note down the exact words spoken by every participants during a court or deposition proceeding. Then court reporters will provide verbatim transcripts. The reason to have an official court transcript is that the real-time transcriptions allows attorneys and judges to have immediate access to the transcript. It also helps when there's a need to look up for information from the proceeding. Additionally, the deaf and the hard-of-hearing communities can also participate in the judicial process with the help of real-time transcriptions provided by court reporters. === Education and training === The required degree level for a court reporter to have is an Associate's degree or postsecondary certificate. In order to become a court reporter, more than 150 reporter training programs are provided at proprietary schools, community colleges, and four-year universities. After graduation, court reporters can choose to further pursue certifications to achieve a higher level of expertise and increase their marketability during a job search. In most states, Certificates of Proficiency from the NCRA or from state agencies are now required certificates for court reporters to have in order to qualify for appointments. The NCRA aims to set the national standard for the certification of court reporters, and since 1937 it has offered its certification program which is now accepted by 22 states instead of state licenses. Court reporter training programs include but not limited to: Training in rapid writing skill, or shorthand, which will enable students to record, with accuracy, at least 225 words per minute Training in typing, which will enable students to type at least 60 words per minute A general training in English, which covers aspects of grammar, word formation, punctuation, spelling and capitalization Taking Law related courses in order to understand the overall principles of civil and criminal law, legal terminology and common Latin phrases, rules of evidence, court procedures, the duties of court reporters, the ethics of the profession Visits to actual trials Taking courses in elementary anatomy and physiology and medical word study including medical prefixes, roots and suffixes. Other than official court reporters, who are assigned to and work for a particular court, other types of court reporters include free-lance reporter, who either works for a court reporting firm or self-employed. They are different from official court reporters in that they have the chances to work on a wider range of assignments and work on basis of hourly wage. Hearing reporters work at governmental agency hearings. Legislative reporters work in law-making bodies. The demand for reporters is not limited in just the court settings. Reporters are also needed in conferences, meetings, conventions, investigations, and a variety of industries with needs for employers with real-time data entry skills. == Non-English transcription == Transcription services are universally necessary, so it is not limited to the English language. A stenographer's ability to transcribe languages beyond only English is especially valuable as society as a whole becomes increasingly multilingual. Education in non-English transcription demands a comprehensive understanding of the given language. Phonetic differences between English and other languages are a particular challenge in carrying English transcription skills over into other languages. Stenography represents various sounds of a language in a formal system of shorthand, so differences within the sets of sounds that emerge in other languages require an alternative system of shorthand transcription. For example, the presence of many diphthongs and triphthongs in Spanish requires certain sounds to be distinguished that would not be present in transcribing English into shorthand. == Controversies == The usage of transcription in the context of linguistic discussions has been controversial. Typically, two kinds of linguistic records are considered to be scientifically relevant. First, linguistic records of general acoustic features, and secondly, records that only focuses on the distinctive phonemes of a language. While transcriptions are not entirely illegitimate, transcriptions without enough detailed commentary regarding any linguistic features, or transcriptions of poor quality resources, has a great chance of the content being misinterpreted. Besides misinterpretation, transcribers could also bring in cultural biases and ignorance that reflect onto their transcription. These instances may cause a disruption of reliability in the final real-time transcription, which could influence how the written utterance is seen as an evidence for a court-case. === Quality issues === Problems in the final resulting transcription can be caused by either the quality of the transcriber or the original source that is being transcribed. Transcribers can come from different levels of skill and training background. This makes the final transcription prone to poor quality, or if the transcription is being done by multiple people, lack of consistency in the content. If the source of the transcription is a recording, the problem may root back to the quality of the re

    Read more →
  • Agent-assisted automation

    Agent-assisted automation

    Agent-assisted automation is a type of call center technology that automates elements of what the call center agent 1) does with his/her desktop tools and/or 2) says to customers during the call using pre-recorded audio. It is a relatively new category of call center technology that shows promise in improving call center productivity and compliance. == Types of agent-assisted automation == === Pre-recorded audio === Pre-recorded audio (sometimes referred to as soundboard (computer program) or as soundboard technology) is another form of agent-assisted automation. The purpose of using pre-recorded messages is to increase the probability (and in some cases error-proof the process so) that the right information is provided to customers at the right time. The required disclosures are pre-recorded to ensure accuracy and understandability. By integrating the recordings with the customer relationship management software, the right combination of disclosures can be played based on the combination of goods and services the customer purchased. The integration with the customer relationship management software also ensures that the order cannot be submitted until the disclosures are played, essentially error-proofing (poka-yoke) the process of ensuring the customer gets all the required consumer protection information. Phone surveys are ideal applications of this technology. Whether surveying market preferences or political views, the pre-recorded audio with an agent listening allows the questions to be asked in the same way every time, uninfluenced by the agents' fatigue levels, accents, or their own views. === Fraud prevention === Fraud prevention is a specialized type of agent-assisted automation focused on reducing ID theft and credit card fraud. ID theft and credit card fraud are huge threats for call centers and their customers and few good solutions exist, but new agent-assisted automation solutions are producing promising results. The technology allows the agents to remain on the phone while the customers use their phone key pads to enter the information. The tones are masked and the information passes directly into the customer relationship management system or payment gateway in the case of credit card transactions. The automation essentially makes it impossible for call center agents and also call center personnel that might be monitoring the calls to steal the credit card number, social security number, or other personally identifiable information. === Outbound telemarketing === Another specialized application space of agent-assisted automation is in outbound telemarketing, which goes under numerous headings including outbound prospecting, cold calling, solicitation, fund-raising, etc. Turnover is high among agents engaged in this kind of work because the task is tedious and emotionally difficult. It is tedious because the agent spends the bulk of their day, not talking to qualified leads, but in getting wrong numbers and answering machines. == Benefits == Just as automation has benefited manufacturing by reducing the mental and physical effort required of workers while simultaneously improving throughput, quality, and safety, agent-assisted automation is improving call center results while reducing the tiring aspects of the job for agents. In some cases, the agent-assisted automation streamlines the process and allows calls to be handled more quickly. By eliminating cutting and pasting from one application to another, by auto-navigating applications, and by providing a single view of the customer, agent-assisted automation can reduce call handle time and increase agent productivity. Second, in theory, the more steps that can be automated and the more logic that can be built into the call flow (e.g., if the customer buys items 2 and 9, then disclosures a, c, and f are read by the pre-recorded audio), then companies may be able to reduce the amount of training that is required of the agents while at the same time ensuring more consistency and accuracy. However, no published studies have reported this result yet. But an even larger problem in call centers is between-agent variation in behavior and results. Agents differ in the amount of training and coaching they receive, they differ in the amount of experience they have, their jobs are repetitious and tiring, and the process and procedures the agents are supposed to follow constantly change. Moreover, there are significant individual differences between agents in their intelligence, personality, motivations, etc. which all affect performance. Despite the large amount of money call centers have spent over decades trying to reduce between-agent variation, the problem is still so prevalent that one large study of customer interactions with call centers found that a customer's experience was completely a function of the quality of the agent who happened to answer the phone. Therefore, the most significant benefit of agent-assisted automation may prove to be in how the automation error-proofs or poka-yoke the process and ensures that something that needs to be done or said happens every time. Properly implemented, the between-agent variation for whatever step of the process the automation is applied to may be able to be reduced to near zero. This is especially important in a collection agency whose processes and procedures are closely regulated by the Fair Debt Collection Practices Act.

    Read more →