AI Art Video

AI Art Video — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cloud-to-cloud integration

    Cloud-to-cloud integration (C2I) allows users to connect disparate cloud computing platforms. While PaaS (platform as a service) and SaaS (software as a service) continue to gain momentum, different vendors expose their cloud services through different interfaces, e.g. database access, REST, or SOAP APIs. Cloud-to-cloud integration is also known as cloud-surfing. See also: cloud-based integration.

  • Game Jolt

    Game Jolt is a social community platform for video games, gamers and content creators. Founded by Yaprak and David DeCarmine, it is available on iOS, Android, and the web, and as a desktop app for Windows and Linux. Users share interactive content through a variety of formats including images, videos, live streams, chat rooms, and virtual events. == Features == === Crowd streaming === In 2021 Game Jolt revealed its own live streaming feature called Firesides. Firesides allowed multiple users to livestream together simultaneously with almost no delay. The feature launched with a virtual concert showcasing its ability to accommodate multiple streamers. On October 16, 2023, Firesides were removed from Game Jolt. === Mobile app === Game Jolt Social by Game Jolt Inc. launched on both the Apple App Store and Google Play Store in March 2022. "It's clear to us that Gen Z is tired of generic social media and they want a place specifically for gaming that supports all types of content they're creating–art, videos, thoughts, and livestreams all in one place," said Game Jolt founder and CEO Yaprak DeCarmine in a statement to VentureBeat. === Game API === The Game Jolt Application Programming Interface (usually known as the Game Jolt Game API, or GJAPI) allows any developer using a game development platform that supports HTTP operations and MD5 or SHA-1 to integrate Game Jolt features into a game. Game Jolt advertises that the API can: create multiple "scoreboards" that collect players' high scores, which are made publicly available on the game's profile and give user accounts EXP; award players trophies, which give user accounts EXP; store game data on Game Jolt's data servers; and log whether a user is currently playing a game they are logged into via the GJAPI. == Game jams and competitions == Game Jolt regularly hosts game jams where participants are encouraged to develop games for a chance to win prizes. Its first game jam, Shocking Contest, was held in 2009.
In November 2014, Game Jolt announced the "Indies vs PewDiePie" game jam, partnering with the popular YouTuber Felix "PewDiePie" Kjellberg. Developers were given a weekend (21–24 November) to create a game on the theme of "fun to play, fun to watch" to suit the Let's Play style of entertainment. Users could rate entries afterwards until December 1, when the scores were counted up. The prize for the 10 top-rated games was for Kjellberg to play them on his channel as promotion for the developers, although he later played other entries too. One of the participants of the jam, now known as Outerminds Inc., was discovered and hired by PewDiePie to develop his mobile game, Legend of the Brofist. Game Jolt partnered with Kjellberg, Sean "Jacksepticeye" McLoughlin and Mark "Markiplier" Fischbach to host "Indies vs Gamers" in July 2015. Entries had to be arcade games using the Game Jolt Game API high-score tables and made between July 17 and 20; the top 5 games were played on the partners' YouTube channels. Following the "Indies vs PewDiePie" game jam in 2014, Game Jolt made its internal jam-hosting tools available to all users as a service, letting them create their own game jams that integrated with the main site. Today, Game Jolt focuses on hosting and co-hosting game competitions with established brands in order to bring monetary and educational opportunities to its users. On April 15, 2024, a collaboration with Pocket Worlds on the "HighRise Game Jam" was announced. Pocket Worlds had sold NFTs until roughly 2022, which caused a community outcry. The situation was addressed, and the controversy began to subside. == Contests == == Events == Game Jolt hosts both physical and virtual events to entertain and prank its users. == History == Game Jolt has supported independent creators with a central platform to manage their content and communities since its start in 2003.
David DeCarmine began development of Game Jolt at the age of 14 for a group of hobbyists, making games and sharing on forums in an early iteration known as Holo World. The original intention was to create a platform for gamers where new games could be discoverable and quickly playable, and where feedback could be provided directly to the creators, allowing them to continue improving their games. In 2008, Game Jolt was registered as an LLC, then incorporated as Game Jolt Inc. in September 2020. A new site launched in 2015 featuring a responsive design, automated curation for both games and game news articles which weighs how recent a game was uploaded and how popular it is ("hot") and filtering options on game listings for platform, maturity rating and development status. In March 2022, Game Jolt launched a mobile application simultaneously on the Google Play Store and Apple App Store targeted at Gen Z gamers and creators. While in beta, the mobile app had 100,000 installs pre-launch. === Game store === Game Jolt continues to host a large library of independent games. Game developers can upload their games directly to the site to share or sell. They would allow distribution for downloadable games, later adding support for Adobe Flash, Unity and Java games which allowed support for browser based games. In February 2013, Game Jolt built support for browser-based HTML5 games as well. A user levelling system was released into public beta in April 2013, incorporating the GJAPI trophies and highscores, as well as site activity, to generate 'EXP' (experience points). Game Jolt Jams released in early 2014 as a service to allow users to create their own game jams that integrated with the main site. In April 2016, an online marketplace was announced and released the following month with an exclusive set of game titles, including Bendy and the Ink Machine, allowing developers to sell their games on the site. 
In January 2016, Game Jolt released source code of the client and site's front end on GitHub under MIT license. In January 2022, Game Jolt banned adult games from appearing on the site, stating in an email to developers that the site had become a "social media platform" and they "had to make decisions around the direction and future of the brand which has now included the removal of hosted games with explicitly adult content." In response to a tweet by Itch.io saying the site is not for prudes, they wrote in their own tweet: "Game Jolt is a platform with a large audience of 13-16 year olds. Our users asked us to clean up, so here we are." == Investments == After bootstrapping Game Jolt with revenue earned from ads on the website for years, the DeCarmines secured venture capital in 2020 from SoftBank, doing so again in 2021 from founders of Twitch, Rec Room, Modio and more.
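    The Game API section above says only that a client needs HTTP operations and MD5 or SHA-1. One plausible reading is that each request URL is signed with a hash. The sketch below illustrates that idea under stated assumptions: the base URL, the endpoint and parameter names, the helper name signed_url, and the scheme itself (hashing the full URL with a private key appended) are illustrative and not confirmed by the text above.

```python
import hashlib
from urllib.parse import urlencode

# Assumed base URL, for illustration only.
API_BASE = "https://api.gamejolt.com/api/game/v1_2"


def signed_url(endpoint, params, private_key, algo="md5"):
    """Build a request URL and append a hash-based signature.

    Hypothetical scheme: hash the full URL with the game's private key
    appended, then pass the hex digest as a `signature` parameter.
    `algo` may be "md5" or "sha1", matching the two hashes named above.
    """
    url = f"{API_BASE}/{endpoint}/?{urlencode(params)}"
    digest = hashlib.new(algo, (url + private_key).encode("utf-8")).hexdigest()
    return f"{url}&signature={digest}"


if __name__ == "__main__":
    # Hypothetical endpoint and parameters for submitting a high score.
    print(signed_url("scores/add", {"game_id": "123", "score": "10"}, "secret"))
```

    Because the private key never appears in the URL, a server that knows the key could recompute the digest to check that the request was not altered.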

  • Motor theory of speech perception

    The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them. The hypothesis has gained more interest outside the field of speech perception than inside. This interest has increased particularly since the discovery of mirror neurons that link the production and perception of motor movements, including those made by the vocal tract. The theory was initially proposed at Haskins Laboratories in the 1950s by Alvin Liberman and Franklin S. Cooper, and developed further by Donald Shankweiler, Michael Studdert-Kennedy, Ignatius Mattingly, Carol Fowler and Douglas Whalen. == Origins and development == The hypothesis has its origins in research using pattern playback to create reading machines for the blind that would substitute sounds for orthographic letters. This led to a close examination of how spoken sounds correspond to their acoustic spectrograms as a sequence of auditory sounds. The examination found that successive consonants and vowels overlap in time with one another (a phenomenon known as coarticulation), which suggested that speech is not heard like an acoustic "alphabet" or "cipher," but as a "code" of overlapping speech gestures. === Associationist approach === Initially, the theory was associationist: infants mimic the speech they hear, and this leads to behavioristic associations between articulation and its sensory consequences. Later, this overt mimicry would be short-circuited and become speech perception.
This aspect of the theory was dropped, however, with the discovery that prelinguistic infants could already detect most of the phonetic contrasts used to separate different speech sounds. === Cognitivist approach === The behavioristic approach was replaced by a cognitivist one in which there was a speech module. The module detected speech in terms of hidden distal objects rather than at the proximal or immediate level of their input. The evidence for this was research findings, such as duplex perception, suggesting that speech processing was special. === Changing distal objects === Initially, speech perception was assumed to link to speech objects that were both the invariant movements of speech articulators and the invariant motor commands sent to muscles to move the vocal tract articulators. This was later revised to include the phonetic gestures rather than motor commands, and then the gestures intended by the speaker at a prevocal, linguistic level, rather than actual movements. === Modern revision === The "speech is special" claim has been dropped, as it was found that duplex perception could also occur for nonspeech sounds (for example, slamming doors). === Mirror neurons === The discovery of mirror neurons has led to renewed interest in the motor theory of speech perception, and the theory still has its advocates, although there are also critics. == Support == === Nonauditory gesture information === If speech is identified in terms of how it is physically made, then nonauditory information should be incorporated into speech percepts even if it is still subjectively heard as "sounds". This is, in fact, the case. The McGurk effect shows that seeing the production of a spoken syllable that differs from an auditory cue synchronized with it affects the perception of the auditory one. In other words, if someone hears "ba" but sees a video of someone pronouncing "ga", what they hear is different: some people believe they hear "da". 
People find it easier to hear speech in noise if they can see the speaker. People can hear syllables better when their production can be felt haptically. === Categorical perception === Using a speech synthesizer, speech sounds can be varied in place of articulation along a continuum from /bɑ/ to /dɑ/ to /ɡɑ/, or in voice onset time on a continuum from /dɑ/ to /tɑ/ (for example). When listeners are asked to discriminate between two different sounds, they perceive sounds as belonging to discrete categories, even though the sounds vary continuously. In other words, 10 sounds (with the sound on one extreme being /dɑ/ and the sound on the other extreme being /tɑ/, and the ones in the middle varying on a scale) may all be acoustically different from one another, but the listener will hear all of them as either /dɑ/ or /tɑ/. Likewise, the English consonant /d/ may vary in its acoustic details across different phonetic contexts (the /d/ in /du/ does not technically sound the same as the one in /di/, for example), but all /d/'s as perceived by a listener fall within one category (voiced alveolar plosive) and that is because "linguistic representations are abstract, canonical, phonetic segments or the gestures that underlie these segments." This suggests that humans identify speech using categorical perception, and thus that a specialized module, such as that proposed by the motor theory of speech perception, may be on the right track. === Speech imitation === If people can hear the gestures in speech, then the imitation of speech should be very fast, as in when words are repeated that are heard in headphones as in speech shadowing. People can repeat heard syllables more quickly than they would be able to produce them normally. === Speech production === Hearing speech activates vocal tract muscles, and the motor cortex and premotor cortex. The integration of auditory and visual input in speech perception also involves such areas. 
Disrupting the premotor cortex disrupts the perception of speech units such as plosives. The activation of the motor areas occurs in terms of the phonemic features which link with the vocal tract articulators that create speech gestures. The perception of a speech sound is aided by pre-emptively stimulating the motor representation of the articulators responsible for its pronunciation. Auditory and motor cortical coupling is restricted to a specific range of neuronal firing frequency. === Perception-action meshing === Evidence exists that perception and production are generally coupled in the motor system. This is supported by the existence of mirror neurons that are activated both by seeing (or hearing) an action and when that action is carried out. Another source of evidence is the support for common coding theory, which posits shared representations for perception and action. == Criticisms == The motor theory of speech perception is not widely held in the field of speech perception, though it is more popular in other fields, such as theoretical linguistics. As three of its advocates have noted, "it has few proponents within the field of speech perception, and many authors cite it primarily to offer critical commentary" (p. 361). Several critiques of it exist. === Multiple sources === Speech perception is affected by nonproduction sources of information, such as context. Individual words are hard to understand in isolation but easy when heard in sentence context. It therefore seems that speech perception uses multiple sources that are integrated together in an optimal way. === Production === The motor theory of speech perception would predict that speech motor abilities in infants predict their speech perception abilities, but in actuality it is the other way around. It would also predict that defects in speech production would impair speech perception, but they do not. 
However, this only affects the first and already superseded behaviorist version of the theory, where infants were supposed to learn all production-perception patterns by imitation early in childhood. This is no longer the mainstream view of motor-speech theorists. === Speech module === Several sources of evidence for a specialized speech module have failed to be supported. Duplex perception can be observed with door slams. The McGurk effect can also be achieved with nonlinguistic stimuli, such as showing someone a video of a basketball bouncing but playing the sound of a ping-pong ball bouncing. As for categorical perception, listeners can be sensitive to acoustic differences within single phonetic categories. As a result, this part of the theory has been dropped by some researchers. === Sublexical tasks === The evidence provided for the motor theory of speech perception is limited to tasks such as syllable discrimination that use speech units not full spoken words or spoken sentences. As a result, "speech perception is sometimes interpreted as referring to the perception of speech at the sublexical level. However, the ultimate goal of these studies is presumably to understand the neural processes supporting the ability to process spee

  • DeepRoute.ai

    DeepRoute.ai (Chinese: 元戎启行) is a Chinese autonomous driving company founded in 2019 and headquartered in Shenzhen, China. The company develops full-stack self-driving solutions including perception, decision-making, and control systems. == History == DeepRoute.ai was founded in February 2019 in Shenzhen, China, by Zhou Guang (周光), who serves as the company's CEO. In September 2019, the company collaborated with Dongfeng for a live-streamed autonomous driving demonstration. In October 2019, during the 7th Military World Games, DeepRoute.ai conducted Robotaxi demonstration operations. In November 2019, it obtained an intelligent connected vehicle road test permit for public roads in Shenzhen. In October 2020, DeepRoute.ai signed an "Autonomous Driving Leadership Project" with Dongfeng to build one of China's largest autonomous fleets. In August 2020, DeepRoute.ai announced its partnership with Cao Cao Mobility, a Geely-backed ride-hailing company, to test Robotaxis in Hangzhou for daily operations, planning to provide Robotaxis during the 2022 Asian Games. In September 2021, DeepRoute.ai secured US$300 million in a Series B funding round led by Alibaba. In December 2021, the company unveiled DeepRoute-Driver 2.0, an L4-level autonomous driving solution comprising five solid-state lidar sensors, eight cameras, a proprietary computing system and an optional millimeter-wave radar, with a production cost of under US$10,000. In June 2022, it partnered with Deppon Express to provide autonomous light truck freight transfer services. In March 2023, the company launched its high-precision map-free intelligent driving solution, DeepRoute-Driver 3.0. In November 2024, Great Wall Motor announced a $100 million Series C funding round for DeepRoute.ai. With this, DeepRoute.ai has completed five rounds of financing, raising a cumulative total of over $500 million. Its shareholders include Fosun RZ Capital, Yunqi Partners, Alibaba, Vision Plus Capital, and Dongfeng, among others. 
In the same month, Deeproute.ai emphasised that they were in "deep cooperation" with Nvidia and spoke on being part of the first batch of companies in China to get a hold of Nvidia's newer Thor chip for cars which will be used in a new system released next year. This new system will help manage more complex driving scenarios through visual cues. == Products == === VLA Model === VLA Model is a Vision–language–action model designed for autonomous driving systems. It integrates visual perception, semantic understanding, and action decision-making into a unified framework, aiming to enhance the safety and adaptability of advanced driver-assistance systems (ADAS) in complex road environments. The model was officially launched on August 26, 2025, as the core of DeepRoute.ai's DeepRoute IO 2.0 platform. The VLA model is characterized by its "visual-language-action" architecture, which incorporates a chain-of-thought (CoT) reasoning capability inspired by large language models. This design is intended to address the "black box" limitations of traditional end-to-end autonomous driving systems by enabling the model to analyze information, infer causality, and make decisions in a more transparent and interpretable manner. === Appliance === The company has partnered with several automakers including Dongfeng Motor Corporation and Geely to develop and test autonomous vehicles.

  • Packed pixel

    In packed pixel or chunky framebuffer organization, the bits defining each pixel are clustered and stored consecutively. For example, if there are 16 bits per pixel, each pixel is represented in two consecutive (contiguous) 8-bit bytes in the framebuffer. If there are 4 bits per pixel, each framebuffer byte defines two pixels, one in each nibble. This contrasts with storing a single 4-bit pixel per byte, which leaves 4 bits of the byte unused. If a pixel has more than one channel, the channels are interleaved when using packed pixel organization. Packed pixel displays were common on early microcomputer systems that shared a single main memory between the central processing unit (CPU) and the display driver. In such systems, memory was normally accessed a byte at a time, so by packing the pixels, the display system could read out several pixels' worth of data in a single read operation. Packed pixel is one of two major ways to organize graphics data in memory, the other being planar organization, where the individual bits of each pixel are stored in separate planes. For a 4-bit color value, memory would be organized as four screen-sized planes of one bit each, and a single pixel's value is built up by selecting the appropriate bit from each plane. Planar organization has the advantage that the data can be accessed in parallel, and is used when memory bandwidth is an issue.
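    The two layouts described above can be compared with a minimal sketch. It packs 4-bit pixels two to a byte and, for contrast, splits the same pixels into four one-bit planes. The function names, the choice of the high nibble for the first pixel, and the most-significant-bit-first plane order are assumptions made for illustration; real hardware differs on these details.

```python
def pack_4bpp(pixels):
    """Packed (chunky) layout: two 4-bit pixels per byte, first pixel in the high nibble."""
    pixels = list(pixels)
    if len(pixels) % 2:
        pixels.append(0)  # pad an odd-length row with one empty pixel
    return bytes(((pixels[i] & 0xF) << 4) | (pixels[i + 1] & 0xF)
                 for i in range(0, len(pixels), 2))


def unpack_4bpp(data, count):
    """Recover `count` 4-bit pixel values from packed bytes."""
    pixels = []
    for byte in data:
        pixels.append(byte >> 4)    # high nibble: first pixel of the pair
        pixels.append(byte & 0xF)   # low nibble: second pixel of the pair
    return pixels[:count]


def to_planes(pixels, bits=4):
    """Planar layout: plane k holds bit k of every pixel, eight pixels per byte."""
    planes = []
    for k in range(bits):
        plane = bytearray((len(pixels) + 7) // 8)
        for i, value in enumerate(pixels):
            if (value >> k) & 1:
                plane[i // 8] |= 0x80 >> (i % 8)  # leftmost pixel in the top bit
        planes.append(bytes(plane))
    return planes


def from_planes(planes, count):
    """Rebuild pixel values by selecting the matching bit from each plane."""
    pixels = []
    for i in range(count):
        value = 0
        for k, plane in enumerate(planes):
            if plane[i // 8] & (0x80 >> (i % 8)):
                value |= 1 << k
        pixels.append(value)
    return pixels


if __name__ == "__main__":
    row = [1, 15, 8, 3, 0, 7, 10, 5]
    print(pack_4bpp(row).hex())
    print([p.hex() for p in to_planes(row)])
```

    In the packed form, one byte read yields two complete pixels. In the planar form, one byte read yields a single bit for each of eight pixels, so all four planes must be consulted to rebuild any one pixel.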

  • Corel VideoStudio

    Corel VideoStudio (formerly Ulead VideoStudio) is a video editing software package for Microsoft Windows. == Features == === Basic editing === The software allows storyboard and timeline-oriented editing. Various formats are supported for source clips, and the resulting video can be exported to a video file. DVD and AVCHD DVD authoring capabilities are included, and Blu-ray authoring is available via a plug-in. VideoStudio supports direct DV and HDV capture and burning. === Overlay === Users can overlay videos, images, and text. Using the overlay track, up to 50 clips can be displayed simultaneously. It can handle videos in MOV and AVI formats, including alpha channel, and images in PSP, PSD, PNG, and GIF formats. Clips that do not contain an alpha channel can have specific colours removed from the overlay video so that the required background or image is displayed in the foreground. === Proxy video files === VideoStudio supports high-definition video. Proxy files are smaller versions of the video source that stand in for the full-resolution source during editing to improve performance. === Plug-ins/bundles === VideoStudio supports VFX-type plug-ins from providers, including NewBlue and proDAD. proDAD plug-ins Roto-Pen, Script, Vitascene, and Mercalli-Stabilizer are bundled with X4 and later Ultimate Editions. == Version history ==
      • Ulead VideoStudio 4 (1999)
      • Ulead VideoStudio 5 (2001)
      • Ulead VideoStudio 6 (2002)
      • Ulead VideoStudio 7 (2003)
      • Ulead VideoStudio 8 (2004)
      • Ulead VideoStudio 9 (2005)
      • Ulead VideoStudio 10 Plus (2006)
      • Corel Ulead VideoStudio 11 Plus (2007)
      • Corel VideoStudio Pro X2 (v12, 2008)
      • Corel VideoStudio Pro X3 (v13, 2010)
      • Corel VideoStudio Pro X4 (v14, 2011): Adds support for stop motion animation, time-lapse mode photography, 3D movies, and 2nd generation Intel Core.
      • Corel VideoStudio Pro X5 (v15, March 9, 2012): Adds HTML5 export.
      • Corel VideoStudio Pro X6 (v16, April 25, 2013): Windows 8 compatible. Adds UHD 4K support.
      • Corel VideoStudio Pro X7 (v17, March 5, 2014): Software becomes 64-bit.
      • Corel VideoStudio Pro X8 (v18, May 8, 2015): Several improvements.
      • Corel VideoStudio Pro X9 (v19, February 16, 2016): Windows 10 compatible. Adds H.265 support, Multi-Camera Editor, and Match moving.
      • Corel VideoStudio Pro X10 (v20, February 15, 2017): Adds Mask Creator, Track Transparency, and 360-degree video support.
      • Corel VideoStudio Pro 2018 (v21, February 13, 2018): Adds split screen Video, Lens Correction, and 3D Title Editor.
      • Corel VideoStudio Pro 2019 (v22, February 12, 2019): Adds Color Grading, Morph Transitions, and MultiCam Capture Lite.
      • Corel VideoStudio Pro 2020 (v23, February 25, 2020)
      • Corel VideoStudio Pro 2021 (v24, March 26, 2021): Adds Instant Project Templates, AR Stickers, and performance improvements (particularly regarding hardware acceleration).
      • Corel VideoStudio Pro 2022 (v25, March 6, 2022): Adds face effects, GIF Creator, transitions for Camera Movements, a speech to text converter, and ProRes Smart Proxy.

  • Butler in a Box

    Butler in a Box was an early voice-controlled home automation device developed in 1983 by magician Gus Searcy and programmer Franz Kavan. The device allowed users to control various home electronics, such as lights and phones, using voice commands. It predated modern smart speakers and virtual assistants by several decades. == History == The idea for the Butler in a Box originated in 1983, when friends asked Searcy, a professional magician, why he couldn't simply command lights to turn on and off if he could pull rabbits out of hats. Searcy partnered with former IBM programmer Kavan to develop the device, and their first prototype was named "Sidney". The Butler in a Box combined remote control technology with voice recognition to enable control of home devices. However, it faced challenges due to the technological limitations of the era and its high price point of nearly $1,500 (equivalent to around $3,700 in 2021). == Features and functionality == Users could activate the Butler in a Box by speaking a wake word, typically a traditional butler name, and the device would address the user as "boss". It was capable of performing tasks such as: turning lights on and off, controlling individual zones if lights were connected to remote control modules; making and receiving phone calls; setting timers; and pairing with sensors to function as a security alarm system. However, the device required extensive voice training for each user, a time-consuming process compared to modern voice recognition. Additionally, settings and trained commands would be lost if power was out for over 3 hours due to the volatile memory technology used at the time. == Reception and legacy == While innovative for its time, the Butler in a Box did not achieve widespread commercial success due to its high price and the technical limitations of the 1980s. 
Nevertheless, it served as an important early step in the development of home automation and showcased the potential for voice-controlled technology to enhance accessibility and convenience in the home. Decades later, products like Amazon Alexa, Google Home, and Apple's Siri would make voice-controlled smart home devices commonplace and affordable, building on the groundwork laid by early attempts like the Butler in a Box.

  • Aseprite

    Aseprite (ace-prite) is a proprietary, source-available image editor designed primarily for pixel art drawing and animation. It runs on Windows, macOS, and Linux, and offers tools for image and animation editing such as layers, frames, tilemap support, a command-line interface, and Lua scripting, among others. It is developed by Igara Studio S.A. and led by the developers David, Gaspar, and Martín Capello. Aseprite can be downloaded as freeware (although that version cannot save sprites) or purchased on Steam or Itch.io. Aseprite source code and binaries are distributed under EULA, educational, and Steam proprietary licenses. == History == Aseprite, formerly known as Allegro Sprite Editor, had its first release in 2001 as a free software project under the GPLv2 license. This license was kept until August 2016 with version v1.1.8, when the developers switched to a EULA, thus making the software proprietary. On September 1, 2016, the main developer, David Capello, wrote a post on the Aseprite Devblog explaining this change. The EULA permits others to download the Aseprite source code, compile it, and use it for personal purposes, but forbids its redistribution to third parties. After the license change, LibreSprite, a free and open source fork of it, was created. Both before and after the license change, Aseprite was sold online, on Steam, itch.io, and the project's website. The project's code repository was hosted on Google Code until August 2014, when it was migrated to GitHub, where it remains hosted to date. As of October 2022, its repository had 68 contributors and around 19 thousand stars. From 2014 to 2021, Aseprite had 66 different releases. Aseprite was used in the development of several notable games such as TowerFall (2013), Celeste (2018), Minit (2018), Wargroove (2019), Loop Hero (2021), Eastward (2021), Unpacking (2021), Haiku the Robot (2022) and Pizza Tower (2023). 
== Design and features == The main design purpose of Aseprite is to create animated 2D pixel-art sprites. Some of its features include: layers and frames, with layer grouping and animation tagging; pixel-art specific transformations and tools (pixel-perfect modes, custom brushes, etc.); real-time animation preview and onion skinning; tilemap and tileset modes; color palette management, including 65 default palettes; color profiles and modes (RGBA, indexed and grayscale); non-square pixels; and a command-line interface (CLI) and Lua scripting. Aseprite uses its own binary file type to store data, which is typically saved with .ase or .aseprite extensions. Different third-party projects were developed to support parsing of .ase files in programming languages including C#, Python and JavaScript, and in game engines such as Unity and Godot. Images and animations can be exported to different file formats including PNG, GIF, FLC, FLI, JPEG, PCX, TGA, ICO, SVG, and bitmap (BMP).

  • Perceptual robotics

    Perceptual robotics is an interdisciplinary science linking robotics and neuroscience. It investigates biologically motivated robot control strategies, concentrating on perceptual rather than cognitive processes, and thereby sides with J. J. Gibson's view against the poverty of the stimulus theory. As a working definition, the following quote could be used, taken from Chapter 64 by H. Bülthoff, C. Wallraven and M. Giese in The Springer Handbook of Robotics, edited by Bruno Siciliano and Oussama Khatib and published by Springer in 2007: "In the following we will apply the term Perceptual Robotics to signify the design of robots based on principles that are derived from human perception on all three levels in the sense of Marr. This includes a realization in terms of specific neural circuits as well as the transfer of more abstract biologically-inspired strategies for the solution of relevant computational problems."

  • Layers (digital image editing)

    Layers are used in digital image editing to separate different elements of an image. A layer can be compared to a transparency on which imaging effects or images are applied and placed over or under an image. Today they are an integral feature of image editors. In the early days of computing, memory was at a premium and the idea of using multi-layered images was considered infeasible in personal computer applications, as the tradeoffs were image size and color depth. As the price of memory fell, it became feasible to apply the concept of layering to raster images. The first software known to apply the concept of layers was LALF, which was released in 1989 for the NEC PC-9801. LALF's terminology for layers is "cells", after the concept of drawing animation frames over the top of a stencil. Layers were introduced in Western markets by Fauve Matisse (later Macromedia xRes), and then became available in Adobe Photoshop 3.0 in 1994, which led to widespread adoption. In vector image editors that support animation, layers are used to further enable manipulation along a common timeline for the animation; in SVG images, the equivalent of layers is "groups". == Layer types == There are different kinds of layers, and not all of them exist in all programs. They represent a part of a picture, either as pixels or as modification instructions. They are stacked on top of each other, and their order determines the appearance of the final picture. In graphics software, layers are the different levels at which one can place an object or image file. In the program, layers can be stacked, merged, or defined when creating a digital image. Layers can be partially obscured, allowing portions of images within a layer to be hidden or shown in a translucent manner within another image. Layers can also be used to combine two or more images into a single digital image. For the purpose of editing, working with layers allows for applying changes to just one specific layer. 
== Layer (basic) == The standard layer available to most programs consists of a rectangular, semitransparent picture which may be superimposed over other layers. Some programs require that layers cover the same area as the final canvas, but others offer layers of multiple sizes. Each layer may bear individual settings, such as opacity, blending modes, dynamic filters, and potentially hundreds of other properties. == Layer mask == A layer mask is linked to a layer and hides part of the layer from the picture. What is painted black on the layer mask will not be visible in the final picture. What is grey will be more or less transparent depending on the shade of grey. As the layer mask can be both edited and moved around independently of both the background layer and the layer it applies to, it gives the user the ability to test a lot of different combinations of overlay. == Adjustment layer == An adjustment layer typically applies a common effect like brightness or saturation to other layers. However, as the effect is stored in a separate layer, it is easy to try it out and switch between different alternatives, without changing the original layer. In addition, an adjustment layer can easily be edited, just like a layer mask, so an effect can be applied to just part of the image.
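The way opacity, layer masks and adjustment layers interact can be illustrated with a small sketch. It assumes NumPy, grayscale images stored as floats in [0, 1], and the plain "normal" blending mode; the function names `composite` and `brightness_adjustment` are made up for this illustration and do not come from any particular image editor.

```python
import numpy as np


def composite(background, layer, opacity=1.0, mask=None):
    """Blend `layer` over `background` using normal (linear) blending.

    background, layer: float arrays in [0, 1], shape (H, W) or (H, W, C).
    mask: optional float array (H, W); 0 (black) hides the layer,
          1 (white) shows it, greys show it partially.
    """
    background = np.asarray(background, dtype=float)
    layer = np.asarray(layer, dtype=float)
    # Effective per-pixel alpha is the layer opacity scaled by the mask.
    alpha = np.full(layer.shape[:2], float(opacity))
    if mask is not None:
        alpha = alpha * np.asarray(mask, dtype=float)
    if layer.ndim == 3:
        alpha = alpha[..., np.newaxis]  # broadcast over color channels
    return alpha * layer + (1.0 - alpha) * background


def brightness_adjustment(image, amount):
    """A simple 'adjustment': add a constant and clip to the valid range."""
    return np.clip(np.asarray(image, dtype=float) + amount, 0.0, 1.0)


background = np.zeros((2, 2))             # black canvas
layer = np.ones((2, 2))                   # white layer
mask = np.array([[0.0, 0.5], [1.0, 1.0]])

# A half-opaque white layer, hidden where the mask is black.
result = composite(background, layer, opacity=0.5, mask=mask)

# An adjustment layer: the effect is computed separately and blended in
# through its own mask, so `result` itself is never modified.
adjust_mask = np.array([[1.0, 0.0], [1.0, 0.0]])
adjusted = composite(result, brightness_adjustment(result, 0.3), mask=adjust_mask)
```

Because the mask and the adjustment are kept apart from the pixel data, either can be changed or discarded and the picture recomposed, which is the non-destructive behavior described above.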

    Read more →
  • Desktop video

    Desktop video

    Desktop video refers to a phenomenon lasting from the mid-1980s to the early 1990s when the graphics capabilities of personal computers such as the Amiga, Macintosh II, and specially upgraded IBM PC compatibles had advanced to the point where individuals and local broadcasters could use them for analog non-linear editing and vision mixing in video production. Despite the use of computers, desktop video should not be confused with digital video, since the video data remained analog and equipment such as a VCR and a camcorder was used to record the video. Full-screen, full-motion video's vast storage requirements meant that the promise of digital encoding would not be realized on desktop computers for at least another decade. == Description == There were multiple models of genlock cards available to synchronize the content; the Newtek Video Toaster was commonly used with the Amiga in countries that used NTSC (PAL-M in Brazil), while PCs had Truevision and Matrox Illuminator cards and Mac systems had the SuperMac Video Spigot and Radius VideoVision cards. Apple later introduced the Macintosh Quadra 840AV and Centris 660AV systems to specifically address this market. Desktop video was a parallel development to desktop publishing and enabled many small production houses and local TV stations to produce their own original content for the first time. Along with the advent of public-access cable channels, desktop video meant that television advertising became affordable for local businesses such as retailers, restaurants, real estate agents, contractors and auto dealers. As with the phrase desktop publishing, use of the term died out as the technologies to which it referred became the norm for any kind of video production.

    Read more →
  • Image translation

    Image translation

    Image translation is the machine translation of images of printed text (posters, banners, menus, screenshots etc.). This is done by applying optical character recognition (OCR) technology to an image to extract any text it contains, translating that text into a language of the user's choice, and then applying digital image processing to the original image to produce a translated image in the new language. == General == Machine translation made available on the internet (web and mobile) is a notable advance in multilingual communication, eliminating the need for an intermediary translator or interpreter. Translating foreign texts still poses a problem, however, as users cannot be expected to type foreign text that they wish to translate and understand. Manually entering the foreign text may prove difficult, especially when it is written in a script the user cannot read, e.g. Cyrillic, Chinese or Japanese for an English speaker or any speaker of a Latin-based language, or vice versa. Technical advances in OCR made it possible to recognize text in images. Using a mobile device's camera to capture and extract printed text is known as mobile OCR and was first introduced in Japanese-manufactured mobile telephones in 2004. Using the handheld's camera one could take a picture of (a line of) text and have it extracted (digitized) for further manipulation, such as storing the information in a contacts list, opening it as a web page address (URL), or using it as text in an SMS or email message. Presently, mobile devices with a camera resolution of 2 megapixels or above and auto-focus often feature a text scanner service. Taking the text scanning facility one step further, image translation emerged, giving users the ability to capture text with their mobile phone's camera, extract the text, and have it translated into their own language. 
More and more applications built on this technology emerged, including Word Lens. After Word Lens was acquired by Google, it was made part of the Google Translate mobile app. A parallel advance in image processing has also made it possible to replace the text in the image with the translated text and create a new image altogether. == History == The development of the image translation service springs from advances in OCR technology (miniaturization and reduction of memory resources consumed) that enabled text scanning on mobile telephones. Among the first to announce mobile software capable of “reading” text using the mobile device's camera was International Wireless Inc., which in February 2003 released its “CheckPoint” and “WebPoint” applications. “CheckPoint” reads critical symbolic information on checks and is aimed at reducing the losses that mobile merchants suffer from “bounced” checks by scanning the MICR number on the bottom of a check, while “WebPoint” enables the visual recognition and decoding of printed URLs, which are then opened by the device's web browser. The first commercial release of a mobile text scanner, however, took place in December 2004 when Vodafone and Sharp began selling the 902SH mobile, which was the first to feature a 2 megapixel digital camera with optical zoom. Among the device's various multimedia features was the built-in text/bar code/QR code scanner. The text scanner function could handle up to 60 alphabetical characters simultaneously. The scanned text could then be sent as an email or SMS message, added as a dictionary entry or, in the case of scanned URLs, opened via the device's web browser. All subsequent Sharp mobiles feature the text scanner functionality. In September 2005, NEC Corporation and the Nara Institute of Science and Technology in Japan (NAIST) announced new software capable of transforming cameraphones into text scanners. 
The application differs substantially from similarly equipped mobile telephones in Japan (able to scan business cards and small bits of text and use OCR to convert that to editable text or to URL addresses) by its ability to scan a whole page. The two organizations, however, said they would not release the software commercially before the end of 2008. The text scanner function was first combined with machine translation technology by US company RantNetwork, which in July 2007 started selling the Communilator, a machine translation application for mobile devices featuring the Image Translation functionality. Using the built-in camera, the mobile user could take a picture of some printed text, apply OCR to recognize the text and then translate it into any one of over 25 languages available. In April 2008 Nokia showcased its Shoot-to-Translate application for the N73 model, which is capable of taking a picture using the device's camera, extracting the text and then translating it. The application only offers Chinese to English translation, and does not handle large segments of text. Nokia said it was in the process of developing its Multiscanner product which, besides scanning text and business cards, would be able to translate between 52 languages. Also in April 2008, Korean company Unichal Inc. released its handheld Dixau text scanner, capable of scanning and recognizing English text and then translating it into Korean using online translation tools such as Wikipedia or Google Translate. The device is connected to a PC or a laptop via the USB port. In February 2009, Bulgarian company Interlecta presented its mobile translator, including image recognition and speech synthesis, at the Mobile World Congress in Barcelona. The application handles all European languages along with Chinese, Japanese and Korean. The software connects to a server over the Internet to accomplish the image recognition and the translation. 
In May 2014, Google acquired Word Lens to improve the quality of visual and voice translation. Word Lens can scan text or a picture with the user's device and translate it instantly. As OCR has improved, many companies and websites have started combining OCR and translation to read the text in an image and show the translated text. In August 2018, an Indian company created ImageTranslate. It is able to read, translate and re-create the image in another language. As of late 2018, the tool added 13 new languages, including Arabic, Thai, Vietnamese, Hindi, and Bengali, significantly increasing its utility in Asia and the Middle East. This helps users translate photos already stored in their phone's gallery, not just live, real-time views. Currently, image translation is offered by the following: the Google Translate app (with camera), ImageTranslate, and Yandex.
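The pipeline described in this article (recognize text, translate it, then redraw it on the image) can be outlined as a minimal sketch. Everything named here is hypothetical: `TextRegion`, `translate_image_text`, `fake_ocr` and `fake_translate` are illustrative stand-ins rather than the API of any product mentioned above, and a real system would plug an OCR engine and a translation service into the two callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TextRegion:
    """A piece of recognized text and where it sits in the image."""
    text: str
    box: Tuple[int, int, int, int]  # (left, top, width, height) in pixels


# Hypothetical plug-in points: a real system would back these with an
# OCR engine and a machine translation service.
OcrFn = Callable[[bytes], List[TextRegion]]
TranslateFn = Callable[[str, str], str]


def translate_image_text(image: bytes, ocr: OcrFn, translate: TranslateFn,
                         target_lang: str) -> List[TextRegion]:
    """Extract text regions, then translate each one.

    The bounding boxes are kept so that a later rendering step can paint
    the translated text over the original location in the image.
    """
    return [TextRegion(translate(r.text, target_lang), r.box) for r in ocr(image)]


# Stand-ins used only to make the sketch runnable.
def fake_ocr(image: bytes) -> List[TextRegion]:
    return [TextRegion("SORTIE", (10, 20, 120, 40))]


def fake_translate(text: str, target_lang: str) -> str:
    glossary = {("SORTIE", "en"): "EXIT"}
    return glossary.get((text, target_lang), text)


regions = translate_image_text(b"raw image bytes", fake_ocr, fake_translate, "en")
print(regions)
```

The final step, erasing the original text and drawing the translation inside each box, is left out because it depends on the imaging library used.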

    Read more →
  • SciPy

    SciPy

    SciPy (pronounced "sigh pie") is a free and open-source Python library used for scientific computing and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, fast Fourier transform, signal and image processing, ordinary differential equation solvers and other tasks common in science and engineering. SciPy is also a family of conferences for users and developers of these tools: SciPy (in the United States), EuroSciPy (in Europe) and SciPy.in (in India). Enthought originated the SciPy conference in the United States and continues to sponsor many of the international conferences as well as host the SciPy website. The SciPy library is currently distributed under the BSD license, and its development is sponsored and supported by an open community of developers. It is also supported by NumFOCUS, a community foundation for supporting reproducible and accessible science. == Components == The SciPy package is at the core of Python's scientific computing capabilities. 
Available sub-packages include: cluster: hierarchical clustering, vector quantization, K-means; constants: physical constants and conversion factors; datasets: various example datasets for demonstrating image and data processing; differentiate: numerical differentiation for first and second derivatives; fft: Discrete Fourier Transform algorithms; fftpack: legacy interface for Discrete Fourier Transforms; integrate: numerical integration routines; interpolate: interpolation tools; io: data input and output, including support for MATLAB and Matrix Market files; linalg: linear algebra routines; ndimage: various functions for multi-dimensional image processing; odr: orthogonal distance regression classes and algorithms; optimize: optimization algorithms including linear programming and a variety of numerical nonlinear programming optimizers; signal: signal processing tools; sparse: sparse matrices and related algorithms; spatial: algorithms for spatial structures such as k-d trees, nearest neighbors, convex hulls, etc.; special: special functions; stats: statistical functions. == Data structures == The basic data structure used by SciPy is a multidimensional array provided by the NumPy module. NumPy provides some functions for linear algebra, Fourier transforms, and random number generation, but not with the generality of the equivalent functions in SciPy. NumPy can also be used as an efficient multidimensional container of data with arbitrary datatypes. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. Older versions of SciPy used Numeric as an array type, which is now deprecated in favor of the newer NumPy array code. == History == In the 1990s, Python was extended to include an array type for numerical computing called Numeric. (This package was eventually replaced by NumPy, which was written by Travis Oliphant in 2006 as a blending of Numeric and Numarray, with Numarray itself being started in 2001.) 
As of 2000, there was a growing number of extension modules and increasing interest in creating a complete environment for scientific and technical computing. In 2001, Travis Oliphant, Eric Jones, and Pearu Peterson merged code they had written and called the resulting package SciPy. The newly created package provided a standard collection of common numerical operations on top of the Numeric array data structure. Shortly thereafter, Fernando Pérez released IPython, an enhanced interactive shell widely used in the technical computing community, and John Hunter released the first version of Matplotlib, the 2D plotting library for technical computing. Since then the SciPy environment has continued to grow with more packages and tools for technical computing. == Scientific Python versus ScientificPython == In the scientific literature, SciPy is occasionally referred to as "Scientific Python (SciPy)". This is incorrect: the official name of the project is just "SciPy". Furthermore, expanding "SciPy" as "Scientific Python" may cause confusion with "ScientificPython", a project led by Konrad Hinsen of Orléans University that was active between 1995 and 2014. "Scientific Python" is also used for the related ecosystem of tools.
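As a brief illustration of how the sub-packages listed under Components build on NumPy arrays, the following sketch calls one routine each from `integrate`, `optimize` and `linalg`. The particular problems (an integral of sin, a one-dimensional quadratic, a 2x2 linear system) are chosen here purely as examples and assume that SciPy and NumPy are installed.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# integrate: area under sin(x) between 0 and pi (the exact value is 2).
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# optimize: minimize the scalar function (x - 2)^2 + 1, whose minimum is at x = 2.
res = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)

# linalg: solve the linear system A @ x = b, with NumPy arrays as inputs.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print(area, res.x, x)
```

In each case the inputs and outputs are plain Python floats or NumPy arrays, which is what lets results from one sub-package be passed directly to another.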

    Read more →
  • VueScan

    VueScan

    VueScan is a computer program for image scanning, especially of photographs, including negatives. It supports optical character recognition (OCR) of text documents. The software can be downloaded and used free of charge, but adds a watermark on scans until a license is purchased. == Purpose == VueScan is intended to work with a large number of image scanners, excluding specialised professional scanners such as drum scanners, on many computer operating systems (OS), even if drivers for the scanner are not available for the OS. These scanners are supplied with device drivers and software to operate them, included in their price. A 2014 review considered that the reasons to purchase VueScan are to allow older scanners not supported by drivers for newer operating systems to be used in more up-to-date systems and for better scanning and processing of photographs (prints; also slides and negatives when supported by scanners) than is afforded by manufacturers' software. The review did not report any advantages to VueScan's processing of documents over other software. The reviewer considered VueScan comparable to SilverFast, a similar program, with support for some specific scanners better in one or the other. VueScan supports more scanners, with a single purchase giving access to the full range of both film and flatbed scanners, and costs less. The VueScan program can be used with its own drivers or with drivers supplied by the scanner manufacturer, if supported by the operating system. VueScan drivers can also be used without the VueScan program by application software that supports scanning directly, such as Adobe Photoshop, again enabling the use of scanners without current manufacturers' drivers. In 2019, when Apple released macOS Catalina, it removed support for running 32-bit programs, including 32-bit drivers for scanning equipment. In response, Hamrick released VueScan 9.7, effectively saving thousands of scanners from being rendered obsolete. 
== Overview == VueScan enables the user to modify and fine-tune the scanning parameters. The program uses its own independent method to interface with scanner hardware, and can support many older scanners under computer operating systems for which drivers are not available, allowing old scanners to be used with newer platforms that do not otherwise support them. VueScan supports an increasing number of scanners and digital cameras; 2,400 on Windows, 2,100 on Mac OS X and 1,900 on Linux in 2018. VueScan is supplied as one downloadable file for each operating system, which supports the full range of scanners. Without the purchase of a license, the program runs in fully functional demonstration mode, identical to Professional mode, except that watermarks are superimposed on saved and printed images. Purchase of a license removes the watermark. A standard license allows updates for one year; a professional license allows unlimited updates and provides some additional features. VueScan supports optical character recognition (OCR), with English included, and 32 additional language packages available on its website. In September 2011, VueScan co-developer Ed Hamrick said that he was selling US$3 million per year of VueScan licenses.

    Read more →
  • Freemake Video Converter

    Freemake Video Converter

    Freemake Video Converter is a freemium video editing app developed by Ellora Assets Corporation. Designed primarily for entry-level users, the software offers a range of functionalities including video format conversion, DVD ripping, and the creation of photo slideshows and music visualizations. Additionally, Freemake Video Converter is capable of burning video streams that are compatible with various media, such as DVDs and Blu-ray Discs. It also features direct video uploading to platforms like YouTube, enhancing its utility for content creators. The application's user-friendly interface and broad compatibility make it accessible for individuals with minimal video editing experience. == Features == Freemake Video Converter can perform simple non-linear video editing tasks, such as cutting, rotating, flipping, and combining multiple videos into one file with transition effects. It can also create photo slideshows with background music. Users are then able to upload these videos to YouTube. Freemake Video Converter can read the majority of video, audio, and image formats, and outputs them to AVI, MP4, WMV, Matroska, FLV, SWF, 3GP, DVD, Blu-ray, MPEG and MP3. The program also prepares videos for playback on various multimedia devices, including Apple devices (iPod, iPhone, iPad), Xbox, Sony PlayStation, Samsung, Nokia, BlackBerry, and Android mobile devices. The software is able to perform DVD burning and is able to convert videos, photographs, and music into DVD video. The user interface is based on Windows Presentation Foundation technology. Freemake Video Converter supports NVIDIA CUDA technology for H.264 video encoding (starting with version 1.2.0). == Important updates == Freemake Video Converter 2.0 was a major update that integrated two new functions: ripping video from online portals and Blu-ray disc creation and burning. 
Version 2.1 implemented suggestions from users, including support for subtitles, ISO image creation, and DVD to DVD/Blu-ray conversion. With version 2.3 (earlier 2.2 Beta), support for DXVA was added to accelerate conversion (up to 50% for HD content). Version 3.0 added HTML5 video creation support and new presets for smartphones. Version 4.0 (introduced in April 2013) added a freemium "Gold Pack" of extra features that can be added if a "donation" is paid. Starting with version 4.0.4, released on 27 August 2013, the program adds a promotional watermark at the end of every video longer than 5 minutes unless Gold Pack is activated. Version 4.1.9, released on 25 November 2015, added support for drag-and-drop functions that were not available in prior versions. Since at least version 4.1.9.44 (1 May 2017), the Freemake Welcome Screen is added at the beginning of the video, and a large Freemake logo is watermarked in the center of the whole video. This decreases the quality of free outputs, and users must either pay to remove the watermark or stop using the program. Version 4.1.9.31 (11 August 2016) does not have this restriction. == Licensing issues == FFmpeg has added Freemake Video Converter v1.3 to its Hall of Shame. An issue tracker entry for this product, opened on 16 December 2010, says it is in violation of the GNU General Public License as it is distributing components of the FFmpeg project without including due credit. Ellora Assets Corporation has not responded yet. == Bundled software from sponsors == Since version 4.0, Freemake Video Converter's installer includes a potentially unwanted search toolbar from Conduit as well as SweetPacks malware. Although users can decline the software during installation, the opt-out option is rendered in gray, which could mistakenly give the impression that it is disabled.

    Read more →