AI Face Look

AI Face Look — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Model compression

    Model compression

    Model compression is a machine learning technique for reducing the size of trained models. Large models can achieve high accuracy, but often at the cost of significant resource requirements. Compression techniques aim to compress models without significant performance reduction. Smaller models require less storage space, and consume less memory and compute during inference. Compressed models enable deployment on resource-constrained devices such as smartphones, embedded systems, edge computing devices, and consumer electronics computers. Efficient inference is also valuable for large corporations that serve large model inference over an API, allowing them to reduce computational costs and improve response times for users. Model compression is not to be confused with knowledge distillation, in which a smaller "student" model is trained to imitate the input-output behavior of a larger "teacher" model (as opposed to using the "teacher"'s trained parameters or the "teacher"'s training targets). == Techniques == Several techniques are employed for model compression. === Pruning === Pruning sparsifies a large model by setting some parameters to exactly zero. This effectively reduces the number of parameters. This allows the use of sparse matrix operations, which are faster than dense matrix operations. Pruning criteria can be based on magnitudes of parameters, the statistical pattern of neural activations, Hessian values, etc. === Quantization === Quantization reduces the numerical precision of weights and activations. For example, instead of storing weights as 32-bit floating-point numbers, they can be represented using 8-bit integers. Low-precision parameters take up less space, and takes less compute to perform arithmetic with. It is also possible to quantize some parameters more aggressively than others, so for example, a less important parameter can have 8-bit precision while another, more important parameter, can have 16-bit precision. Inference with such models requires mixed-precision arithmetic. Quantized models can also be used during training (rather than after training). PyTorch implements automatic mixed-precision (AMP), which performs autocasting, gradient scaling, and loss scaling. === Low-rank factorization === Weight matrices can be approximated by low-rank matrices. Let W {\displaystyle W} be a weight matrix of shape m × n {\displaystyle m\times n} . A low-rank approximation is W ≈ U V T {\displaystyle W\approx UV^{T}} , where U {\displaystyle U} and V {\displaystyle V} are matrices of shapes m × k , n × k {\displaystyle m\times k,n\times k} . When k {\displaystyle k} is small, this both reduces the number of parameters needed to represent W {\displaystyle W} approximately, and accelerates matrix multiplication by W {\displaystyle W} . Low-rank approximations can be found by singular value decomposition (SVD). The choice of rank for each weight matrix is a hyperparameter, and jointly optimized as a mixed discrete-continuous optimization problem. The rank of weight matrices may also be pruned after training, taking into account the effect of activation functions like ReLU on the implicit rank of the weight matrices. == Training == Model compression may be decoupled from training, that is, a model is first trained without regard for how it might be compressed, then it is compressed. However, it may also be combined with training. The "train big, then compress" method trains a large model for a small number of training steps (less than it would be if it were trained to convergence), then heavily compress the model. It is found that at the same compute budget, this method results in a better model than lightly compressed, small models. In Deep Compression, the compression has three steps. First loop (pruning): prune all weights lower than a threshold, then finetune the network, then prune again, etc. Second loop (quantization): cluster weights, then enforce weight sharing among all weights in each cluster, then finetune the network, then cluster again, etc. Third step: Use Huffman coding to losslessly compress the model. The SqueezeNet paper reported that Deep Compression achieved a compression ratio of 35 on AlexNet, and a ratio of ~10 on SqueezeNets.

    Read more →
  • FMLLR

    FMLLR

    In signal processing, Feature space Maximum Likelihood Linear Regression (fMLLR) is a global feature transform that are typically applied in a speaker adaptive way, where fMLLR transforms acoustic features to speaker adapted features by a multiplication operation with a transformation matrix. In some literature, fMLLR is also known as the Constrained Maximum Likelihood Linear Regression (cMLLR). == Overview == fMLLR transformations are trained in a maximum likelihood sense on adaptation data. These transformations may be estimated in many ways, but only maximum likelihood (ML) estimation is considered in fMLLR. The fMLLR transformation is trained on a particular set of adaptation data, such that it maximizes the likelihood of that adaptation data given a current model-set. This technique is a widely used approach for speaker adaptation in HMM-based speech recognition. Later research also shows that fMLLR is an excellent acoustic feature for DNN/HMM hybrid speech recognition models. The advantage of fMLLR includes the following: the adaptation process can be performed within a pre-processing phase, and is independent of the ASR training and decoding process. this type of adapted feature can be applied to deep neural networks (DNN) to replace traditionally used mel-spectrogram in end-to-end speech recognition models. fMLLR's speaker adaptation process leads to a significant performance boost for ASR models, hence outperforming other transform or features like MFCCs (Mel-Frequency Cepstral Coefficients) and FBANKs (Filter bank) coefficients. fMLLR features can be efficiently realized with speech toolkits like Kaldi. Major problem and disadvantage of fMLLR: when the amount of adaptation data is limited, the transformation matrices tends to easily overfit the given data. == Computing fMLLR transform == Feature transform of fMLLR can be easily computed with the open source speech tool Kaldi, the Kaldi script uses the standard estimation scheme described in Appendix B of the original paper, in particular the section Appendix B.1 "Direct method over rows". In the Kaldi formulation, fMLLR is an affine feature transform of the form x {\displaystyle x} → A {\displaystyle A} x {\displaystyle x} + b {\displaystyle +b} , which can be written in the form x {\displaystyle x} →W x ^ {\displaystyle {\hat {x}}} , where x ^ {\displaystyle {\hat {x}}} = [ x 1 ] {\displaystyle {\begin{bmatrix}x\\1\end{bmatrix}}} is the acoustic feature x {\displaystyle x} with a 1 appended. Note that this differs from some of the literature where the 1 comes first as x ^ {\displaystyle {\hat {x}}} = [ 1 x ] {\displaystyle {\begin{bmatrix}1\\x\end{bmatrix}}} . The sufficient statistics stored are: K = ∑ t , j , m γ j , m ( t ) Σ j m − 1 μ j m x ( t ) + {\displaystyle K=\sum _{t,j,m}\gamma _{j,m}(t)\textstyle \Sigma _{jm}^{-1}\mu _{jm}x(t)^{+}\displaystyle } where Σ j m − 1 {\displaystyle \textstyle \Sigma _{jm}^{-1}\displaystyle } is the inverse co-variance matrix. And for 0 ≤ i ≤ D {\displaystyle 0\leq i\leq D} where D {\displaystyle D} is the feature dimension: G ( i ) = ∑ t , j , m γ j , m ( t ) ( 1 σ j , m 2 ( i ) ) x ( t ) + x ( t ) + T {\displaystyle G^{(i)}=\sum _{t,j,m}\gamma _{j,m}(t)\left({\frac {1}{\sigma _{j,m}^{2}(i)}}\right)x(t)^{+}x(t)^{+T}\displaystyle } For a thorough review that explains fMLLR and the commonly used estimation techniques, see the original paper "Maximum likelihood linear transformations for HMM-based speech recognition ". Note that the Kaldi script that performs the feature transforms of fMLLR differs with by using a column of the inverse in place of the cofactor row. In other words, the factor of the determinant is ignored, as it does not affect the transform result and can causes potential danger of numerical underflow or overflow. == Comparing with other features or transforms == Experiment result shows that by using the fMLLR feature in speech recognition, constant improvement is gained over other acoustic features on various commonly used benchmark datasets (TIMIT, LibriSpeech, etc). In particular, fMLLR features outperform MFCCs and FBANKs coefficients, which is mainly due to the speaker adaptation process that fMLLR performs. In, phoneme error rate (PER, %) is reported for the test set of TIMIT with various neural architectures: As expected, fMLLR features outperform MFCCs and FBANKs coefficients despite the use of different model architecture. Where MLP (multi-layer perceptron) serves as a simple baseline, on the other hand RNN, LSTM, and GRU are all well known recurrent models. The Li-GRU architecture is based on a single gate and thus saves 33% of the computations over a standard GRU model, Li-GRU thus effectively address the gradient vanishing problem of recurrent models. As a result, the best performance is obtained with the Li-GRU model on fMLLR features. == Extract fMLLR features with Kaldi == fMLLR can be extracted as reported in the s5 recipe of Kaldi. Kaldi scripts can certainly extract fMLLR features on different dataset, below are the basic example steps to extract fMLLR features from the open source speech corpora Librispeech. Note that the instructions below are for the subsets train-clean-100,train-clean-360,dev-clean, and test-clean, but they can be easily extended to support the other sets dev-other, test-other, and train-other-500. These instruction are based on the codes provided in this GitHub repository, which contains Kaldi recipes on the LibriSpeech corpora to execute the fMLLR feature extraction process, replace the files under $KALDI_ROOT/egs/librispeech/s5/ with the files in the repository. Install Kaldi. Install Kaldiio. If running on a single machine, change the following lines in $KALDI_ROOT/egs/librispeech/s5/cmd.sh to replace queue.pl to run.pl: Change the data path in run.sh to your LibriSpeech data path, the directory LibriSpeech/ should be under that path. For example: Install flac with: sudo apt-get install flac Run the Kaldi recipe run.sh for LibriSpeech at least until Stage 13 (included), for simplicity you can use the modified run.sh. Copy exp/tri4b/trans. files into exp/tri4b/decode_tgsmall_train_clean_/ with the following command: Compute the fMLLR features by running the following script, the script can also be downloaded here: Compute alignments using: Apply CMVN and dump the fMLLR features to new .ark files, the script can also be downloaded here: Use the Python script to convert Kaldi generated .ark features to .npy for your own dataloader, an example Python script is provided:

    Read more →
  • Comparison of video editing software

    Comparison of video editing software

    This is a comparison of non-linear video editing software applications. See also a more complete list of video editing software. == General information == This table gives basic general information about the different editors: === Active === === Discontinued / Inactive === ==== Definition ==== professional: used for full length Hollywood movies; professional (small): mainly used for paid commercials, short films or podcasts/YouTube channels; prosumer: Mainly targeting private use, anything that can do more than just trimming a film; basic: trimming a film; == System requirements == This table lists the operating systems that different editors can run on without emulation, as well as other system requirements. Note that minimum system requirements are listed; some features (like High Definition support) may be unavailable with these specifications. "Unix" includes the similar Linux, BSD and Unix-like operating systems. == High definition/High resolution import == The table below indicates the ability of each program to import various High Definition video or High resolution video formats for editing. == Feature set == == Output options == Please note that recording to Blu-ray does not imply 1080@50p/60p . Most only support up to 1080i 25/30 frames per second recording. Also not all formats can be output.

    Read more →
  • Luma (video)

    Luma (video)

    In video, luma ( Y ′ {\displaystyle Y'} ) represents the brightness in an image (the "black-and-white" or achromatic portion of the image). Luma is typically paired with chroma. Luma represents the achromatic image, while the chroma components represent the color information. Converting R′G′B′ sources (such as the output of a three-CCD camera) into luma and chroma allows for chroma subsampling: because human vision has finer spatial sensitivity to luminance ("black and white") differences than chromatic differences, video systems can store and transmit chromatic information at lower resolution, optimizing perceived detail at a particular bandwidth. == Luma versus relative luminance == Luma is the weighted sum of gamma-compressed R′G′B′ components of a color video—the prime symbols ′ denote gamma compression. The word was proposed to prevent confusion between luma as implemented in video engineering and relative luminance as used in color science (i.e. as defined by CIE). Relative luminance is formed as a weighted sum of linear RGB components, not gamma-compressed ones. Even so, luma is sometimes erroneously called luminance. SMPTE EG 28 recommends the symbol Y ′ {\displaystyle Y'} to denote luma and the symbol Y {\displaystyle Y} to denote relative luminance. === Use of relative luminance === While luma is more often encountered, relative luminance is sometimes used in video engineering when referring to the brightness of a monitor. The formula used to calculate relative luminance uses coefficients based on the CIE color matching functions and the relevant standard chromaticities of red, green, and blue (e.g., the original NTSC primaries, SMPTE C, or Rec. 709). For the Rec. 709 (and sRGB) primaries, the linear combination, based on pure colorimetric considerations and the definition of relative luminance is: Y = 0.2126 R + 0.7152 G + 0.0722 B {\displaystyle Y=0.2126R+0.7152G+0.0722B} The formula used to calculate luma in the Rec. 709 spec arbitrarily also uses these same coefficients, but with gamma-compressed components: Y ′ = 0.2126 R ′ + 0.7152 G ′ + 0.0722 B ′ , {\displaystyle Y'=0.2126R'+0.7152G'+0.0722B',} where the prime symbol ′ denotes gamma compression. == Rec. 601 luma versus Rec. 709 luma coefficients == For digital formats following CCIR 601 (i.e. most digital standard definition formats), luma is calculated with this formula: Y 601 ′ = 0.299 R ′ + 0.587 G ′ + 0.114 B ′ {\displaystyle Y'_{\text{601}}=0.299R'+0.587G'+0.114B'} Formats following ITU-R Recommendation BT. 709 (i.e. most digital high definition formats) use a different formula: Y 709 ′ = 0.2126 R ′ + 0.7152 G ′ + 0.0722 B ′ {\displaystyle Y'_{\text{709}}=0.2126R'+0.7152G'+0.0722B'} Modern HDTV systems use the 709 coefficients, while transitional 1035i HDTV (MUSE) formats may use the SMPTE 240M coefficients: Y 240 ′ = 0.212 R ′ + 0.701 G ′ + 0.087 B ′ = Y 145 ′ {\displaystyle Y'_{\text{240}}=0.212R'+0.701G'+0.087B'=Y'_{\text{145}}} These coefficients correspond to the SMPTE RP 145 primaries (also known as "SMPTE C") in use at the time the standard was created. The change in the luma coefficients is to provide the "theoretically correct" coefficients that reflect the corresponding standard chromaticities ('colors') of the primaries red, green, and blue. However, there is some controversy regarding this decision. The difference in luma coefficients requires that component signals must be converted between Rec. 601 and Rec. 709 to provide accurate colors. In consumer equipment, the matrix required to perform this conversion may be omitted (to reduce cost), resulting in inaccurate color. == Luma and luminance errors == As well, the Rec. 709 luma coefficients may not necessarily provide better performance. Because of the difference between luma and relative luminance, luma does not exactly represent the luminance in an image. As a result, errors in chroma can affect luminance. Luma alone does not perfectly represent luminance; accurate luminance requires both accurate luma and chroma. Hence, errors in chroma "bleed" into the luminance of an image. Note the bleeding in lightness near the borders. Due to the widespread usage of chroma subsampling, errors in chroma typically occur when it is lowered in resolution/bandwidth. This lowered bandwidth, coupled with high frequency chroma components, can cause visible errors in luminance. An example of a high frequency chroma component would be the line between the green and magenta bars of the SMPTE color bars test pattern. Error in luminance can be seen as a dark band that occurs in this area.

    Read more →
  • Public computer

    Public computer

    A public computer (or public access computer) is any of various computers available in public areas. Some places where public computers may be available are libraries, schools, or dedicated facilities run by government. Public computers share similar hardware and software components to personal computers, however, the role and function of a public access computer is entirely different. A public access computer is used by many different untrusted individuals throughout the course of the day. The computer must be locked down and secure against both intentional and unintentional abuse. Users typically do not have authority to install software or change settings. A personal computer, in contrast, is typically used by a single responsible user, who can customize the machine's behavior to their preferences. Public access computers are often provided with tools such as a PC reservation system to regulate access. The world's first public access computer center was the Marin Computer Center in California, co-founded by David and Annie Fox in 1977. == Kiosks == A kiosk is a special type of public computer using software and hardware modifications to provide services only about the place the kiosk is in. For example, a movie ticket kiosk can be found at a movie theater. These kiosks are usually in a secure browser with zero access to the desktop. Many of these kiosks may run Linux, however, ATMs, a kiosk designed for depositing money, often run Windows XP. == Public computers in the United States == === Library computers === In the United States and Canada, almost all public libraries have computers available for the use of patrons, though some libraries will impose a time limit on users to ensure others will get a turn and keep the library less busy. Users are often allowed to print documents that they have created using these computers, though sometimes for a small fee. ==== Privacy ==== Privacy is an important part of the public library institution, since the libraries entitle the public to intellectual freedom. Use of any computer or network may create records of users' activities that can jeopardize their privacy. It is possible for a patron to jeopardize their privacy if they do not delete cache, clear cookies, or documents from the public computer. In order for a member of the public to remain private on a computer, the American Library Association (ALA) has guidelines. These give patrons an idea of the right way to keep using public library computers. In their provision of services to library users, librarians have an ethical responsibility, expressed in the ALA Code of Ethics, to preserve users' right to privacy. A librarian is also responsible for giving users an understanding of private patron use and access. Libraries must ensure that users have the following rights when browsing on public computers: the computer automatically will clear a users history; libraries should display privacy screens so users do not see another patron's screen; updating software for effective safety measures; restoration data software to clear documents that users may have left on their computers and to combat possible malware; security practices; and making users aware of any possible monitoring of their browsing activities. Users can also view the Library Privacy Checklist for Public Access Computers and Networks to better understand what libraries strive for when protecting privacy. === School computers === The U.S. government has given money to many school boards to purchase computers for educational applications. Schools may have multiple computer labs, which contain these computers for students to use. There is usually Internet access on these machines, but some schools will put up a blocking service to limit the websites that students are able to access to only include educational resources, such as Google. In addition to controlling the content students are viewing, putting up these blocks can also help to keep the computers safe by preventing students from downloading malware and other threats. However, the effectiveness of such content filtering systems is questionable since it can easily be circumvented by using proxy websites, Virtual Private Networks, and for some weak security systems, merely knowing the IP address of the intended website is enough to bypass the filter. School computers often have advanced operating system security to prevent tech-savvy students from inflicting damage (i.e. the Windows Registry Editor and Task Manager, etc.) are disabled on Microsoft Windows machines. Schools with very advanced tech services may also install a locked down BIOS/firmware or make kernel-level changes to the operating system, precluding the possibility of unauthorized activity.

    Read more →
  • Cheekd

    Cheekd

    Cheekd is a dating app based in New York City. It was founded in 2010 by Lori Cheek. == History == The service debuted with the name "Cheek'd". Founder Lori Cheek appeared on the television program, Shark Tank in February 2014, but did not succeed in obtaining funding from any of the five judges. She said Cheek’d only had 1000 subscribers at that time. === Business card model === Cheek'd offered two plans, paid and free. For $25, subscribers got a set of 50 business cards that could be given out once someone caught their eye. Each card had a phrase, an online code, and a URL to the subscriber's account. Recipients could look up the giver's profile. In addition to purchasing cards, there was a $9.95 monthly membership fee. === Smartphone app === In 2015, the service's name changed from "Cheek'd" to "Cheekd". The new app used Bluetooth technology to alert users whenever a compatible user was within a 30-foot radius, instead of using cards. == Patent lawsuit == The original business card-based model for Cheekd had been claimed as a patented process by Lori Cheek, as U.S. patent 8,543,465. In September 2017, a complaint was filed, alleging that the idea was not original to Lori Cheek. Cheek responded, stating that the complaint was baseless, and a complete fabrication. The lawsuit Pirri v. Cheek was dismissed in a pre-trial conference in New York's Federal Court on April 5, 2018.

    Read more →
  • CinePlayer

    CinePlayer

    CinePlayer is a software based media player used to review Digital Cinema Packages (DCP) without the need for a digital cinema server by Doremi Labs. CinePlayer can play back any DCP, not just those created by Doremi Mastering products. In addition to playing DCPs, CinePlayer can also playback JPEG2000 image sequences and many popular multimedia file types. There are two versions of CinePlayer available, standard and Pro. The standard version supports playback of non-encrypted, 2D DCP's up to 2K resolution. The Pro version supports playback of encrypted, 2D or 3D DCP's with subtitles up to 4K resolution. == Supported formats == === Containers === AVI MOV MXF MPG TS WMV M2TS MTS MP4 MKV === Video codecs === JPEG2000 ProRes 422 DNxHD YUV Uncompressed 8-10 bits DIVX XVID MPEG4 AVC / H-264 VC-1 MPEG2 === Supported image sequences === BMP TIFF TGA DPX JPG J2C === Supported audio files === WAV MP3 WMA MP2

    Read more →
  • D/Vision Pro

    D/Vision Pro

    D/Vision Pro was one of the earliest marketed non-linear editing systems. It was released by TouchVision Systems, Inc. in the mid-1990s. The program was DOS-based and worked on either Intel's 386 or 486 processor. The system used AVI compression and worked with the Action Media II board. The system allowed users to digitize video, audio, and timecode, create an edit decision list (EDL), instantly play back the edited program, and output the finished EDL in a wide variety of formats. These cost-effective editing systems were used by numerous independent filmmakers and in low-budget productions during the mid-late 1990s. D/Vision Pro's low-quality compression led TouchVision (later renamed D/Vision Systems) to abandon it in favor of D/Vision Online, which was purchased by Discreet Logic and renamed edit. In June 2002, Discreet discontinued edit, as they did not want it to interfere with smoke sales which were more profitable. Discreet was later purchased by Autodesk.

    Read more →
  • Phrase structure grammar

    Phrase structure grammar

    The term phrase structure grammar was originally introduced by Noam Chomsky as the term for grammar studied previously by Emil Post and Axel Thue (Post canonical systems). Some authors, however, reserve the term for more restricted grammars in the Chomsky hierarchy: context-sensitive grammars or context-free grammars. In a broader sense, phrase structure grammars are also known as constituency grammars. The defining character of phrase structure grammars is thus their adherence to the constituency relation, as opposed to the dependency relation of dependency grammars. == History == In 1956, Chomsky wrote, "A phrase-structure grammar is defined by a finite vocabulary (alphabet) Vp, and a finite set Σ of initial strings in Vp, and a finite set F of rules of the form: X → Y, where X and Y are strings in Vp." == Constituency relation == In linguistics, phrase structure grammars are all those grammars that are based on the constituency relation, as opposed to the dependency relation associated with dependency grammars; hence, phrase structure grammars are also known as constituency grammars. Any of several related theories for the parsing of natural language qualify as constituency grammars, and most of them have been developed from Chomsky's work, including Government and binding theory Generalized phrase structure grammar Head-driven phrase structure grammar Lexical functional grammar The minimalist program Nanosyntax Further grammar frameworks and formalisms also qualify as constituency-based, although they may not think of themselves as having spawned from Chomsky's work, e.g. Arc pair grammar, and Categorial grammar.

    Read more →
  • RagTime

    RagTime

    RagTime is a frame-oriented business publishing software which combines word processing, spreadsheets, simple drawings, image processing, and charts, in a single document/program, integrated software. It is often used to create forms, reports, documentation, desktop publishing, and in office environments. Typical users are business clients, educational institutions, administrations, architects, and also private users. Ragtime includes the following modules: Page layout (forms, templates etc.) Word processing Image processing Spreadsheets, similar to Microsoft Excel Formulas and functions which can be used throughout, in text, graphics, and spreadsheets Charts in different types of diagrams Drawings in vector graphics including lines, polygons, Bézier curves and more Slide show (presentation of RagTime documents) Audio/video Buttons (pop-up menus, switches, and more) that can be used within RagTime documents Import/export of various file formats Support of the AppleScript scripting language available system-wide under macOS == Principle == RagTime differs from most other comparable programs or software packages in its strict frame-oriented design: all content is contained within frames on each page. The content can have a fixed position within its frame or, if it is text or a spreadsheet, flow into another frame that is connected to the first frame via a so-called “pipeline”. RagTime has no different document types for different types of data; all content is stored in a single compound document type. Thus, a RagTime document not only can contain multiple pages, but also multiple layouts within the same document; e.g. spreadsheets in addition to text and images. The RagTime filename extension is .rtd (RagTime document); for templates the extension is .rtt (RagTime template). The current version is RagTime 6.6.5. It is available for OS X (10.6-10.14) and Windows (XP/Vista/7/8/10). == Extensions == FileTime – allows accessing “FileMaker Pro” databases from RagTime documents under OS X RagTime Connect – ODBC database connection for RagTime 6 (Mac and Windows) Johannes – print extension for the simple creation of stapled or folded brochures, booklets etc. PowerFunctions – additional functions for a more effective creation of intelligent documents for exchanging data and for use in mixed Mac/Windows environments MetaFormula – SYLK-based extension that allows calculating text as formula == History == RagTime has been developed since 1985 for the Macintosh – originally named MacFrame – and was published in 1986. When released, it already had the present name, which was chosen following the then-available software package Lotus Jazz. In the European Macintosh market, RagTime quickly gained a prominent position that continues to this day, even though the market share has decreased. Despite repeated attempts, the program could not gain acceptance in the North American market due to its high cost ($395 in 1990). The North American sales office closed in 1991, shortly after Claris Corporation released ClarisWorks which duplicated much of the functionality of RagTime for a lower price. After the manufacturer – first Brüning & Everth, followed by B&E Software and today RagTime.de Development – had focused on the Macintosh only for a very long time, it also released a Windows version, RagTime 5.0, in 1999. However, the program could not assume great significance against established competitors, especially Microsoft Office. Until mid-2006 RagTime was, in addition to the commercial version, also available as a free version (RagTime Solo) for personal use. RagTime Solo included the same features and performance (except for spelling and Syllabification) dictionaries), but was not allowed for use in commercial environments. In other languages RagTime Solo was distributed as RagTime Privat. In a press release from July 5, 2006, RagTime announced the discontinuation of RagTime Solo: “… the RagTime Solo license conditions were often misinterpreted or deliberately flouted. Therefore we discontinued RagTime Solo, there will be no private version of RagTime 6 anymore.” After a successful start of the RagTime 6.0 software, sales edged significantly lower in the following years. Disagreements arose among the shareholders about the continuation of the company, which filed for bankruptcy in July 2007. As a result, the rights to RagTime were taken over by the newly established company RagTime.de Development GmbH, which was responsible for the development. The sales partner RagTime.de Sales GmbH distributed the RagTime products until October 2015. Today RagTime.de Development GmbH is also responsible for sales. The last level of development is the extensively revamped version RagTime 6.6 of 8 October 2015, which also includes new OS X features (e.g. high-resolution “Retina” displays) and supports Windows 10. == Programming == RagTime 1-3 were developed in Pascal, since version 4 the development is completely coded in C++. External programming and automation can be implemented via AppleScript on a Mac, and via OLE/COM-API (e.g. Visual Basic) under Windows. On a Mac, RagTime provides a comprehensive AppleScript library, for the automation of almost any task, from automatic document creation to the export of PDF documents. RagTime also supports “recordings” by use of the “AppleScript Editor”, which allows recording the interactive RagTime operation as an AppleScript program sequence. AppleScripts can be saved in the RagTime document and called via menu or shortcut keys. On Windows, RagTime (since version 6) disposes over an OLE/COM API, which allows automating many RagTime components via external programming. For that purpose there is a type library that installs the available RagTime OLE/COM object catalogue. Programming can be realized in all programming languages supported by Microsoft.

    Read more →
  • Direct voice input

    Direct voice input

    Direct voice input (DVI), sometimes called voice input control (VIC), is a style of human–machine interaction "HMI" in which the user makes voice commands to issue instructions to the machine through speech recognition. In the field of military aviation, DVI has been introduced into the cockpits of several modern military aircraft, such as the Eurofighter Typhoon, the Lockheed Martin F-35 Lightning II, the Dassault Rafale, the KF-21 Boramae and the Saab JAS 39 Gripen. Such systems have also been used for various other purposes, including industry control systems and speech recognition assistance for impaired individuals. == Overview == DVI systems can be divided into two major categories of functionality: "user-dependent" or "user-independent". A user-dependent system requires that a personal voice template to be generated for a specific person; the template for this individual has to be loaded onto their assigned machine prior to use of the DVI system for it to function properly. In contrast, a user-independent system does not require any personal voice template, being intended to respond correctly to the voice of any user. They can also be categorised between "discrete recognition" and "continuous recognition". Users of a discrete recognition system must pause between each word so that the DVI system can identify the separations between each word, while a continuous speech recognition system is capable of understanding a normal rate of speech. During the mid-2000s, researchers at the National Aerospace Laboratory in the Netherlands examined the use of DVI in the "GRACE" simulator; a total of twelve pilots participated in the ensuing experiment. The tests performed reportedly revealed that, while the hardware itself functioned well, several improvements were desirable prior to real-world deployment on aircraft since DVI operations actually consumed more time in comparison to traditional existing methods. Recommendations for improvements included the adoption of simpler syntax, the achievement of a greater recognition rate, and a decrease in response times; all of the issues encountered were determined to be of a technological nature, and were deemed feasible to resolve. The researchers concluded that in cockpits, especially during emergencies where pilots have to operate entirely on their own, a DVI system could be highly relevant, but that it was not of crucial importance during most other conceivable scenarios. Around the same time, evaluations of DVI systems for civil aviation purposes were conducted within the framework of Project SafeSound, coordinated by the European Union. It involved the observation of pilot workloads in real-world cockpits and contrasting them against pilot activity in flight simulators using both conventional systems and DVI assistance. The project aimed to enhance aviation safety and to decrease the workload in both ground and flight operations via the application of enhanced audio functions. == Applications == === Aviation === Prior to its widespread deployment, a handful of conventional military aircraft were converted to trial DVI systems; examples include the Harrier AV-8B and F-16 VISTA. In another case, a General Dynamics F-16 Fighting Falcon simulator was modified with DVI for a voice control study that was undertaken by the Royal Netherlands Air Force. DVI trials have also been conducted on helicopters, including the Boeing AH-64 Apache, showing the potential to improve flight safety and mission effectiveness. Numerous modern fighter aircraft have been outfitted with DVI systems, often in combination with various other man-machine interface schemes, such as HOTAS-compliant controls and other advanced control technologies. The combination of Voice and HOTAS control schemes has sometimes been referred to as the "V-TAS" concept. A prominent fighter aircraft to be furnished with a V-TAS cockpit is the Eurofighter Typhoon. The Lockheed Martin F-35 Lightning II also features a DVI system, which was developed by Adacel. Other examples includes the Dassault Rafale and the Saab JAS 39 Gripen. Numerous aircraft have been planned to use DVI. At one stage, the United States Air Force had sought to integrate DVI upon the Lockheed Martin F-22 Raptor; however, the technology was eventually judged to pose too many technical risks at that point in time, and thus such efforts were abandoned. === Personal === By 1990, working prototypes of speech recognition systems were being demonstrated; these were being promoted for the purpose of providing an effective man-machine interface for individuals with impaired speech. Techniques employed included time-encoded digital speech and automatic token set selection. Investigations of these early DVI systems reportedly included the use of automatic diagnostic routines and limited-scale trials using volunteers. During the 2010s, various companies were offering voice recognition systems to the general public in the form of personal digital assistants. One example is the Google Voice service, which allows users to pose questions via a DVI package installed on either a personal computer, tablet, or mobile phone. Numerous digital assistants have been developed, such as Amazon Echo, Siri, and Cortana, that use DVI to interact with users. === Commercial === DVI technology has enabled automated telephone systems to be widely deployed. Many companies commonly use centralised phone systems that route callers to the correct department via such methods. Various car manufacturers have also furnished their road vehicles with DVI systems; these typically allow drivers to control infotainment systems and interact with mobile phones with more convenience than legacy methods. During the late 1980s, investigations into the use of DVI systems for controlling CNC machines and other manufacturing apparatus were underway. During the 2010s, such systems were being used for logistics and warehouse management purposes.

    Read more →
  • Ave!Comics

    Ave!Comics

    Ave!Comics Production is a privately owned French company editing comics on smartphones, tablets and computers. It was founded in 2008 and it is a subsidiary of Aquafadas, a software development company in digital publishing owned by Kobo Inc. AveComics is a comic book store for digital comic books that can be used on computers, tablets, and smartphones.(iOS, Android) Readers can buy and read comic books, manga and graphic novels in French, English and Spanish. AveComics uses a technology created by Aquafadas for comics transformation, distribution and reading, based around its AVE format. The AveComics application was also a finalist in the BlackBerry Innovation Awards 2009, in the "Entertainment" category. == Company history == Aquafadas, a company working on creative software for Flash, HTML5, photo, and video editing, created the application MyComics to allow the reading of comics on mobile in 2006. This application was made available in 2008, to enable the reading and storing of comics on iPhone and iPod Touch. A reading system adapted to low resolution screens was also available. In October of the same year, the company launched a comics library on both devices, in partnership with the Angoulême International Comics Festival, Fnac and SNCF. This library included the official selection of the festival, and was downloaded over 150 000 times. In December 2008 "The Adventures of Lucky Luke n°3", at Lucky Comics was published on both devices. The comic made a 50 000 € turnover. In April 2009, "Les Blondes" 10th volume was the top-selling comic for 10 months on the AppStore. After, in August 2009, the AveComics application was launched on iPhone, iPod Touch and BlackBerry. The company's website was launched in September when more than 100 titles were available on smartphones and computers. == Catalogue == AveComics works with over 80 international publishers including Glénat, Marsu Productions, Delcourt, Casterman, Soleil, Ubisoft, Les Humanoïdes Associés and Mad Fabrik. Comics such as "Assassin's Creed", "Talisman", "Titeuf", and "Seoul District" are sold by the company. == Award == Grand Prix Software Venture Capital - Senate 2008.

    Read more →
  • Thermal attack

    Thermal attack

    A thermal attack (aka thermal imaging attack) is an approach that exploits heat traces to uncover the entered credentials. These attacks rely on the phenomenon of heat transfer from one object to another. During authentication, heat transfers from the users' hands to the surface they are interacting with, leaving heat traces behind that can be analyzed using thermal cameras that operate in the far-infrared spectrum. These traces can be recovered and used to reconstruct the passwords. In some cases, the attack can be successful even 30 seconds after the user has authenticated. Thermal attacks can be performed after the victim had authenticated, alleviating the need for in-situ observation attacks (e.g., shoulder surfing attacks) that can be affected by hand occlusions. While smudge attacks can reveal the order of entries of graphical passwords, such as the Android Lock Patterns, thermal attacks can reveal the order of entries even in the case of PINs or alphanumeric passwords. The reason thermal attacks leak information about the order of entry is because keys and buttons that the user touches first lose heat over time, while recently touched ones maintain the heat signature for a longer time. This results in distinguishable heat patterns that can tell the attacker which entry was entered first. Thermal attacks were shown to be effective against plastic keypads, such as the ones used to enter credit card's PINs in supermarkets and restaurants, and on handheld mobile devices such as smartphones and tablets. In their paper published at the Conference on Human Factors in Computing Systems (CHI 2017), Abdelrahman et al. showed that the attack is feasible on today's smartphones. They also proposed some ways to mitigate the attack, such as swiping randomly on the screen to distort the heat traces, or forcing maximum CPU usage for a few seconds. Thermal attacks can also infer passwords from heat traces on keyboards. Researchers at the University of Glasgow showed that attackers who use AI methods can be more effective in performing thermal attacks. Their study presents a new tool called ThermoSecure and evaluates it in two user studies. The results show that ThermoSecure can successfully attack passwords with an average accuracy of 92% to 55%, depending on the length of the password. The effectiveness of thermal attacks also depends on typing behavior and the material of the keycaps. ABS keycaps, which retain heat traces longer, are more vulnerable to thermal attacks. The study also discusses ways to protect against thermal attacks and presents seven potential mitigation approaches. Dr Khamis, who led the development of the technology with Norah Alotaibi and John Williamson, said with thermal imaging cameras more affordable than ever and machine learning becoming more accessible, it was "very likely that people around the world are developing systems along similar lines to ThermoSecure in order to steal passwords". == Thermal Attack Mitigation == === Simple and Practical Measures === One basic and effective way to mitigate thermal attacks is to deliberately create heat noise over the input interface, such as a keypad or keyboard, after entering a password. For instance, placing one's palm over the entire interface for a few seconds after use can obscure the thermal pattern left by the fingers, making it much more difficult for an unauthorized user to interpret the heat traces. === Range of Proposed Strategies === In addition to simple methods, researchers have developed a spectrum of mitigation strategies to counter thermal attacks. These strategies encompass 15 different approaches including: Use of Biometrics: Replacing traditional pin codes or passwords with biometric authentication, such as fingerprint recognition or facial recognition, eliminates the issue of residual heat on keypads. Heating the Interface: Implementing technology to slightly warm up the keypad can effectively neutralize the heat traces left by fingers, preventing thermal cameras from capturing the pattern. Randomizing Key Layouts: Employing dynamic key layouts that change positions every time the interface is used, making it impossible to correlate heat patterns with static input positions. === Technological Intervention on Thermal Cameras === Another avenue for mitigation is to address the issue at the source by modifying thermal cameras. Proposals have been made to develop thermal cameras that can automatically detect vulnerable interfaces such as keyboards or keypads. When these interfaces are detected within the camera's field of view, the camera would be programmed to prevent the user from recording images of them. This solution, however, would require widespread adoption by thermal camera manufacturers. Additionally, the approach is particularly viable for thermal cameras connected to a computing device, such as a smartphone, which can process the images in real time. Many affordable thermal cameras are standalone and do not have connectivity or processing capabilities. However, thermal cameras designed for connection to mobile devices can utilize the smartphone's processing power, making this mitigation approach feasible for such devices.

    Read more →
  • Google Vids

    Google Vids

    Google Vids (not to be confused with Google Video) is an online timeline-based video editing application included as part of the Google Workspace suite. It is designed to help users create informational videos for work-related purposes. The app uses Google's Gemini technology to enable users to create video storyboards manually or with AI assistance using simple prompts. Features include uploading media, choosing stock videos, images, background music, and a voiceover feature with script generation using AI. The app is currently in testing with select Google Workspace Labs users. Like Kapwing and Capcut, Google Vids is primarily for creating work-related content like sales training, onboarding videos, vendor outreach, and project updates. It offers various styles and templates, collaborative features, and is not limited to videos without the short integration at the moment. Google Vids was announced on April 9, 2024. In September 2025, Google began to roll out a basic version of the application to Google Workspace users.

    Read more →
  • Ordered dithering

    Ordered dithering

    Ordered dithering is any image dithering algorithm which uses a pre-set threshold map tiled across an image. It is commonly used to display a continuous image on a display of smaller color depth. For example, Microsoft Windows uses it in 16-color graphics modes. With the most common "Bayer" threshold map, the algorithm is characterized by noticeable crosshatch patterns in the result. == Threshold map == The algorithm reduces the number of colors by applying a threshold map M to the pixels displayed, causing some pixels to change color, depending on the distance of the original color from the available color entries in the reduced palette. The first threshold maps were designed by hand to minimise the perceptual difference between a grayscale image and its two-bit quantisation for up to a 4x4 matrix. An optimal threshold matrix is one that for any possible quantisation of color has the minimum possible texture so that the greatest impression of the underlying feature comes from the image being quantised. It can be proven that for matrices whose side length is a power of two there is an optimal threshold matrix. The map may be rotated or mirrored without affecting the effectiveness of the algorithm. This threshold map (for sides with length as power of two) is also known as a Bayer matrix or, when unscaled, an index matrix. For threshold maps whose dimensions are a power of two, the map can be generated recursively via: M 2 n = 1 ( 2 n ) 2 [ 4 M n 4 M n + 2 J n 4 M n + 3 J n 4 M n + J n ] = J 2 ⊗ M n + 1 n 2 M 2 ⊗ J n , {\displaystyle \mathbf {M} _{2n}={\frac {1}{(2n)^{2}}}{\begin{bmatrix}4\mathbf {M} _{n}&4\mathbf {M} _{n}+2\mathbf {J} _{n}\\4\mathbf {M} _{n}+3\mathbf {J} _{n}&4\mathbf {M} _{n}+\mathbf {J} _{n}\end{bmatrix}}=\mathbf {J} _{2}\otimes \mathbf {M} _{n}+{\frac {1}{n^{2}}}\mathbf {M} _{2}\otimes \mathbf {J} _{n},} where J n {\displaystyle \mathbf {J} _{n}} are n × n {\displaystyle n\times n} matrices of ones and ⊗ {\displaystyle \otimes } is the Kronecker product. While the metric for texture that Bayer proposed could be used to find optimal matrices for sizes that are not a power of two, such matrices are uncommon as no simple formula for finding them exists, and relatively small matrix sizes frequently give excellent practical results (especially when combined with other modifications to the dithering algorithm). This function can also be expressed using only bit arithmetic: M(i, j) = bit_reverse(bit_interleave(bitwise_xor(i, j), i)) / n ^ 2 == Pre-calculated threshold maps == Rather than storing the threshold map as a matrix of n {\displaystyle n} × n {\displaystyle n} integers from 0 to n 2 {\displaystyle n^{2}} , depending on the exact hardware used to perform the dithering, it may be beneficial to pre-calculate the thresholds of the map into a floating point format, rather than the traditional integer matrix format shown above. For this, the following formula can be used: Mpre(i,j) = Mint(i,j) / n^2 This generates a standard threshold matrix. for the 2×2 map: this creates the pre-calculated map: Additionally, normalizing the values to average out their sum to 0 (as done in the dithering algorithm shown below) can be done during pre-processing as well by subtracting 1⁄2 of the largest value from every value: Mpre(i,j) = Mint(i,j) / n^2 – 0.5 maxValue creating the pre-calculated map: == Algorithm == The ordered dithering algorithm renders the image normally, but for each pixel, it offsets its color value with a corresponding value from the threshold map according to its location, causing the pixel's value to be quantized to a different color if it exceeds the threshold. For most dithering purposes, it is sufficient to simply add the threshold value to every pixel (without performing normalization by subtracting 1⁄2), or equivalently, to compare the pixel's value to the threshold: if the brightness value of a pixel is less than the number in the corresponding cell of the matrix, plot that pixel black, otherwise, plot it white. This lack of normalization slightly increases the average brightness of the image, and causes almost-white pixels to not be dithered. This is not a problem when using a gray scale palette (or any palette where the relative color distances are (nearly) constant), and it is often even desired, since the human eye perceives differences in darker colors more accurately than lighter ones, however, it produces incorrect results especially when using a small or arbitrary palette, so proper normalization should be preferred. In other words, the algorithm performs the following transformation on each color c of every pixel: c ′ = n e a r e s t _ p a l e t t e _ c o l o r ( c + r × ( M ( x mod n , y mod n ) − 1 / 2 ) ) {\displaystyle c'=\mathrm {nearest\_palette\_color} {\mathopen {}}\left(c+r\times \left(M(x{\bmod {n}},y{\bmod {n}})-1/2\right){\mathclose {}}\right)} where M(i, j) is the threshold map on the i-th row and j-th column, c′ is the transformed color, and r is the amount of spread in color space. Assuming an RGB palette with 23N evenly distanced colors where each color (a triple of red, green and blue values) is represented by an octet from 0 to 255, one would typically choose r ≈ 255 N {\textstyle r\approx {\frac {255}{N}}} . (1⁄2 is again the normalizing term.) Because the algorithm operates on single pixels and has no conditional statements, it is very fast and suitable for real-time transformations. Additionally, because the location of the dithering patterns always stays the same relative to the display frame, it is less prone to jitter than error-diffusion methods, making it suitable for animations. Because the patterns are more repetitive than error-diffusion method, an image with ordered dithering compresses better. Ordered dithering is more suitable for line-art graphics as it will result in straighter lines and fewer anomalies. The values read from the threshold map should preferably scale into the same range as the minimal difference between distinct colors in the target palette. Equivalently, the size of the map selected should be equal to or larger than the ratio of source colors to target colors. For example, when quantizing a 24 bpp image to 15 bpp (256 colors per channel to 32 colors per channel), the smallest map one would choose would be 4×2, for the ratio of 8 (256:32). This allows expressing each distinct tone of the input with different dithering patterns. === A variable palette: pattern dithering === == Non-Bayer approaches == The above thresholding matrix approach describes the Bayer family of ordered dithering algorithms. A number of other algorithms are also known; they generally involve changes in the threshold matrix, which changes the distribution of the "noise" introduced by all kinds of dithering (the difference between the original image and the dithered image). === Halftone === Halftone dithering performs a form of clustered dithering, creating a look similar to halftone patterns, using a specially crafted matrix. === Void and cluster === The Void and cluster algorithm uses a pre-generated blue noise as the matrix for the dithering process. The blue noise matrix keeps the Bayer's good high frequency content, but with a more uniform coverage of all the frequencies involved shows a much lower amount of patterning. The "voids-and-cluster" method gets its name from the matrix generation procedure, where a black image with randomly initialized white pixels is gaussian-blurred to find the brightest and darkest parts, corresponding to voids and clusters. After a few swaps have evenly distributed the bright and dark parts, the pixels are numbered by importance. It takes significant computational resources to generate the blue noise matrix: on a modern computer a 64×64 matrix requires a couple seconds using the original algorithm. This algorithm can be extended to make animated dither masks which also consider the axis of time. This is done by running the algorithm in three dimensions and using a kernel which is a product of a two-dimensional gaussian kernel on the XY plane, and a one-dimensional Gaussian kernel on the Z axis. === Simulated Annealing === Simulated annealing can generate dither masks by starting with a flat histogram and swapping values to optimize a loss function. The loss function controls the spectral properties of the mask, allowing it to make blue noise or noise patterns meant to be filtered by specific filters. The algorithm can also be extended over time for animated dither masks with chosen temporal properties.

    Read more →