Interactive machine translation

Interactive machine translation

Interactive machine translation (IMT), is a specific sub-field of computer-aided translation. Under this translation paradigm, the computer software that assists the human translator attempts to predict the text the user is going to input by taking into account all the information it has available. Whenever such prediction is wrong and the user provides feedback to the system, a new prediction is performed considering the new information available. Such process is repeated until the translation provided matches the user's expectations. Interactive machine translation is specially interesting when translating texts in domains where it is not admissible to output a translation containing errors, hence requiring a human user to amend the translations provided by the system. In such cases, interactive machine translation has been proved to provide benefit to potential users. Nevertheless, there are few commercial software that implements interactive machine translation and work done in the field is mostly restrained to academic research. == History == Historically, interactive machine translation is born as an evolution of the computer-aided translation paradigm, where the human translator and the machine translation system were intended to work as a tandem. This first work was extended within the TransType research project, funded by the Canadian government. In this project, the human interaction was aimed towards producing the target text for the first time by embedding data-driven machine translation techniques within the interactive translation environment with the goal of achieving the best of both actors: the efficiency of the automatic system and the reliability of human translators. Later, a larger-scale research project, TransType2, funded by the European Commission extended such work by analyzing the incorporation of a complete machine translation system into the process, with the goal of producing a complete translation hypothesis, which the human user is allowed to amend or accept. If the user decides to amend the hypothesis, the system then attempts to make the best use of such feedback in order to produce a new translation hypothesis that takes into account the modifications introduced by the user. More recently, CASMACAT, also funded by the European Commission, aimed at developing novel types of assistance to human translators and integrated them into a new workbench, consisting of an editor, a server, and analysis and visualisation tools. The workbench was designed in a modular fashion and can be combined with existing computer aided translation tools. Furthermore, the CASMACAT workbench can learn from the interaction with the human translator by updating and adapting its models instantly based on the translation choices of the user. Recent work on involving an extensive evaluation with human users revealed the fact that interactive machine translation may even be used by users that do not speak the source language in order to achieve near professional translation quality. Moreover, it also elucidated the fact that an interactive scenario is more beneficial than a classic post-edition scenario. The previously described approaches rely on a tightly coupled underlying corpus-based machine translation system (usually, a Statistical machine translation system) that is used as a glass box, therefore inheriting the shortcomings of the translation systems and limiting the usage of interactive machine translation for some scenarios. For this reason, an approach that uses any kind of bilingual resource (not limited to machine translation) as a black-box to provide interactive machine translation was developed. This approach is not able to extract as much information from the bilingual resources used, due to the black-box nature of the interaction, but can use any resource available to the user. Forecat is a black-box interactive machine translation implementation that is available both as a web application (that includes a webpage and a web services interface) and as a plugin for OmegaT (Forecat-OmegaT). == Process == The interactive machine translation process starts with the system suggesting a translation hypothesis to the user. Then, the user may accept the complete sentence as correct, or may modify it if he considers there is some error. Typically, when modifying a given word, it is assumed that the prefix until that word is correct, leading to a left-to-right interaction scheme. Once the user has changed the word considered incorrect, the system then proposes a new suffix, i.e. the remainder of the sentence. Such process continues until the translation provided satisfies the user. Although explained at the word level, the previous process may also be implemented at the character level, and hence the system provides a suffix whenever the human translator types in a single character. In addition, there is ongoing effort towards changing the typical left-to-right interaction scheme in order to make human-machine interaction easier. A similar approach is used in the Caitra translation tool. == Evaluation == Evaluation is a difficult issue in interactive machine translation. Ideally, evaluation should take place in experiments involving human users. However, given the high monetary cost this would imply, this is seldom the case. Moreover, even when considering human translators in order to perform a true evaluation of interactive machine translation techniques, it is not clear what should be measured in such experiments, since there are many different variables that should be taken into account and cannot be controlled, as is for instance the time the user takes in order to get used to the process. In the CASMACAT project, some field trials have been carried out to study some of these variables. For quick evaluations in laboratory conditions, interactive machine translation is measured by using the key stroke ratio or the word stroke ratio. Such criteria attempt to measure how many key-strokes or words did the user need to introduce before producing the final translated document. == Differences with classical computer-aided translation == Although interactive machine translation is a sub-field of computer-aided translation, the main attractive of the former with respect to the latter is the interactivity. In classical computer-aided translation, the translation system may suggest one translation hypothesis in the best case, and then the user is required to post-edit such hypothesis. In contrast, in interactive machine translation the system produces a new translation hypothesis each time the user interacts with the system, i.e. after each word (or letter) has been introduced.

Image tracing

In computer graphics, image tracing, raster-to-vector conversion or raster vectorization is the conversion of raster graphics into vector graphics. == Background == An image does not have any structure: it is just a collection of marks on paper, grains in film, or pixels in a bitmap. While such an image is useful, it has some limits. If the image is magnified enough, its artifacts appear. The halftone dots, film grains, and pixels become apparent. Images of sharp edges become fuzzy or jagged. See, for example, pixelation. Ideally, a vector image does not have the same problem. Edges and filled areas are represented as mathematical curves or gradients, and they can be magnified arbitrarily (though of course the final image must also be rasterized in to be rendered, and its quality depends on the quality of the rasterization algorithm for the given inputs). The task in vectorization is to convert a two-dimensional image into a two-dimensional vector representation of the image. It is not examining the image and attempting to recognize or extract a three-dimensional model that may be depicted; i.e. it is not a vision system. For most applications, vectorization also does not involve optical character recognition; characters are treated as lines, curves, or filled objects without attaching any significance to them. In vectorization, the shape of the character is preserved, so artistic embellishments remain. Vectorization is the inverse operation corresponding to rasterization, as integration is to differentiation. And, just as with these other operations, while rasterization is fairly straightforward and algorithmic, vectorization involves the reconstruction of lost information and therefore requires heuristic methods. Synthetic images such as maps, cartoons, logos, clip art, and technical drawings are suitable for vectorization. Those images could have been originally made as vector images because they are based on geometric shapes or drawn with simple curves. Continuous tone photographs (such as live portraits) are not good candidates for vectorization. The input to vectorization is an image, but an image may come in many forms such as a photograph, a drawing on paper, or one of several raster file formats. Programs that do raster-to-vector conversion may accept bitmap formats such as TIFF, BMP and PNG. The output is a vector file format. Common vector formats are SVG, DXF, EPS, EMF and AI. Vectorization can be used to update images or recover work. Personal computers often come with a simple paint program that produces a bitmap output file. These programs allow users to make simple illustrations by adding text, drawing outlines, and filling outlines with a specific color. Only the results of these operations (the pixels) are saved in the resulting bitmap; the drawing and filling operations are discarded. Vectorization can be used to recapture some of the information that was lost. Vectorization is also used to recover information that was originally in a vector format but has been lost or has become unavailable. A company may have commissioned a logo from a graphic arts firm. Although the graphics firm used a vector format, the client company may not have received a copy of that format. The company may then acquire a vector format by scanning and vectorizing a paper copy of the logo. == Process == Vectorization starts with an image. === Manual === The image can be vectorized manually. A person could look at the image, make some measurements, and then write the output file by hand. That was the case for the vectorization of a technical illustration about neutrinos. The illustration has a few geometric shapes and a lot of text; it was relatively easy to convert the shapes, and the SVG vector format allows the text (even subscripts and superscripts) to be entered easily. The original image did not have any curves (except for the text), so the conversion is straightforward. Curves make the conversion more complicated. Manual vectorization of complicated shapes can be facilitated by the tracing function built into some vector graphics editing programs. If the image is not yet in machine readable form, then it has to be scanned into a usable file format. Once there is a machine-readable bitmap, the image can be imported into a graphics editing program (such as Adobe Illustrator, CorelDRAW, or Inkscape). Then a person can manually trace the elements of the image using the program's editing features. Curves in the original image can be approximated with lines, arcs, and Bézier curves. An illustration program allows spline knots to be adjusted for a close fit. Manual vectorization is possible, but it can be tedious. Although graphics drawing programs have been around for a long time, artists may find the freehand drawing facilities awkward even when a drawing tablet is used. Instead of using a program, Pepper recommends making an initial sketch on paper. Instead of scanning the sketch and tracing it freehand in the computer, Pepper states: "Those proficient with a graphic tablet and stylus could make the following changes directly in CorelDRAW by using a scan of the sketch as an underlay and drawing over it. I prefer to use pen and ink, and a light table"; most of the final image was traced by hand in ink. Later the line-drawing image was scanned at 600 dpi, cleaned up in a paint program, and then automatically traced with a program. Once the black and white image was in the graphics program, some other elements were added and the figure was colored. Similarly, Ploch recreated a design from a digital photograph. The JPEG was imported and some "basic shapes" were traced by hand and colored in the graphics drawing program; more complex shapes were handled differently. Ploch used a bitmap editor to remove the background and crop the more complex image components. He then printed the image and traced it by hand onto tracing paper to get a clean black and white line drawing. That drawing was scanned and then vectorized with a program. === Automatic === Some programs automate the vectorization process. Example programs are Adobe Illustrator, Inkscape, Corel's PowerTRACE, and Potrace. Some of these programs have a command line interface while others are interactive that allow the user to adjust the conversion settings and view the result. Adobe Streamline is not only an interactive program, but it also allows a user to manually edit the input bitmap and the output curves. Corel's PowerTRACE is accessed through CorelDRAW; CorelDRAW can be used to modify the input bitmap and edit the output curves. Adobe Illustrator has a facility to trace individual curves. Automated programs can have mixed results. A program (PowerTRACE) was used to convert a PNG map to SVG. The program did a good job on the map boundaries (the most tedious task in the tracing) and the settings dropped out all the text (small objects). The text was manually re-inserted. Other conversions may not go as well. The results depend on having high-quality scans, reasonable settings, and good algorithms. Scanned images often have a lot of noise, which can require additional work to clean up. == Options == There are many different image styles and possibilities, and no single vectorization method works well on all images. Consequently, vectorization programs have many options that influence the result. One issue is what the predominant shapes are. If the image is of a fill-in form, then it will probably have just vertical and horizontal lines of a constant width. The program's vectorization should take that into account. On the other hand, a CAD drawing may have lines at any angle, there may be curved lines, and there may be several line weights (thick for objects and thin for dimension lines). Instead of (or in addition to) curves, the image may contain outlines filled with the same color. Adobe Streamline allows users to select a combination of line recognition (horizontal and vertical lines), centerline recognition, or outline recognition. Streamline also allows small outline shapes to be thrown out; the notion is such small shapes are noise. The user may set the noise level between 0 and 1000; an outline that has fewer pixels than that setting is discarded. Another issue is the number of colors in the image. Even images that were created as black on white drawings may end up with many shades of gray. Some line-drawing routines employ anti-aliasing; a pixel completely covered by the line will be black, but a pixel that is only partially covered will be gray. If the original image is on paper and is scanned, there is a similar result: edge pixels will be gray. Sometimes images are compressed (e.g., JPEG images), and the compression will introduce gray levels. Many of the vectorization programs will group same-color pixels into lines, curves, or outlined shapes. If each possible color is grouped into its object, there can be an enormous number of objects. Instead, the user is asked to s

Blocking of Twitter in Nigeria

Twitter was blocked in Nigeria from 5 June 2021 to 13 January 2022. The government imposed a ban on the social network after it deleted tweets made by, and temporarily suspended, the Nigerian president Muhammadu Buhari, warning the southeastern people of Nigeria, predominantly Igbo people, of a potential repeat of the 1967 Nigerian Civil War due to the ongoing insurgency in Southeastern Nigeria. The Nigerian government claimed that the deletion of the president's tweets factored into their decision, but it was ultimately based on "a litany of problems with the social media platform in Nigeria, where misinformation and fake news spread through it have had real world violent consequences", citing the persistent use of the platform for activities that are capable of undermining Nigeria's corporate existence. In January 2022, Nigeria lifted its blocking of Twitter after the platform agreed to establish a legal entity within the country sometime in the first quarter of 2022. == Background == On 1 June 2021, Nigerian President Muhammadu Buhari posted a tweet threatening a crackdown on regional separatists "in the language they understand". The next day, Twitter deleted the tweet, claiming it was in violation of Twitter rules, but gave no further details. Nigeria's Information Minister Lai Mohammed said that Twitter's actions were part of an unfair double standard, as Twitter had not banned incitement tweets from other groups. During the Nigerian Civil War a majority of deaths resulted from the blockade of Biafra which caused the deaths of millions of civilians from starvation, a fact that was not alluded to in the tweet. The Nigerian government has long held concerns over the use of Twitter in the country. The ongoing local End SARS protest began on Twitter and got amplified in 2020 when it had 48 million tweets in ten days. Buhari's government floated the idea of social media regulation on different occasions prior to banning Twitter. Attempts to pass an anti-social media bill in the past have failed majorly due to massive outcry on Twitter. Days before the ban, the country's minister of information called Twitter's activities in Nigeria suspicious, citing its influence on the End SARS protests. == Aftermath == Three days after Twitter was suspended, it was reported that the move had cost the country over 6 billion naira and would also contribute to the worsening unemployment in the country. ExpressVPN reported an over 200 percent increase in web traffic and searches for VPN spiked across the country. In response, Nigeria's Minister of Justice and Attorney General of the Federation Abubakar Malami at first openly threatened to prosecute citizens who bypass the ban using a VPN but then denied saying so after a screenshot of a Twitter deactivation notification he shared on Facebook showed a VPN logo. Nigeria's cultural minister Lai Mohammed stated the ban would be lifted once Twitter submitted to locally licensing, registration and conditions. "It will be licensed by the broadcasting commission, and must agree not to allow its platform to be used by those who are promoting activities that are inimical to the corporate existence of Nigeria." In late June 2021, Twitter announced it would enter talks with the Nigerian government over the platform's suspension. The talks began in July 2021. On 15 September 2021, Mohammed said the Nigerian government will lift the ban on Twitter in a "few days." The Minister said Twitter gave a progress report of their talks with them, adding that it has been productive and quite respectful. On 1 October 2021, President Muhammadu Buhari in his Independence Day broadcast said Twitter must meet the Nigerian government's five conditions before the suspension of the social media platform will be lifted. The conditions are: Respect for national security and cohesion; registration, physical presence and representation in Nigeria; fair taxation; dispute resolution; local content. == Reactions == The ban was condemned by Amnesty International, the British, Canadian and Swedish diplomatic missions to Nigeria, as well as the United States and the European Union in a joint statement. Two domestic organizations, the Socio-Economic Rights and Accountability Project (SERAP) and the Nigerian Bar Association, indicated intent to challenge the ban in court. Twitter itself called the ban "deeply concerning". Former U.S. President Donald Trump, who was permanently suspended from Twitter following the United States Capitol attack in January, praised the ban, stating "Congratulations to the country of Nigeria, who just banned Twitter because they banned their President", and also called on other countries to ban Twitter and Facebook due to "not allowing free and open speech." == Lifting of the ban == On 12 January 2022, the Nigerian Government lifted the ban after Twitter agreed to pay an "applicable tax" and establish "a legal entity in Nigeria during the first quarter of 2022".

Fifth Estate

The Fifth Estate is a socio-cultural reference to groupings of outlier viewpoints in contemporary society, and is most associated with bloggers, journalists publishing in non-mainstream media outlets, and online social networks. The "Fifth" Estate extends the sequence of the three classical estates (clergy (first), nobility (second), commoners (third)) and the preceding Fourth Estate, essentially the common press. The use of "fifth estate" dates to the 1960s counterculture, and in particular the influential Fifth Estate, an underground newspaper first published in Detroit in 1965. Web-based technologies have enhanced the scope and power of the Fifth Estate far beyond the modest and boutique conditions of its beginnings. Nimmo and Combs asserted in 1992 that political pundits constitute a Fifth Estate. Media researcher Stephen D. Cooper argued in 2006 that bloggers are the Fifth Estate. In 2009, William Dutton argued that the Fifth Estate is not just the blogging community, nor an extension of the media, but "networked individuals" enabled by the Internet, e.g. social media, in ways that can hold the other estates accountable.

MADI

Multichannel Audio Digital Interface (MADI) standardized as AES10 by the Audio Engineering Society (AES) defines the data format and electrical characteristics of an interface that carries multiple channels of digital audio. The AES first documented the MADI standard in AES10-1991 and updated it in AES10-2003 and AES10-2008. The MADI standard includes a bit-level description and has features in common with the two-channel AES3 interface. MADI supports serial digital transmission over coaxial cable or fibre-optic lines of 28, 56, 32, or 64 channels; and sampling rates to 96 kHz and beyond with an audio bit depth of up to 24 bits per channel. Like AES3 and ADAT Lightpipe, it is a unidirectional interface from one sender to one receiver. == Development and applications == MADI was developed by AMS Neve, Solid State Logic, Sony and Mitsubishi and is widely used in the audio industry, especially in the professional audio sector. It provides advantages over other audio digital interface protocols and standards such as AES3, ADAT Lightpipe, TDIF (Tascam Digital Interface), and S/PDIF (Sony/Philips Digital Interface). These advantages include: Support for a greater number of channels per line Use of coaxial and optical fiber media that support transmission of audio signals over 100 meters, up to 3000 meters over multi-mode and 40,000 meters over single-mode optical fiber The original specification (AES10-1991) defined the MADI link as a 56-channel transport for linking large-format mixing consoles to digital multitrack recording devices. Large broadcast studios also adopted it for routing multi-channel audio throughout their facilities. The 2003 revision (AES10-2003) adds a 64-channel capability by removing varispeed operation and supports 96 kHz sampling frequency with reduced channel capacity. The latest AES10-2008 standard includes minor clarifications and updates to correspond to the current AES3 standard. Audio over Ethernet of various types is the primary alternative to MADI for transport of many channels of professional digital audio. == Transmission format == MADI links use a transmission format similar to Fiber Distributed Data Interface (FDDI) networking. Since MADI is most often transmitted on copper links via 75-ohm coaxial cables, it more closely compares to the FDDI specification for copper-based links, called CDDI. AES10-2003 recommends using BNC connectors with coaxial cables and SC connectors with optic fibers. MADI over fibre can support a range of up to 2 km. The basic data rate is 100 Mbit/s of data using 4B5B encoding to produce a 125 MHz physical baud rate. Unlike AES3, this clock is not synchronized to the audio sample rate, and the audio data payload is padded using JK sync symbols. Sync symbols may be inserted at any subframe boundary, and must occur at least once per frame. Though the standard disassociates the transmission clock from the audio sample rate, and thus requires a separate word clock connection to maintain synchronization, some vendors do give the option of locking to parts of the transmission timing information for purposes of deriving a word clock. The audio data is almost identical to the AES3 payload, though with more channels. Rather than letters, MADI assigns channel numbers from 0–63. Frame synchronization is provided by sync symbols outside the data itself, rather than an embedded preamble sequence, and the first four time slots of each sub-channel are encoded as normal data, used for sub-channel identification: Bit 0: Set to 1 to mark channel 0, the first channel in each frame Bit 1: Set to 1 to indicate that this channel is active (contains interesting data) Bit 2: notA/B channel marker, used to mark left (0) and right (1) channels. Generally, even channels are A and odd channels are B. Bit 3: Set to 1 to mark the beginning of a 192-sample data block == Sampling frequency == The original AES10-1991 specification allowed 56 channels at sample rates from 32 to 48 kHz with an additional vari-speed range of ± 12.5%. This leads to a total range of 28 to 54 kHz. At the highest frequency, this produces a total of 56 × 32 × 54 = 96768 kbit/s, leaving 3.232% of the channel for synchronization marks and transmit clock error. The 2003 revision specifies different relations between sampling frequency and number of channels. 32 kHz to 48 kHz ± 12.5%, 56 channels; 32 kHz to 48 kHz nominal, 64 channels; 64 kHz to 96 kHz ± 12.5%, 28 channels. With a 48 kHz sampling frequency, 64 channels take 64 × 32 × 48000 = 98.304 Mbit/s. Adding the minimum 8 × 58 kbit/s of framing produces 98688 bit/s, leaving 1.312% free for timing variation and other overhead. Both versions of the standard accommodate higher sampling frequencies (for example, 96 kHz or 192 kHz) by using two or more channels per audio sample on the link.

Fillrate

In computer graphics, a video card's pixel fillrate refers to the number of pixels that can be rendered on the screen and written to video memory in one second. Pixel fillrates are given in megapixels per second or in gigapixels per second (in the case of newer cards), and are obtained by multiplying the number of render output units (ROPs) by the clock frequency of the graphics processing unit (GPU) of a video card. A similar concept, texture fillrate, refers to the number of texture map elements (texels) the GPU can map to pixels in one second. Texture fillrate is obtained by multiplying the number of texture mapping units (TMUs) by the clock frequency of the GPU. Texture fillrates are given in mega or gigatexels per second. However, there is no full agreement on how to calculate and report fillrates. Another possible method is to multiply the number of pixel pipelines by the GPU's clock frequency. The results of these multiplications correspond to a theoretical number. The actual fillrate depends on many other factors. In the past, the fillrate has been used as an indicator of performance by video card manufacturers such as ATI and NVIDIA, however, the importance of the fillrate as a measurement of performance has declined as the bottleneck in graphics applications has shifted. For example, today, the number and speed of unified shader processing units has gained attention. Although fillrate doesn't provide a substantial bottleneck in games, it can still provide a bottleneck for certain parts of the game, for example applying a gaussian blur can be bottlenecked by fillrate. Scene complexity can be increased by overdrawing, which happens when an object is drawn to the frame buffer, and another object (such as a wall) is then drawn on top of it, covering it up. The time spent drawing the first object is thus wasted because it is not visible. When a sequence of scenes is extremely complex (many pixels have to be drawn for each scene), the frame rate for the sequence may drop. When designing graphics intensive applications, one can determine whether the application is fillrate-limited (or shader limited) by seeing if the frame rate increases dramatically when the application runs at a lower resolution or in a smaller window. Although this is not a full-proof method, modern videogame engines can dynamically reduce the level-of-detail required and thereby reducing fillrate-limited applications. The best way to find fillrate bottlenecks is to use GPU vendor software like NVIDIA Nsight Graphics, AMD Radeon GPU Profile and the Intel Graphics Performance Analyzers.

Mike Little

Mike Little (born 12 May 1962) is an English web developer and writer. He is the co-founder of the free and open source web publishing software WordPress. == Biography == Mike Little was born in Manchester, England in 1962 to a Nigerian father, who was a mathematics lecturer and musician, and an English mother who worked as a primary school teacher. Little was placed into foster care when he was four months of age, and was later adopted by the same family. He grew up on a council estate in Brinnington, Stockport, and was educated at Stockport School. In 2003, Little and Matt Mullenweg started working on a project in which they built on b2/cafelog and later named it WordPress, releasing the first version on 27 May 2003. Little states that, despite not being invited to join his co-founder's for-profit business Automattic, he and Mullenweg remain on good terms. He clarified: "I don’t want it to sound like he cheated me out of something or ripped me off in some way. He didn’t." In June 2013, Little was awarded the SAScon's "Outstanding Contribution to Digital" award for his part in co-founding and developing WordPress. Little has been described as "modest" and living in "virtual anonymity". He has one daughter. He identifies as a follower of Stoicism and a humanist, and in 2021, he became a patron of charity Humanists UK.