Web content development

Web content development is the process of researching, writing, gathering, organizing, and editing information for publication on websites. Website content may consist of prose, graphics, pictures, recordings, movies, or other digital assets that can be distributed by a Hypertext Transfer Protocol (HTTP) server and viewed by a web browser.
== Web developers and content developers ==
When the World Wide Web began, web developers either developed online content themselves or modified existing documents and coded them into Hypertext Markup Language (HTML). Over time, website development came to encompass many technologies, and it became difficult for a single developer to maintain so many different skills. Content developers are specialized website developers with content-generation skills such as graphic design, multimedia development, professional writing, and documentation. They can integrate content into new or existing websites without using information technology skills such as scripting-language programming and database programming.
Content developers, or technical content developers, can also be technical writers who produce technical documentation that helps people understand and use a product or service. This documentation includes online help, manuals, white papers, design specifications, developer guides, deployment guides, release notes, and similar material.
== Search engine optimization ==
Content developers may also be search engine optimization (SEO) specialists or internet marketing professionals. Search engines aim to rank high-quality, original content highly, so content development specialists play an important role in the search engine optimization process. One persistent problem in web content development is keyword-stuffed content, which is prepared solely to manipulate search engine rankings. As a result, such content is written to appeal to search engine algorithms rather than to human readers.
Search engine optimization specialists commonly submit content to article directories to build their website's authority on a given topic. Most article directories allow visitors to republish submitted content on the condition that all links are kept intact. This has become a method of search engine optimization for many websites. If written according to SEO copywriting rules, the submitted content benefits both the publisher (free SEO-friendly content for a webpage) and the author (a hyperlink pointing to the author's website, placed on an SEO-friendly webpage).
== New content types ==
Web content is no longer restricted to text. Search engines now index audio and visual media, including video, images, PDFs, and other elements of a web page. Website owners sometimes use content protection networks to scan for plagiarized content.

Inauthentic text

An inauthentic text is a computer-generated expository document that is meant to appear genuine but is actually meaningless. Such texts are frequently created to be intermixed with genuine documents and thus manipulate search engine results, as with spam blogs. They are also carried along in email to fool spam filters by giving the spam the superficial characteristics of legitimate text.
Sometimes nonsensical documents are created with computer assistance for humorous effect, as with Dissociated Press or Flarf poetry. They have also been used to challenge the credibility of a publication venue: MIT students submitted papers generated by a computer program called SCIgen to a conference, where they were initially accepted. The students argued that this showed the bar for submissions was too low.
With the amount of computer-generated text outpacing the ability of humans to curate it, some means of distinguishing between the two is needed. Yet automated approaches to determining definitively whether a text is authentic face intrinsic semantic challenges. Noam Chomsky coined the sentence "Colorless green ideas sleep furiously" as an example of a grammatically correct but semantically incoherent sentence; some point out that in certain contexts one could give this sentence (or any phrase) meaning.
The first group to use the expression in this sense was at Indiana University. Their work explains in detail an attempt to detect inauthentic texts and identifies pernicious problems that inauthentic texts cause in cyberspace. Their site provides a means of submitting text that assesses, based on supervised learning, whether a corpus is inauthentic. Many users have submitted inappropriate types of data and have commented on the resulting scores accordingly. The application is meant for a specific kind of data; submitting, say, an email will not return a meaningful score.
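To illustrate how computer-generated text can look locally plausible while meaning nothing, here is a minimal sketch of a word-level Markov chain generator, the general technique behind Dissociated Press-style parody generators. It is not the Emacs implementation; the function names `build_chain` and `generate`, the chain order of 2, and the sample sentence are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map every run of `order` consecutive words to the words that follow it in `text`."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain, order=2, length=20, seed=None):
    """Random walk over the chain: each next word is one that followed the current context."""
    rng = random.Random(seed)
    output = list(rng.choice(sorted(chain)))  # start from a random context seen in the source
    while len(output) < length:
        followers = chain.get(tuple(output[-order:]))
        if not followers:
            break  # this context only occurred at the very end of the source text
        output.append(rng.choice(followers))
    return " ".join(output)


source = "the cat sat on the mat and the dog sat on the cat"
print(generate(build_chain(source), length=12, seed=3))
```

Every three-word window of the output occurs somewhere in the source, so each fragment reads naturally, yet the whole carries no intended meaning. That local plausibility is what makes such text useful for padding spam and hard to flag with shallow statistics.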

Residual neural network

A residual neural network (also referred to as a residual network or ResNet) is a deep learning architecture in which the layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition, and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) of that year.
As a point of terminology, "residual connection" refers to the specific architectural motif of $x \mapsto f(x) + x$, where $f$ is an arbitrary neural network module. The motif had been used previously (see §History for details). However, the publication of ResNet made it widely popular for feedforward networks, and it now appears in neural networks that are seemingly unrelated to ResNet.
The residual connection stabilizes the training and convergence of deep neural networks with hundreds of layers, and is a common motif in deep neural networks, such as transformer models (e.g., BERT, and GPT models such as ChatGPT), the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system.
== Mathematics ==
=== Residual connection ===
In a multilayer neural network model, consider a (non-residual) subnetwork with a certain number of stacked layers (e.g., 2 or 3), and let $F(x;\alpha)$ denote the function it computes, where $\alpha$ denotes its parameters. Suppose $H^*$ is the desired optimal output for this part of the network. Residual learning simply adds $x$ directly to the subnetwork's output, so that the block computes $F(x;\alpha) + x$ and the optimal output of the subnetwork itself becomes $H^* - x$, which is interpreted as a "residual" with respect to $x$. The operation of "adding $x$" is implemented via a "skip connection" that performs an identity mapping to connect the input of the subnetwork with its output. This connection is referred to as a "residual connection" in later work.
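To make the definition concrete, here is a minimal pure-Python sketch of a residual block. It assumes a toy subnetwork $F$ made of a single matrix-vector product followed by tanh; the names `dense_tanh` and `residual_block` are hypothetical, and practical implementations use a tensor library instead of lists. It also shows the usual intuition: if the desired mapping $H^*$ is the identity, the subnetwork only has to output zeros.

```python
import math


def dense_tanh(weights, x):
    """Toy subnetwork F: a matrix-vector product followed by an elementwise tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def residual_block(f, x):
    """Residual connection: return f(x) + x, elementwise."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("an identity skip connection needs equal input and output sizes")
    return [a + b for a, b in zip(fx, x)]


x = [0.5, -1.0, 2.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
# With all weights at zero, F(x) = 0 and the block reduces to the identity mapping.
print(residual_block(lambda v: dense_tanh(zero_weights, v), x))  # [0.5, -1.0, 2.0]
```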
The function $F$ is often represented by matrix multiplication interlaced with activation functions and normalization operations (e.g., batch normalization or layer normalization). As a whole, one of these subnetworks is referred to as a "residual block". A deep residual network is constructed by simply stacking these blocks.
Long short-term memory (LSTM) has a memory mechanism that serves as a residual connection. In an LSTM without a forget gate, an input $x_t$ is processed by a function $F$ and added to a memory cell $c_t$, resulting in $c_{t+1} = c_t + F(x_t)$. An LSTM with a forget gate essentially functions as a highway network.
To stabilize the variance of the layers' inputs, it is recommended to replace the residual connections $x + f(x)$ with $x/L + f(x)$, where $L$ is the total number of residual layers.
=== Projection connection ===
If the function $F$ is of type $F : \mathbb{R}^n \to \mathbb{R}^m$ where $n \neq m$, then $F(x) + x$ is undefined. To handle this special case, a projection connection is used:
$$y = F(x) + P(x)$$
where $P$ is typically a linear projection, defined by $P(x) = Mx$ where $M$ is an $m \times n$ matrix. The matrix is trained via backpropagation, as is any other parameter of the model.
=== Signal propagation ===
The introduction of identity mappings facilitates signal propagation in both forward and backward paths.
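The claim above can be illustrated with a toy scalar model, assuming each block's branch is the hypothetical function $F(x) = a\tanh(x)$ with a small $a$ and no normalization. The sketch (`run_stack` is a made-up name) pushes one number through a deep stack, with and without skip connections, and accumulates the derivative of the output with respect to the input via the chain rule.

```python
import math


def run_stack(x0, depth, a, residual):
    """Pass scalar x0 through `depth` toy blocks whose branch is F(x) = a * tanh(x).

    Returns the final output and d(output)/d(x0), accumulated with the chain rule.
    """
    x, grad = x0, 1.0
    for _ in range(depth):
        f = a * math.tanh(x)
        df = a * (1.0 - math.tanh(x) ** 2)  # F'(x) for F(x) = a * tanh(x)
        if residual:
            x, grad = f + x, grad * (1.0 + df)  # x_{l+1} = F(x_l) + x_l
        else:
            x, grad = f, grad * df  # x_{l+1} = F(x_l), no skip connection
    return x, grad


for residual in (False, True):
    out, grad = run_stack(1.0, depth=50, a=0.1, residual=residual)
    print(f"residual={residual}: output={out:.3g}, d(output)/d(input)={grad:.3g}")
```

With $a = 0.1$ the plain stack shrinks both the signal and the derivative by roughly a factor of 10 per block, while the residual stack keeps the input signal and a derivative of at least 1. That lower bound holds only because every factor $1 + F'(x)$ exceeds 1 in this toy; with a general $F$ the factor can fall below 1, so the protection is partial.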
==== Forward propagation ====
If the output of the $\ell$-th residual block is the input to the $(\ell+1)$-th residual block (assuming no activation function between blocks), then the $(\ell+1)$-th input is:
$$x_{\ell+1} = F(x_\ell) + x_\ell$$
Applying this formulation recursively, e.g.:
$$\begin{aligned} x_{\ell+2} &= F(x_{\ell+1}) + x_{\ell+1} \\ &= F(x_{\ell+1}) + F(x_\ell) + x_\ell \end{aligned}$$
yields the general relationship:
$$x_L = x_\ell + \sum_{i=\ell}^{L-1} F(x_i)$$
where $L$ is the index of a residual block and $\ell$ is the index of some earlier block. This formulation suggests that there is always a signal that is directly sent from a shallower block $\ell$ to a deeper block $L$.
==== Backward propagation ====
The residual learning formulation provides the added benefit of mitigating the vanishing gradient problem to some extent. However, the vanishing gradient problem, which is largely addressed through the use of normalization, is not the root cause of the degradation problem. To observe the effect of residual blocks on backpropagation, consider the partial derivative of a loss function $\mathcal{E}$ with respect to some residual block input $x_\ell$.
Using the equation above from forward propagation for a later residual block $L > \ell$:
$$\begin{aligned} \frac{\partial \mathcal{E}}{\partial x_\ell} &= \frac{\partial \mathcal{E}}{\partial x_L}\frac{\partial x_L}{\partial x_\ell} \\ &= \frac{\partial \mathcal{E}}{\partial x_L}\left(1 + \frac{\partial}{\partial x_\ell}\sum_{i=\ell}^{L-1} F(x_i)\right) \\ &= \frac{\partial \mathcal{E}}{\partial x_L} + \frac{\partial \mathcal{E}}{\partial x_L}\frac{\partial}{\partial x_\ell}\sum_{i=\ell}^{L-1} F(x_i) \end{aligned}$$
This formulation suggests that the gradient computation of a shallower layer, $\frac{\partial \mathcal{E}}{\partial x_\ell}$, always has a later term $\frac{\partial \mathcal{E}}{\partial x_L}$ that is directly added. Even if the gradients of the $F(x_i)$ terms are small, the total gradient $\frac{\partial \mathcal{E}}{\partial x_\ell}$ resists vanishing due to the added term $\frac{\partial \mathcal{E}}{\partial x_L}$.
== Variants of residual blocks ==
=== Basic block ===
A basic block is the simplest building block studied in the original ResNet. This block consists of two sequential 3×3 convolutional layers and a residual connection. The input and output dimensions of both layers are equal.
=== Bottleneck block ===
A bottleneck block consists of three sequential convolutional layers and a residual connection. The first layer in this block is a 1×1 convolution for dimension reduction (e.g., to 1/2 of the input dimension); the second layer performs a 3×3 convolution; the last layer is another 1×1 convolution for dimension restoration.
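One way to see what the 1×1 reduction buys is to count convolution weights. The sketch below is an assumption-laden back-of-the-envelope calculation: it ignores biases, normalization parameters, and projection shortcuts, and the 256-channel width and reduction factors are illustrative choices, not figures taken from this article. The helper names are hypothetical.

```python
def conv_weights(k, c_in, c_out):
    """Weights in a k x k convolution from c_in to c_out channels (biases ignored)."""
    return k * k * c_in * c_out


def basic_block_weights(c):
    """Two 3x3 convolutions that keep c channels."""
    return 2 * conv_weights(3, c, c)


def bottleneck_block_weights(c, reduction):
    """1x1 reduce to c // reduction, 3x3 at the reduced width, 1x1 restore to c."""
    mid = c // reduction
    return conv_weights(1, c, mid) + conv_weights(3, mid, mid) + conv_weights(1, mid, c)


print(basic_block_weights(256))          # 1179648
print(bottleneck_block_weights(256, 2))  # 212992
print(bottleneck_block_weights(256, 4))  # 69632
```

Because the expensive 3×3 convolution runs at the reduced width, a bottleneck block at the same outer width is much cheaper, which makes it practical to stack many more blocks.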
The ResNet-50, ResNet-101, and ResNet-152 models are all based on bottleneck blocks.
=== Pre-activation block ===
The pre-activation residual block applies activation functions before applying the residual function $F$. Formally, the computation of a pre-activation residual block can be written as:
$$x_{\ell+1} = F(\phi(x_\ell)) + x_\ell$$
where $\phi$ can be any activation (e.g. ReLU) or normalization (e.g. LayerNorm) operation. This design reduces the number of non-identity mappings between residual blocks and allows an identity mapping directly from the input to the output. This design was used to train models with 200 to over 1000 layers, and was found to consistently outperform variants where the residual path is not an identity function. The pre-activation ResNet with 200 layers took 3 weeks to train for ImageNet on 8 GPUs in 2016.
Since GPT-2, transformer blocks have been mostly implemented as pre-activation blocks. This is often referred to as "pre-normalization" in the literature of transformer models.
== Applications ==
Originally, ResNet was designed for computer vision. All transformer architectures include residual connections. Indeed, very deep transformers cannot be trained without them.
The original ResNet paper made no claim of being inspired by biological systems. However, later research has related ResNet to biologically plausible algorithms. A study published in Science in 2023 disclosed the complete connectome of an insect brain (specifically that of a fruit fly larva). This study discovered "multilayer shortcuts" that resemble the skip connections in artificial neural networks, including ResNets.
== History ==
=== Previous work ===
Residual connections were noticed in neu

Content determination

Content determination is the subtask of natural language generation (NLG) that involves deciding on the information to be communicated in a generated text. It is closely related to the task of document structuring.
== Example ==
Consider an NLG system which summarises information about sick babies. Suppose this system has four pieces of information it can communicate:
* The baby is being given morphine via an IV drop.
* The baby's heart rate shows bradycardias (temporary drops).
* The baby's temperature is normal.
* The baby is crying.
Which of these pieces of information should be included in the generated texts?
== Issues ==
There are three general issues which almost always affect the content determination task, and they can be illustrated with the above example.
Perhaps the most fundamental issue is the communicative goal of the text, i.e. its purpose and reader. In the above example, for instance, a doctor who wants to make a decision about medical treatment would probably be most interested in the heart rate bradycardias, while a parent who wanted to know how her child was doing would probably be more interested in the fact that the baby was being given morphine and was crying.
The second issue is the size and level of detail of the generated text. For instance, a short summary sent to a doctor as a 160-character SMS text message might only mention the heart rate bradycardias, while a longer summary printed out as a multipage document might also mention the fact that the baby is on a morphine IV.
The final issue is how unusual and unexpected the information is. For example, neither doctors nor parents would place a high priority on being told that the baby's temperature was normal, if they expected this to be the case.
Content determination is very important to users; indeed, in many cases the quality of content determination is the most important factor (from the user's perspective) in determining the overall quality of the generated text.
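The three issues can be made concrete with a toy rule-based selector over the four facts from the example. This is a minimal sketch, not a published method: the per-reader relevance scores, the `expected` flags, the character budgets, and the names `Fact` and `select_content` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    relevance: dict  # hypothetical relevance score for each type of reader
    expected: bool   # True if the reader would assume this anyway


FACTS = [
    Fact("The baby is being given morphine via an IV drop.", {"doctor": 1, "parent": 3}, False),
    Fact("The baby's heart rate shows bradycardias.", {"doctor": 3, "parent": 1}, False),
    Fact("The baby's temperature is normal.", {"doctor": 1, "parent": 1}, True),
    Fact("The baby is crying.", {"doctor": 0, "parent": 2}, False),
]


def select_content(facts, reader, max_chars):
    """Choose facts for `reader`: drop expected or irrelevant ones, then fill a length budget."""
    candidates = [f for f in facts if not f.expected and f.relevance[reader] > 0]
    candidates.sort(key=lambda f: f.relevance[reader], reverse=True)  # communicative goal
    chosen, used = [], 0
    for fact in candidates:
        cost = len(fact.text) + (1 if chosen else 0)  # +1 for the separating space
        if used + cost <= max_chars:  # size of the generated text
            chosen.append(fact.text)
            used += cost
    return " ".join(chosen)


print(select_content(FACTS, "doctor", max_chars=60))   # The baby's heart rate shows bradycardias.
print(select_content(FACTS, "parent", max_chars=500))  # longer summary for a parent
```

The normal temperature is never reported because it is flagged as expected, the tight budget leaves the doctor with only the bradycardias, and the parent's summary leads with the morphine.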
== Techniques ==
There are three basic approaches to content determination: schemas (content templates), statistical approaches, and explicit reasoning.
Schemas are templates which explicitly specify the content of a generated text (as well as document structuring information). Typically, they are constructed by manually analysing a corpus of human-written texts in the target genre and extracting a content template from these texts. Schemas work well in practice in domains where content is somewhat standardised, but less well in domains where content is more fluid (such as the medical example above).
Statistical techniques use statistical corpus analysis to automatically determine the content of the generated texts. Such work is in its infancy, and has mostly been applied to contexts where the communicative goal, reader, size, and level of detail are fixed, for example the generation of newswire summaries of sporting events.
Explicit reasoning approaches have probably attracted the most attention from researchers. The basic idea is to use AI reasoning techniques (such as knowledge-based rules, planning, pattern detection, case-based reasoning, etc.) to examine the information available to be communicated (including how unusual or unexpected it is), the communicative goal and reader, and the characteristics of the generated text (including target size), and then decide on the optimal content for the generated text. A very wide range of techniques has been explored, but there is no consensus as to which is most effective.
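A schema can be sketched as an ordered list of message types, each paired with a sentence template that is filled from the input data. The football match schema, its field names, and the `apply_schema` helper below are hypothetical, chosen because sports reports are a fairly standardised genre.

```python
# Hypothetical schema for a football match report: an ordered list of message
# types, each paired with a sentence template.
MATCH_SCHEMA = [
    ("result", "{winner} beat {loser} {score}."),
    ("top_scorer", "{top_scorer} scored {top_goals} goals."),
    ("attendance", "The match was watched by {attendance} spectators."),
]


def apply_schema(schema, data):
    """Fill each schema slot in order; slots whose data is missing are skipped."""
    sentences = []
    for _message_type, template in schema:
        try:
            sentences.append(template.format(**data))
        except KeyError:
            continue  # no data for this slot, so the schema leaves it out
    return " ".join(sentences)


match = {"winner": "Rovers", "loser": "United", "score": "2-1", "top_scorer": "Smith", "top_goals": 2}
print(apply_schema(MATCH_SCHEMA, match))  # Rovers beat United 2-1. Smith scored 2 goals.
```

The content decision is fixed in advance: every reader gets the same slots in the same order, whatever the communicative goal. That rigidity is why schemas suit standardised genres and struggle in fluid domains such as the medical example.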

Zero-shot learning

Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. The name is a play on words based on the earlier concept of one-shot learning, in which classification can be learned from only one, or a few, examples.
Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects. For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception.
== Background and history ==
The first paper on zero-shot learning in natural language processing appeared in 2008, by Chang, Ratinov, Roth, and Srikumar at AAAI'08, but the name given to the learning paradigm there was dataless classification. The first paper on zero-shot learning in computer vision appeared at the same conference, under the name zero-data learning. The term zero-shot learning itself first appeared in the literature in a 2009 paper by Palatucci, Hinton, Pomerleau, and Mitchell at NIPS'09. This terminology was repeated later in another computer vision paper, and the term zero-shot learning caught on, as a take-off on one-shot learning, which had been introduced in computer vision years earlier.
In computer vision, zero-shot learning models learn parameters for seen classes along with their class representations, and rely on representational similarity among class labels so that, during inference, instances can be classified into new classes.
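Classification by representational similarity can be sketched as a nearest-neighbour search over class representations. In a real system the instance embedding would come from a model trained only on seen classes; here both the embedding and the three-dimensional class vectors (has hooves, has stripes, has a mane) are hand-written assumptions, and `zero_shot_predict` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical class representations over three descriptive dimensions:
# (has hooves, has stripes, has a mane). "zebra" has no training images.
CLASS_REPRESENTATIONS = {
    "horse": [1.0, 0.0, 1.0],
    "tiger": [0.0, 1.0, 0.0],
    "zebra": [1.0, 1.0, 1.0],
}


def zero_shot_predict(instance_embedding, class_representations):
    """Assign the class whose representation is most similar to the instance's embedding."""
    return max(class_representations,
               key=lambda name: cosine(instance_embedding, class_representations[name]))


# Pretend a model trained only on horses and tigers mapped a zebra photo to this vector.
print(zero_shot_predict([0.9, 0.8, 0.9], CLASS_REPRESENTATIONS))  # zebra
```

The zebra is predicted even though no zebra was seen in training, because its representation ("a striped horse") is available at inference time.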
In natural language processing, the key technical direction builds on the ability to "understand the labels": representing the labels in the same semantic space as the documents to be classified. This supports the classification of a single example without observing any annotated data, the purest form of zero-shot classification. The original paper made use of the Explicit Semantic Analysis (ESA) representation, but later papers made use of other representations, including dense representations. This approach was also extended to multilingual domains, fine entity typing, and other problems. Moreover, beyond relying solely on representations, the computational approach has been extended to depend on transfer from other tasks, such as textual entailment and question answering.
The original paper also points out that, beyond the ability to classify a single example, when a collection of examples is given, with the assumption that they come from the same distribution, it is possible to bootstrap the performance in a semi-supervised-like manner (or via transductive learning).
Unlike standard generalization in machine learning, where classifiers are expected to correctly classify new samples into classes they have already observed during training, in ZSL no samples from the classes have been given during training. It can therefore be viewed as an extreme case of domain adaptation.
== Prerequisite information for zero-shot classes ==
Naturally, some form of auxiliary information has to be given about these zero-shot classes, and this information can be of several types.
Learning with attributes: classes are accompanied by a pre-defined structured description. For example, for bird descriptions, this could include "red head" and "long beak". These attributes are often organized in a structured compositional way, and taking that structure into account improves learning.
While this approach was used mostly in computer vision, there are some examples of it in natural language processing as well.
Learning from textual description: as pointed out above, this has been the key direction pursued in natural language processing. Here class labels are taken to have a meaning and are often augmented with definitions or free-text natural-language descriptions. This could include, for example, a Wikipedia description of the class.
Class-class similarity: here, classes are embedded in a continuous space. A zero-shot classifier can predict that a sample corresponds to some position in that space, and the nearest embedded class is used as the predicted class, even if no samples of that class were observed during training.
== Generalized zero-shot learning ==
The above ZSL setup assumes that at test time only zero-shot samples are given, namely samples from new, unseen classes. In generalized zero-shot learning, samples from both new and known classes may appear at test time. This poses new challenges for classifiers at test time, because it is very difficult to estimate whether a given sample is new or known. Some approaches to handle this include:
* a gating module, which is first trained to decide whether a given sample comes from a new class or from an old one, and then, at inference time, outputs either a hard decision or a soft probabilistic decision;
* a generative module, which is trained to generate feature representations of the unseen classes, so that a standard classifier can then be trained on samples from all classes, seen and unseen.
== Domains of application ==
Zero-shot learning has been applied to the following fields:
* image classification
* semantic segmentation
* image generation
* object detection
* natural language processing
* computational biology
* abstract reasoning
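The gating approach listed under generalized zero-shot learning can be sketched with a hard decision. A real gating module is trained; in this sketch a fixed similarity threshold stands in for it, and the threshold of 0.9, the class vectors, and the function names `nearest` and `gzsl_predict` are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(x, prototypes):
    """Return (label, similarity) of the closest class prototype."""
    label = max(prototypes, key=lambda name: cosine(x, prototypes[name]))
    return label, cosine(x, prototypes[label])


def gzsl_predict(x, seen, unseen, threshold=0.9):
    """Hard gate: keep a seen label only if x is close enough to some seen class."""
    label, score = nearest(x, seen)
    if score >= threshold:
        return label
    return nearest(x, unseen)[0]  # otherwise treat x as belonging to a new class


SEEN = {"horse": [1.0, 0.0, 1.0], "tiger": [0.0, 1.0, 0.0]}
UNSEEN = {"zebra": [1.0, 1.0, 1.0]}
print(gzsl_predict([1.0, 0.0, 0.9], SEEN, UNSEEN))  # horse
print(gzsl_predict([0.9, 0.8, 0.9], SEEN, UNSEEN))  # zebra
```

The threshold embodies the trade-off mentioned above: set too low, unseen samples are forced into seen classes; set too high, seen samples are wrongly routed to unseen ones.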

Touch 'n Go eWallet

Touch 'n Go eWallet is a Malaysian digital wallet and online payment platform, established in Kuala Lumpur, Malaysia, in July 2017 as a joint venture between Touch 'n Go and Ant Financial. It allows users to make payments at over 280,000 merchant touch points via QR code, as well as perform peer-to-peer (P2P) money transfers. Since then, the e-wallet has further diversified, allowing users to pay for tolls via RFID or PayDirect, street parking, and various online payments spanning e-hailing, car-sharing apps, and taxis; various utility bills; top-ups for mobile prepaid plans or in-game currencies; purchases on e-commerce websites; food delivery; renewals of motor insurance and other insurance/takaful plans; and even movie, bus, train, or airline tickets.
== Background ==
Prior to the launch of the e-wallet service, Touch 'n Go provided stored-value physical all-in-one contactless cards (known as Touch 'n Go cards or "TnG cards") that users can use to pay for toll fares, public transportation, and parking lots, as well as purchases in some retail stores. In 1999, Touch 'n Go also began marketing SmartTAG devices that allow road users to pass through certain toll booths without the need to wind down the car window. The high entry cost of the device (around RM 100 each) also meant that only a few could enjoy the seamless experience.
In 2009, Touch 'n Go partnered with Maxis to launch FastTap, a new mobile payment service that used Near-Field Communication (NFC). Maxis customers could make payments by placing the phone near card readers (which also supported physical bank cards and Touch 'n Go cards). However, the venture featured only one phone model, the Nokia 6212, which greatly limited its public reach. In July 2012, Touch 'n Go announced another collaboration with CIMB and Maxis to create a similar NFC-based online transaction service that runs on compatible smartphones.
Touch 'n Go Wallet was launched in February 2017 as a QR code-based e-wallet application, to compete with Samsung Pay, which uses NFC modules. In the controlled pilot test in Taman Tun Dr Ismail, participants could try the basic functions (prepaid mobile reloads, bill payments, movie and flight ticket purchases, money transfers to other users, and payments at participating stores and restaurants). While the deployed version of the app was generally well received, the process for transferring balance from the app to the stored value of a physical TnG card drew unanimous backlash. Test groups felt that the requirement to visit a self-service terminal named "Pick Up Device" in person within 24 hours to complete the transfer, and the consequence of failing to do so (the balance would be credited back to the wallet after 24 hours), were not clearly disclosed and defeated the purpose of convenience, especially since there were only two such terminals. The feature was eventually suspended.
On 15 November 2017, Touch 'n Go was granted permission by the Central Bank of Malaysia to form a joint venture with Ant Financial, a China-based financial company that operates Alipay. The partnership allowed the local e-wallet to learn from and build upon the operational model pioneered by Alipay.
In June 2018, it was reported that Touch 'n Go was pilot testing the use of the Touch 'n Go eWallet in Rapid Transit, as the ticketing system was enabled on the Kelana Jaya line in the Klang Valley. Pilot testing applied only to the Kelana Jaya, KL Gateway–Universiti, Kerinchi, KL Sentral, Dang Wangi, KLCC, and Ampang Park stations. The test was reported to be successful in February 2020, and the feature was planned to be fully deployed on the LRT and MRT. Due to unforeseen circumstances, this feature did not come to fruition; the app merely added in-app purchase of monthly concession passes called "My50".
In August 2018, Touch 'n Go announced that selected drivers could try a new RFID-based payment system (later rebranded as "myRFID"), intended to replace SmartTAG devices on closed toll roads, during a pilot testing phase commencing on 3 September 2018. On 2 November 2018, participation in the ongoing pilot programme was expanded, allowing more drivers to sign up ahead of the public rollout of the RFID system. During the same period, Touch 'n Go discontinued sales of SmartTAG devices in favor of the RFID-based payment system. Initially, the RFID chip could only be installed on the car by Touch 'n Go staff at RFID fitment centers, at no cost. Since the pilot testing concluded on 15 February 2020, a self-installation kit has been offered to the public on Lazada and Shopee.
Support for taxi-hailing mobile apps was added in November 2018, when Touch 'n Go partnered with EzCab and Public Cab, allowing users to make payments via QR code. This was later expanded to MULA on 7 January 2020 and to MyCar on 4 April 2020.
Touch 'n Go eWallet was also the first e-wallet to turn Kuala Lumpur's most famous Ramadan bazaar in Kampong Bahru into "Kampong Kashless", a venue that accepts cashless QR payments. It welcomed more than 250,000 Malaysians, including local celebrities and government officials.
On 1 October 2019, some e-commerce websites owned by the Alibaba Group (Tmall and Taobao) began to support Touch 'n Go eWallet payments; Lazada joined the list on 29 October 2019. Touch 'n Go eWallet was one of three e-wallet services in Malaysia (the others being Boost and GrabPay) whose users were eligible to receive an RM 30 credit under the e-Tunai Rakyat program of the Budget 2020 plan, which further normalized the adoption of cashless and mobile payment among Malaysians.
Unlike Boost and GrabPay, which completely disabled P2P transfers until users had used up the RM 30 credit, Touch 'n Go eWallet did not impose such measures. In 2020, Touch 'n Go eWallet joined DuitNow, an electronic transaction ecosystem in Malaysia that allows funds from Touch 'n Go eWallet to be transferred to competing services and vice versa, by implementing the standard DuitNow QR code design. Japan became the first country outside Malaysia to support Touch 'n Go eWallet payments, via Alipay Connect.
During the COVID-19 pandemic and the enforcement of the movement control order, use of e-wallets (including Touch 'n Go eWallet) increased tremendously due to the contactless nature of the payments and the rise in take-out orders, which in turn helped small and medium-sized enterprises to thrive. Touch 'n Go eWallet launched its loyalty programme, The Goal Hunter, in October 2020; each month, users collect stamps by paying with the app, in exchange for rewards that include lucky draws and other vouchers.
== Services ==
The Touch 'n Go eWallet app is available for download on both Google Play and the Apple App Store. It uses QR code technology for local in-store payments. The app also supports a variety of payment types, including but not limited to:
* utility bills
* purchase of motor insurance policies
* Pay Later facility
* prepaid reloads and postpaid payments to telecommunications companies
* loan repayments for courts, MBSJ payments, zakat and PTPTN
* payment for car parking
* P2P transfers
* airline ticket bookings
* movie tickets from TGV Cinemas
* RFID refuelling at Shell stations (defunct after Shell launched its own payment app in 2024)
Users can reload eWallet credit by setting up auto-reload, purchasing reload PINs from convenience stores (such as 7-Eleven, KK Super Mart, MyNews, and FamilyMart), or reloading via FPX and credit/debit card.
The PayDirect feature allows users to link their physical Touch 'n Go cards to the eWallet, so that the toll fare is debited from the eWallet balance when the card is tapped on the reader. If the balance in the app is insufficient, the toll fare is deducted from the physical card's balance instead. The feature also allows users to view the card's remaining balance.
Touch 'n Go eWallet is the first and only e-wallet to offer a money-back guarantee when an unauthorised transaction is made on the user's eWallet account, subject to terms and conditions.
Payment via QR code scanning, including Touch 'n Go eWallet, has become the norm in most shops and restaurants across Malaysia, including roadside hawkers, stall owners, and automatic vending machines. Merchants usually display the owner's individual QR code or a business account QR code, which they can apply for in the app. This popularity is attributed to the low merchant onboarding cost (unlike NFC payment and debit/credit cards, which require the purchase or rental of a payment terminal at a yearly fee).
The app is also one of the few e-wallets that supports bidirectional liquidity (alongside MAE, developed by Maybank), where funds can be transferred both ways between the wallet and bank accounts. This is not possible with the other major e-wallets (GrabPay, Boost, ShopeePay, etc.), where money reloaded into the wallet cannot be transferred to another bank account, unless through manual req

Text normalization

Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure.
== Applications ==
Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context. For example:
* "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan.
* "vi" could be pronounced as "vie," "vee," or "the sixth" depending on the surrounding words.
Text can also be normalized for storing and searching in a database. For instance, if a search for "resume" is to match the word "résumé," then the text would be normalized by removing diacritical marks; and if "john" is to match "John", the text would be converted to a single case. To prepare text for searching, it might also be reduced to base forms by stemming or lemmatization (e.g. converting "flew" and "flying" both into "fly"), canonicalized (e.g. consistently using American or British English spelling), or have stop words removed.
== Techniques ==
For simple, context-independent normalization, such as removing non-alphanumeric characters or diacritical marks, regular expressions would suffice. For example, the sed script sed -E 's/[[:space:]]+/ /g' inputfile would normalize runs of whitespace characters within each line into a single space. More complex normalization requires correspondingly complicated algorithms, including domain knowledge of the language and vocabulary being normalized. Among other approaches, text normalization has been modeled as a problem of tokenizing and tagging streams of text and as a special case of machine translation.
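The database-search normalizations described above (removing diacritical marks, converting to a single case, collapsing whitespace) can be sketched with Python's standard library. This is one possible policy, not a universal one; the function name `normalize_for_search` is hypothetical, and stemming and stop-word removal are deliberately left out.

```python
import re
import unicodedata


def normalize_for_search(text):
    """Strip diacritics, fold case, and collapse whitespace runs into single spaces."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = without_marks.casefold()
    return re.sub(r"\s+", " ", folded).strip()


print(normalize_for_search("  Résumé\tof   John "))  # resume of john
```

Two caveats illustrate why there is no all-purpose procedure. NFKD also folds compatibility characters (the ligature "ﬁ" becomes "fi"), which may or may not be wanted. Dropping combining marks is also wrong for languages in which an accented letter is a distinct letter.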
== Textual scholarship == In the field of textual scholarship and the editing of historic texts, the term "normalization" implies a degree of modernization and standardization – for example in the extension of scribal abbreviations and the transliteration of the archaic glyphs typically found in manuscript and early printed sources. A normalized edition is therefore distinguished from a diplomatic edition (or semi-diplomatic edition), in which some attempt is made to preserve these features. The aim is to strike an appropriate balance between, on the one hand, rigorous fidelity to the source text (including, for example, the preservation of enigmatic and ambiguous elements); and, on the other, producing a new text that will be comprehensible and accessible to the modern reader. The extent of normalization is therefore at the discretion of the editor, and will vary. Some editors, for example, choose to modernize archaic spellings and punctuation, but others do not. An edition of a text might be normalized based on internal criteria, where orthography is standardized according to the language of the original, or external criteria, where the norms of a different time period are applied. For an example of the latter, a published edition of a medieval Icelandic manuscript might be normalized to the conventions of modern Icelandic, or it might be normalized to Classical Old Icelandic. Standards of normalization vary based on language of the edition as well as the specific conventions of the publisher.