AI Grammar Rephrase Online Free

AI Grammar Rephrase Online Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Video Super Resolution


    RTX Video Super Resolution (RTX VSR) is a video scaling feature by Nvidia. It was released on February 28, 2023.

    == History ==

    The feature was first unveiled during CES 2023 as RTX Video Super Resolution. It uses the on-board Tensor Cores to upscale browser video content in real time. Video Super Resolution was initially available only on RTX 30 and 40 series GPUs; support for 20 series GPUs was added afterwards, and it is now available on all Nvidia RTX-branded GPUs. The feature supports input resolutions from 360p to 1440p and a maximum output of 4K. It launched without support for HDR content, although that could be added in the future. Nvidia released RTX Video Super Resolution 1.5, with improved video quality and RTX 20 series support, on October 17, 2023.

    == Reception ==

    According to ComputerBase, although "the algorithm is not yet working flawlessly", the feature is "overall recommendable".

  • Hapax legomenon


    In corpus linguistics, a hapax legomenon (pl. hapax legomena; sometimes abbreviated to hapax, plural hapaxes) is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text. The term is also sometimes used to describe a word that occurs in just one of an author's works but more than once in that particular work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "said once".

    The related terms dis legomenon, tris legomenon, and tetrakis legomenon refer respectively to double, triple, or quadruple occurrences, but are far less commonly used.

    Hapax legomena are quite common, as predicted by Zipf's law, which states that the frequency of any word in a corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words are hapax legomena, and another 10% to 15% are dis legomena. Thus, in the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus.

    Hapax legomenon refers to the appearance of a word or an expression in a body of text, not to either its origin or its prevalence in speech. It thus differs from a nonce word, which may never be recorded, may find currency and may be widely recorded, or may appear several times in the work which coins it, and so on.

    == Significance ==

    Hapax legomena in ancient texts are usually difficult to decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical (particularly Hebrew) hapax legomena sometimes pose problems in translation. Hapax legomena also pose challenges in natural language processing.

    Some scholars consider hapax legomena useful in determining the authorship of written works. P. N.
    Harrison, in The Problem of the Pastoral Epistles (1921), made hapax legomena popular among Bible scholars when he argued that there are considerably more of them in the three Pastoral Epistles than in other Pauline Epistles. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual.

    Harrison's theory has faded in significance due to a number of problems raised by other scholars. For example, in 1896, W. P. Workman tabulated the number of hapax legomena in each Pauline Epistle. At first glance, the last three totals (for the Pastoral Epistles) are not out of line with the others. To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text, which ranged from 3.6 to 13. Although the Pastoral Epistles have more hapax legomena per page, Workman found the differences to be moderate in comparison to the variation among other Epistles. This was reinforced when Workman looked at several plays by Shakespeare, which showed similar variations (from 3.4 to 10.4 per page of Irving's one-volume edition).

    Apart from author identity, there are several other factors that can explain the number of hapax legomena in a work:

      • text length: this directly affects the expected number and percentage of hapax legomena; the brevity of the Pastoral Epistles also makes any statistical analysis problematic.
      • text topic: if the author writes on different subjects, many subject-specific words will occur only in limited contexts.
      • text audience: if the author is writing to a peer rather than a student, or to their spouse rather than their employer, quite different vocabulary will appear.
      • time: over the course of years, both the language and an author's knowledge and use of language will change.
    In the particular case of the Pastoral Epistles, all of these variables are quite different from those in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as strong indicators of authorship; those who reject Pauline authorship of the Pastorals rely on other arguments.

    There are also subjective questions over whether two forms amount to "the same word": dog vs. dogs, clue vs. clueless, sign vs. signature; many other gray cases also arise. The Jewish Encyclopedia points out that, although there are 1,500 hapaxes in the Hebrew Bible, only about 400 are not obviously related to other attested word forms.

    A final difficulty with the use of hapax legomena for authorship determination is that there is considerable variation among works known to be by a single author, and disparate authors often show similar values. In other words, hapax legomena are not a reliable indicator. Authorship studies now usually use a wide range of measures to look for patterns rather than relying upon single measurements.

    == Computer science ==

    In the fields of computational linguistics and natural language processing (NLP), especially corpus linguistics and machine-learned NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques. This disregard has the added benefit of significantly reducing the memory use of an application, since, by Zipf's law, many words are hapax legomena.

    == Examples ==

    The following are some examples of hapax legomena in languages or corpora.
    === Arabic ===

    In the Qurʾān:

      • The proper nouns Iram (Q 89:7, Iram of the Pillars), Bābil (Q 2:102, Babylon), Bakka(t) (Q 3:96, Bakkah), Jibt (Q 4:51), Ramaḍān (Q 2:185, Ramadan), ar-Rūm (Q 30:2, Byzantine Empire), Tasnīm (Q 83:27), Qurayš (Q 106:1, Quraysh), Majūs (Q 22:17, Magian/Zoroastrian), Mārūt (Q 2:102, Harut and Marut), Makka(t) (Q 48:24, Mecca), Nasr (Q 71:23), (Ḏū) an-Nūn (Q 21:87) and Hārūt (Q 2:102, Harut and Marut) occur only once.
      • zanjabīl (زَنْجَبِيل – ginger) is a Qurʾānic hapax (Q 76:17).
      • zamharīr (زَمْهَرِيرًۭ) is a Qurʾānic hapax (Q 76:13), usually glossed as referring to extreme cold.
      • The epitheton ornans aṣ-ṣamad (الصَّمَد – the One besought) is a Qurʾānic hapax (Q 112:2).
      • ṭūd (طُودْ - mountain) is a Qurʾānic hapax (Q 26:63).

    === Chinese and Japanese ===

    Classical Chinese and Japanese literature contains many Chinese characters that feature only once in the corpus, and their meaning and pronunciation have often been lost. Known in Japanese as kogo (孤語), literally "lonely characters", these can be considered a type of hapax legomenon. For example, the Classic of Poetry (c. 1000 BC) uses the character 篪 exactly once in the verse 「伯氏吹塤, 仲氏吹篪」, and it was only through the discovery of a description by Guo Pu (276–324 AD) that the character could be associated with a specific type of ancient flute.

    === English ===

    It is fairly common for authors to "coin" new words to convey a particular meaning or for the sake of entertainment, without any suggestion that they are "proper" words. For example, P.G. Wodehouse and Lewis Carroll frequently coined novel words. Indexy, below, appears to be an example of this.

      • Flother, as a synonym for snowflake, is a hapax legomenon of written English found in a manuscript entitled The XI Pains of Hell (c. 1275).
      • Honorificabilitudinitatibus is a hapax legomenon of Shakespeare's works, coming from Erasmus' Adagia.
      • Indexy, in Bram Stoker's Dracula, is used as an adjective to describe a situational state, with no further use in the language: "If that man had been an ordinary lunatic I would have taken my chance of trusting him; but he seems so mixed up with the Count in an indexy kind of way that I am afraid of doing anything wrong by helping his fads."
      • Manticratic, meaning "of the rule by the Prophet's family or clan", was apparently invented by T. E. Lawrence and appears once in Seven Pillars of Wisdom.
      • Nortelrye, a word for "education", occurs only once in Chaucer's The Reeve's Tale.
      • Sassigassity, perhaps with the meaning of "audacity", occurs only once in Dickens's short story "A Christmas Tree".
      • Slæpwerigne, "sleep-weary", occurs exactly once in the Old English corpus, in the Exeter Book. There is debate over whether it means "weary with sleep" or "weary for sleep".

    === German ===

    The name of the 9th-century poem Muspilli is a back-formation from "muspille", an Old High German hapax legomenon of unclear meaning found only in this text (see Muspilli § Etymology for discussion).

    === Ancient Greek ===

      • According to classical scholar Clyde Pharr, "the Iliad has 1,097 hapax legomena, while the Odyssey has 868". Others have defined the term differently, however, and count as few as 303 in the Iliad and 191 in the Odyssey.
      • panaōrios (παναώριος), ancient Greek for "very untimely", is one of many words that occur only once in the Iliad.
      • The Greek New Testament contains 686 local hapax legomena, which are sometimes called "New Testament hapaxes". 62 of these occur in 1 Peter and 54 occur in 2 Peter
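
    As a rough illustration of the definition given at the start of this entry and of the pruning described under Computer science, the sketch below counts hapax and dis legomena in a tiny sample and then drops the hapaxes from the vocabulary. The tokenizer (lowercased runs of letters), the sample sentence, and the function names are all assumptions made for the example; real corpus work would also have to settle the "same word" questions (dog vs. dogs) raised earlier.

```python
from collections import Counter
import re


def frequency_classes(text):
    """Count word occurrences and list the words seen exactly once or twice."""
    # Naive tokenization (an assumption): lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    hapax = sorted(w for w, n in counts.items() if n == 1)  # hapax legomena
    dis = sorted(w for w, n in counts.items() if n == 2)    # dis legomena
    return counts, hapax, dis


def prune_hapax(counts):
    """Keep only words seen more than once, shrinking the vocabulary."""
    return {w: n for w, n in counts.items() if n > 1}


sample = "the cat saw the dog and the dog saw a bird"
counts, hapax, dis = frequency_classes(sample)
print(hapax)                                  # ['a', 'and', 'bird', 'cat']
print(dis)                                    # ['dog', 'saw']
print(len(counts), len(prune_hapax(counts)))  # 7 3
```

    Even in this toy sample, more than half of the distinct words occur once, which is why dropping them reduces memory use so noticeably.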

  • Noémie Elhadad


    Noémie Elhadad is an American data scientist who is an associate professor of biomedical informatics at the Columbia University Vagelos College of Physicians and Surgeons. As of 2022, she serves as the chair of the Department of Biomedical Informatics. Her research considers machine learning in bioinformatics, natural language processing and medicine.

    == Early life and education ==

    Elhadad studied computer software engineering at École nationale supérieure d'électronique, informatique, télécommunications, mathématique et mécanique de Bordeaux (ENSEIRB). She completed her doctoral research at Columbia University. She was based in the Department of Computer Science, where she developed patient-focused text summaries of clinical literature.

    == Research and career ==

    Elhadad joined the faculty at the City College of New York. In 2007 she joined the Department of Biomedical Informatics at Columbia University. She was made Chair of the Health Analytics Center at the Columbia Data Science Institute in 2013. Her research considers how clinical data, electronic health records and patient-generated data can enhance access to information for researchers, patients and physicians. She developed an artificial intelligence tool that supported patients in the NewYork-Presbyterian Hospital.

    Elhadad is interested in using data to advance women's health. She led the Citizen Endo Project, which aims to comprehensively describe how patients experience endometriosis. It was built on principles of citizen science, using patient testimonials from focus groups in New York City and data aggregation. She created the app Phendo, which asks patients about their experience of the disease. The name Phendo is a portmanteau of phenotyping endometriosis.

    Elhadad was announced as chair of the Department of Biomedical Informatics in December 2022.

    == Selected publications ==

    Caruana, Rich; Lou, Yin; Gehrke, Johannes; Koch, Paul; Sturm, Marc; Elhadad, Noemie (August 10, 2015).
    "Intelligible Models for HealthCare". Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM. pp. 1721–1730. doi:10.1145/2783258.2788613. ISBN 9781450336642. S2CID 14190268.

    Shivade, Chaitanya; Raghavan, Preethi; Fosler-Lussier, Eric; Embi, Peter J; Elhadad, Noemie; Johnson, Stephen B; Lai, Albert M (March 2014). "A review of approaches to identifying patient phenotype cohorts using electronic health records". Journal of the American Medical Informatics Association. 21 (2): 221–230. doi:10.1136/amiajnl-2013-001935. ISSN 1067-5027. PMC 3932460. PMID 24201027.

    == Personal life ==

    Elhadad suffers from endometriosis.

  • Automatic number-plate recognition


    Automatic number-plate recognition (ANPR; see also other names below) is a technology that uses optical character recognition on images to read vehicle registration plates to create vehicle location data. It can use existing closed-circuit television, road-rule enforcement cameras, or cameras specifically designed for the task. ANPR is used by police forces around the world for law enforcement purposes, including checking if a vehicle is registered or licensed. It is also used for electronic toll collection on pay-per-use roads and as a method of cataloguing the movements of traffic, for example by highways agencies.

    Automatic number-plate recognition can be used to store the images captured by the cameras as well as the text from the license plate, with some configurable to store a photograph of the driver. Systems commonly use infrared lighting to allow the camera to take the picture at any time of day or night. ANPR technology must take into account plate variations from place to place.

    Privacy issues have caused concerns about ANPR, such as government tracking citizens' movements, misidentification, high error rates, and increased government spending. Critics have described it as a form of mass surveillance.

    == Other names ==

    ANPR is also known by various other terms:

      • Automatic (or automated) license-plate recognition (ALPR)
      • Automatic (or automated) license-plate reader (ALPR)
      • Automatic vehicle identification (AVI)
      • Danish: Automatisk nummerpladegenkendelse, lit. 'Automatic number plate recognition' (ANPG)
      • Car-plate recognition (CPR)
      • License-plate recognition (LPR)
      • French: Lecture automatique de plaques d'immatriculation, lit. 'Automatic reading of registration plates' (LAPI)
      • Mobile license-plate reader (MLPR)
      • Vehicle license-plate recognition (VLPR)
      • Vehicle recognition identification (VRI)

    == Development ==

    ANPR was invented in 1976 at the Police Scientific Development Branch in Britain.
    Prototype systems were working by 1979, and contracts were awarded to produce industrial systems, first at EMI Electronics, and then at Computer Recognition Systems (CRS, now part of Jenoptik) in Wokingham, UK. Early trial systems were deployed on the A1 road and at the Dartford Tunnel. The first arrest through detection of a stolen car was made in 1981. However, ANPR did not become widely used until cheaper and easier-to-use software was pioneered during the 1990s. The collection of ANPR data for future use (i.e., in solving then-unidentified crimes) was documented in the early 2000s. The first documented case of ANPR being used to help solve a murder occurred in November 2005, in Bradford, UK, where ANPR played a vital role in locating and subsequently convicting the killers of Sharon Beshenivsky.

    == Components ==

    The software aspect of the system runs on standard home computer hardware and can be linked to other applications or databases. It first uses a series of image manipulation techniques to detect, normalize and enhance the image of the number plate, and then optical character recognition (OCR) to extract the alphanumerics of the license plate.

    ANPR systems are generally deployed in one of two basic approaches: one allows the entire process to be performed at the lane location in real time, and the other transmits all the images from many lanes to a remote computer location and performs the OCR process there at some later point in time. When done at the lane site, capturing the plate alphanumerics, date and time, lane identification, and any other required information takes approximately 250 milliseconds. This information can easily be transmitted to a remote computer for further processing if necessary, or stored at the lane for later retrieval.
    In the other arrangement, there are typically large numbers of PCs used in a server farm to handle high workloads, such as those found in the London congestion charge project. Often in such systems, there is a requirement to forward images to the remote server, and this can require larger bandwidth transmission media.

    === Technology ===

    ANPR uses optical character recognition (OCR) on images taken by cameras. When Dutch vehicle registration plates switched to a different style in 2002, one of the changes made was to the font, introducing small gaps in some letters (such as P and R) to make them more distinct and therefore more legible to such systems. Some license plate arrangements use variations in font sizes and positioning; ANPR systems must be able to cope with such differences to be truly effective. More complicated systems can cope with international variants, though many programs are individually tailored to each country.

    The cameras used can be existing road-rule enforcement or closed-circuit television cameras, as well as mobile units, which are usually attached to vehicles. Some systems use infrared cameras to take a clearer image of the plates.

    ==== In mobile systems ====

    During the 1990s, significant advances in technology took automatic number-plate recognition (ANPR) systems from limited, expensive, hard-to-set-up fixed applications to simple "point and shoot" mobile ones. This was made possible by the creation of software that ran on cheaper PC-based, non-specialist hardware and that no longer needed to be given the pre-defined angles, direction, size and speed at which the plates would pass through the camera's field of view. Further scaled-down components at lower price points led to a record number of deployments by law enforcement agencies globally.
    Smaller cameras with the ability to read license plates at higher speeds, along with smaller, more durable processors that fit in the trunks of police vehicles, allowed law enforcement officers to patrol daily with the benefit of license plate reading in real time, so that they can interdict immediately.

    Despite their effectiveness, there are noteworthy challenges related to mobile ANPRs. One of the biggest is that the processor and the cameras must work fast enough to accommodate relative speeds of more than 160 km/h (100 mph), a likely scenario in the case of oncoming traffic. This equipment must also be very efficient, since the power source is the vehicle electrical system, and it must have minimal space requirements.

    Relative speed is only one issue that affects the camera's ability to read a license plate. Algorithms must be able to compensate for all the variables that can affect the ANPR's ability to produce an accurate read, such as time of day, weather and angles between the cameras and the license plates. A system's illumination wavelengths can also have a direct impact on the resolution and accuracy of a read in these conditions.

    Installing ANPR cameras on law enforcement vehicles requires careful consideration of the position of the cameras relative to the license plates they are to read. Using the right number of cameras and positioning them accurately for optimal results can prove challenging, given the various missions and environments at hand. Highway patrol requires forward-looking cameras that span multiple lanes and are able to read license plates at high speeds. City patrol needs shorter-range, lower-focal-length cameras for capturing plates on parked cars. Parking lots with perpendicularly parked cars often require a specialized camera with a very short focal length. Most technically advanced systems are flexible and can be configured with one to four cameras, which can easily be repositioned as needed.
    States with rear-only license plates have an additional challenge since a forward-looking camera is ineffective with oncoming traffic. In this case one camera may be turned backwards.

    === Algorithms ===

    There are seven primary algorithms that the software requires for identifying a license plate:

      1. Plate localization – responsible for finding and isolating the plate on the picture
      2. Plate orientation and sizing – compensates for the skew of the plate and adjusts the dimensions to the required size
      3. Normalization – adjusts the brightness and contrast of the image
      4. Character segmentation – finds the individual characters on the plates
      5. Optical character recognition
      6. Syntactical/Geometrical analysis – check characters and positions against country-specific rules
      7. The averaging of the recognised value over multiple fields/images to produce a more reliable or confident result, especially given that any single image may contain a reflected light flare, be partially obscured, or possess other obfuscating effects.

    The complexity of each of these subsections of the program determines the accuracy of the system. During the third phase (normalization), some systems use edge detection techniques to increase the picture difference between the letters and the plate backing. A median filter may also be used to reduce the visual noise on the image.

    Contemporary ANPR systems use multiple data sources and analytical techniques that go beyond simple number

  • Snapshot isolation


    In databases and transaction processing (transaction management), snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and that the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.

    Snapshot isolation has been adopted by several major database management systems, such as InterBase, Firebird, Oracle, MySQL, PostgreSQL, SQL Anywhere, MongoDB and Microsoft SQL Server (2005 and later). The main reason for its adoption is that it allows better performance than serializability, yet still avoids most (but not all) of the concurrency anomalies that serializability avoids. In practice snapshot isolation is implemented within multiversion concurrency control (MVCC), where generational values of each data item (versions) are maintained: MVCC is a common way to increase concurrency and performance by generating a new version of a database object each time the object is written, and allowing transactions to read the several most recent relevant versions of each object. Snapshot isolation has been used to criticize the ANSI SQL-92 standard's definition of isolation levels, as it exhibits none of the "anomalies" that the SQL standard prohibited, yet is not serializable (the anomaly-free isolation level defined by ANSI).

    In spite of its distinction from serializability, snapshot isolation is sometimes referred to as serializable by Oracle.

    == Definition ==

    A transaction executing under snapshot isolation appears to operate on a personal snapshot of the database, taken at the start of the transaction. When the transaction concludes, it will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was taken. Such a write–write conflict will cause the transaction to abort.
    In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur "first", and be visible to the other. In contrast, snapshot isolation permits write skew anomalies.

    As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2.

    If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from V1, and then verify that V1 + V2 ≥ 0 still holds, aborting if not. T2 similarly deducts $200 from V2 and then verifies V1 + V2 ≥ 0. Since the transactions must serialize, either T1 happens first, leaving V1 = −$100, V2 = $100, and preventing T2 from succeeding (since V1 + (V2 − $200) is now −$200), or T2 happens first and similarly prevents T1 from committing.

    If the database is under snapshot isolation (MVCC), however, T1 and T2 operate on private snapshots of the database: each deducts $200 from an account, and then verifies that the new total is zero, using the other account value that held when the snapshot was taken. Since neither update conflicts, both commit successfully, leaving V1 = V2 = −$100, and V1 + V2 = −$200.

    Some systems built using multiversion concurrency control (MVCC) may support (only) snapshot isolation to allow transactions to proceed without worrying about concurrent operations, and more importantly without needing to re-verify all read operations when the transaction finally commits.
    This is convenient because MVCC maintains a series of recent, consistent historical states. The only information that must be stored during the transaction is a list of updates made, which can be scanned for conflicts fairly easily before being committed. However, other MVCC systems (such as MarkLogic) use locks to serialize writes, together with MVCC, to obtain some of the performance gains while still supporting the stronger "serializability" level of isolation.

    == Workarounds ==

    Potential inconsistency problems arising from write skew anomalies can be fixed by adding (otherwise unnecessary) updates to the transactions in order to enforce the serializability property.

      • Materialize the conflict: add a special conflict table, which both transactions update in order to create a direct write–write conflict.
      • Promotion: have one transaction "update" a read-only location (replacing a value with the same value) in order to create a direct write–write conflict (or use an equivalent promotion, e.g. Oracle's SELECT FOR UPDATE).

    In the example above, we can materialize the conflict by adding a new table which makes the hidden constraint explicit, mapping each person to their total balance. Phil would start off with a total balance of $200, and each transaction would attempt to subtract $200 from this, creating a write–write conflict that would prevent the two from succeeding concurrently. However, this approach denormalizes the schema, since the stored total duplicates information that can be derived from the two balances.

    Alternatively, we can promote one of the transaction's reads to a write. For instance, T2 could set V1 = V1, creating an artificial write–write conflict with T1 and, again, preventing the two from succeeding concurrently. This solution may not always be possible.

    In general, therefore, snapshot isolation puts some of the problem of maintaining non-trivial constraints onto the user, who may not appreciate either the potential pitfalls or the possible solutions. The upside to this transfer is better performance.
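
    The write skew example and the promotion workaround can be made concrete with a toy in-memory model. This is only a sketch under stated assumptions: the Database and Transaction classes, the per-key version counters, and the "abort at commit if a written key changed since the snapshot" rule are hypothetical simplifications for illustration, not the implementation of any real database.

```python
class Conflict(Exception):
    """Raised when a written key was committed by another transaction first."""


class Database:
    def __init__(self, data):
        self.data = dict(data)               # last committed values
        self.version = {k: 0 for k in data}  # commits seen per key

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.data)        # private snapshot taken at start
        self.start_version = dict(db.version)
        self.writes = {}                     # buffered updates

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Write-write conflict check: only keys this transaction wrote matter.
        for key in self.writes:
            if self.db.version[key] != self.start_version[key]:
                raise Conflict(key)
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.version[key] += 1


def withdraw(txn, account, amount, promote=None):
    txn.write(account, txn.read(account) - amount)
    if promote is not None:
        txn.write(promote, txn.read(promote))  # promotion: set V = V
    if txn.read("V1") + txn.read("V2") < 0:    # checked against the snapshot
        raise ValueError("constraint V1 + V2 >= 0 violated")


# Write skew: disjoint writes, so both transactions commit.
db = Database({"V1": 100, "V2": 100})
t1, t2 = db.begin(), db.begin()
withdraw(t1, "V1", 200)
withdraw(t2, "V2", 200)
t1.commit()
t2.commit()
print(db.data)  # {'V1': -100, 'V2': -100}

# Promotion: T2 also "writes" V1, so the second commit is rejected.
db2 = Database({"V1": 100, "V2": 100})
t1, t2 = db2.begin(), db2.begin()
withdraw(t1, "V1", 200)
withdraw(t2, "V2", 200, promote="V1")
t1.commit()
try:
    t2.commit()
    outcome = "committed"
except Conflict:
    outcome = "aborted"
print(outcome, db2.data)  # aborted {'V1': -100, 'V2': 100}
```

    In the first run each transaction sees a total of zero in its own snapshot, so the constraint check passes twice and the committed total ends at −$200. In the second run the artificial write to V1 turns the hidden dependency into a write–write conflict that the commit check can detect.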
    == Terminology ==

    Snapshot isolation is called "serializable" mode in Oracle and PostgreSQL versions prior to 9.1, which may cause confusion with the "real serializability" mode. There are arguments both for and against this decision; what is clear is that users must be aware of the distinction to avoid possible undesired anomalous behavior in their database system logic.

    == History ==

    Snapshot isolation arose from work on multiversion concurrency control databases, where multiple versions of the database are maintained concurrently to allow readers to execute without colliding with writers. Such a system allows a natural definition and implementation of such an isolation level. InterBase, later owned by Borland, was acknowledged to provide SI rather than full serializability in version 4, and likely permitted write-skew anomalies since its first release in 1985.

    Unfortunately, the ANSI SQL-92 standard was written with a lock-based database in mind, and hence is rather vague when applied to MVCC systems. Berenson et al. wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions.

    In 2008, Cahill et al. showed that write-skew anomalies could be prevented by detecting and aborting "dangerous" triplets of concurrent transactions. This implementation of serializability is well-suited to multiversion concurrency control databases, and has been adopted in PostgreSQL 9.1, where it is known as Serializable Snapshot Isolation (SSI). When used consistently, this eliminates the need for the above workarounds. The downside compared with snapshot isolation is an increase in aborted transactions. This can perform better or worse than snapshot isolation with the above workarounds, depending on workload.

  • Machine translation in China


    Machine translation in China covers the history of machine translation (MT) systems developed in China. China was the fourth country to begin MT research, after the USA, the UK, and the Soviet Union. In 1957, the Language Institute of the Chinese Academy of Sciences initiated a Russian-Chinese MT research program and set up an MT research group. From then on, research was conducted mainly for academic purposes in universities.

    The turning point came in the 1990s, when MT systems began to be launched commercially and flourished in the market. Transtar was the first commercialized MT system and has been upgraded continually. The IMC/EC MT system, developed by the Computer Institute of the Chinese Academy of Sciences, also made great advances. Meanwhile, MT-IT-EC, a practical MT system specific to the communication domain, was notable for greatly improving efficiency and productivity in publishing.

    Government funding is a critical source of support for the development of market-oriented machine translation in China. Since China opened up to the outside world and joined the WTO, vigorous import and export trade has generated opportunities for machine translation to render technical product terminology into readable target-language information. In response to the increasing demand for sophisticated, state-of-the-art translation technology, academic institutions, including research institutes and universities, are launching bachelor's and master's programs in machine translation. These developments point to a promising future for machine translation in the Chinese market.

    Read more →
  • MedSLT

    MedSLT

    MedSLT is a medium-ranged open source spoken language translator developed by the University of Geneva. It is funded by the Swiss National Science Foundation. The system has been designed for the medical domain. It currently covers the doctor-patient diagnosis dialogues for the domains of headache, chest and abdominal pain in English, French, Japanese, Spanish, Catalan and Arabic. The vocabulary used ranges from 350 to 1000 words depending on the domain and language pair. == Motivation for creating MedSLT == With more than 6000 languages worldwide, language barriers are an increasing problem for healthcare. The lack of medical interpreters can lead to disastrous consequences. These range from prolonged hospital stays to wrong diagnosis and medication. A study found that only about half of the 23 million people with limited proficiency in English in the United States had been provided with a medical interpreter. Millions of refugees and immigrants worldwide face similar problems, although not always as severe. The gap between need and availability of language services might be closed with speech translation systems. == Challenges == The biggest challenge has been, and remains, to develop an ideal system, although this is not possible at present. Such a system would fit the needs of doctors and patients alike, and would provide accurate and flexible translation. An ideal translation tool cannot be realised without unrestricted language and a large vocabulary. Medical professionals demand high reliability from translation. This favours rule-based architectures over data-driven ones. The latter are more suitable for inexperienced users. Rule-based architectures achieve higher accuracy, especially if used by experts. Though it is highly desirable to build a bidirectional system supporting a two-way dialogue, which concentrates on patient-centered communication, patients will find the system difficult to access.
Most patients have no experience with such systems. The outcome is less reliable translation in the patient-to-doctor direction. To overcome this, the system needs to provide either easy access or an integrated help tool to guide the users through the process. Although controlled rule-based systems achieve good results, they are brittle. To receive good translations the user needs to be familiar with the system and has to know what is covered by the grammar. Covering different sub-domains (headache, chest and abdominal pain) and language pairs presents additional problems. A shared structure and grammar for all subdomains and language pairs minimises development and maintenance costs. The integration of new doctor and patient languages is also a key challenge. Adding new languages should be quick and rather simple, because the system has to be used in many countries to cover multiple language pairs. Direct translation from source to target language proves to be rather difficult. Using interlingua for unidirectional translation instead of a bidirectional approach helps to simplify the translation process. On top of this, the system has to run on different platforms, because mobility is a key issue for many attending physicians. A portable version addresses these issues, but has to deal with the heavy load of the translation process. == The MedSLT system == The system's speech recognition is based on the Nuance 8.5 platform that supports grammar-based language models. All grammars used for recognition, analysis and generation are compiled from a small set of unification grammars. These core grammars are created by the open-source Regulus Grammar Compiler and are automatically specialised using corpus-driven methods. The specialisation considers both the task (recognition, analysis and generation) and the sub-domain (headache, chest and abdominal pain).
The specialisation uses the explanation-based learning algorithm to create a treebank from the training corpus. These examples are divided into sets of subtrees by using domain- and grammar-specific rules (also known as "operationality criteria" in machine translation). The subtree rules are combined into a single rule, creating a specialised unification grammar. The grammar is compiled to an executable form: to a parser or generator for analysis and generation, and to a CFG grammar for recognition. A CFG grammar is required for the Nuance engine. Compilation by Nuance-specific criteria turns the grammar into speech recognition packages. The final step uses the training corpus again for statistical tuning of the language model. MedSLT translation processes are based on a rule-based interlingua. The interlingua is treated as an actual language (it is a very simple version of English) and is specified by a Regulus grammar. This grammar does not take account of complex surface syntax phenomena of real languages like movement or agreement. A set of rules is the base for translating the source language semantic representation to interlingua. Another set of rules covers the translation from interlingua to the target language. The semantic representations are converted to surface words using a target language grammar. Defining semantics for a specific domain enables the developers to specify interlingua with a small, tightly constrained semantic grammar. The translations based on interlingua match direct translations almost perfectly, because the development shifts to a decoupled monolingual architecture. A set of combined interlingua corpora, with one corpus per sub-domain, is the core of this architecture. All source language development corpora are translated to interlingua. These are sorted and grouped together with the corresponding source language examples. The interlingua forms are then translated into each target language, and the results are attached together.
This organisation improves the translation process. There is no duplicated effort for multilingual regression testing, because each parsing and generation step is performed once. This allows more frequent testing. The representation language used for all forms is Almost Flat Functional semantics (AFF). AFF is derived from the Spoken Language Translator (SLT), the precursor of MedSLT. SLT uses Quasi Logical Form (QLF), a logic-based representation language. QLF is an expressive yet very complex language, causing high development and maintenance costs. A minimal solution was planned for the medical translator. Early versions of the system utilised a language using simple feature-value lists. These lists were supplemented with an optional level of nesting to represent subordinate clauses (i.e. embedded clauses). Determiners were not included, because they are hard to translate and it is difficult to reliably distinguish and recognise them. This way, translation rules became a lot simpler, because only a list of feature-value pairs had to be mapped to another list of pairs. The language turned out to be underconstrained. Adding natural sortal constraints to the grammar solved this problem, but also returned the language to a more expressive formalism. The newly created AFF combines elements of QLF and the feature-value list semantics. This version of flat semantics is enhanced with additional functional markings. This, together with a relatively small vocabulary, solved the ambiguity problem of the original flat representation language without creating overly complex rules. In addition, the syntactic structures are treated carefully by a compromise of linguistic and engineering traditions. The grammars are in fact derived from linguistically motivated resources, using corpus-based methods. They are driven by small sets of examples. This results in simpler and flatter domain-specific grammars.
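The idea of mapping one list of feature-value pairs to another, first from the source language into interlingua and then from interlingua into the target language, can be sketched as follows. This is only an illustration under assumed names: the rule tables, the feature labels and the `translate` function are hypothetical and do not reproduce MedSLT's actual rules or representation.

```python
# Hypothetical rule tables; none of these entries come from MedSLT itself.
# Each rule maps one (feature, value) pair to another.
FRENCH_TO_INTERLINGUA = {
    ("utterance_type", "ynq"): ("utterance_type", "ynq"),
    ("pronoun", "vous"): ("pronoun", "you"),
    ("state", "avoir"): ("state", "have"),
    ("symptom", "mal_de_tete"): ("symptom", "headache"),
}

INTERLINGUA_TO_SPANISH = {
    ("utterance_type", "ynq"): ("utterance_type", "ynq"),
    ("pronoun", "you"): ("pronoun", "usted"),
    ("state", "have"): ("state", "tener"),
    ("symptom", "headache"): ("symptom", "dolor_de_cabeza"),
}


def apply_rules(pairs, rules):
    """Map every feature-value pair through a rule table, failing loudly on gaps."""
    translated = []
    for pair in pairs:
        if pair not in rules:
            raise KeyError(f"no translation rule for {pair}")
        translated.append(rules[pair])
    return translated


def translate(pairs):
    """Source -> interlingua -> target, using two independent rule sets."""
    interlingua = apply_rules(pairs, FRENCH_TO_INTERLINGUA)
    return apply_rules(interlingua, INTERLINGUA_TO_SPANISH)


source = [
    ("utterance_type", "ynq"),
    ("pronoun", "vous"),
    ("state", "avoir"),
    ("symptom", "mal_de_tete"),
]
target = translate(source)
print(target)
```

Keeping the two rule sets separate means that, in a design like this, a new source language only needs rules into the interlingua, and a new target language only needs rules out of it.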
The semantics are less sophisticated and represent a minimal approach in the engineering tradition. Each lexical item contributes a set of feature-value pairs. This leads to simple-to-write translation rules. There are only lists of feature-value pairs to map to other feature-value pairs. However, as a result the machine translation channel model becomes underspecified and is weakened, whereas the target language model is strengthened. An intelligent help module is integrated into the system to support users in utilising the full coverage of the grammars. This tool provides the user with examples as close as possible to the user's original utterance. The output is based on a library. Each sub-domain and language pair has its own library. The contents are extracted from the combined interlingua corpora. The help module scans the corpus for the tagged source language form mapped with the corresponding target language form. Additionally, a second statistical recogniser is used as backup. The results are used to select similar examples from the library. According to the generation preferences, one of the derived strings is picked and the target language string is realised as spoken language. Some statistical corpus based meth

    Read more →
  • Top 10 AI Sales Assistants Compared (2026)

    Top 10 AI Sales Assistants Compared (2026)

    Looking for the best AI sales assistant? An AI sales assistant is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI sales assistant slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Fabric computing

    Fabric computing

    Fabric computing or unified computing involves constructing a computing fabric consisting of interconnected nodes that look like a weave or a fabric when seen collectively from a distance. Usually the phrase refers to a consolidated high-performance computing system consisting of loosely coupled storage, networking and parallel processing functions linked by high bandwidth interconnects (such as 10 Gigabit Ethernet and InfiniBand) but the term has also been used to describe platforms such as the Azure Services Platform and grid computing in general (where the common theme is interconnected nodes that appear as a single logical unit). The fundamental components of fabrics are "nodes" (processor(s), memory, and/or peripherals) and "links" (functional connections between nodes). While the term "fabric" has also been used in association with storage area networks and with switched fabric networking, the introduction of compute resources provides a complete "unified" computing system. Other terms used to describe such fabrics include "unified fabric", "data center fabric" and "unified data center fabric". Ian Foster, director of the Computation Institute at the Argonne National Laboratory and University of Chicago suggested in 2007 that grid computing "fabrics" were "poised to become the underpinning for next-generation enterprise IT architectures and be used by a much greater part of many organizations". == History == While the term has been in use since the mid to late 1990s the growth of cloud computing and Cisco's evangelism of unified data center fabrics followed by unified computing (an evolutionary data center architecture whereby blade servers are integrated or unified with supporting network and storage infrastructure) starting March 2009 has renewed interest in the technology. There have been mixed reactions to Cisco's architecture, particularly from rivals who claim that these proprietary systems will lock out other vendors. 
Analysts claim that this "ambitious new direction" is "a big risk", as companies such as IBM and HP that previously partnered with Cisco on data center projects (accounting for $2–3bn of Cisco's annual revenue) are now competing with it. In 2007, Wombat Financial Software launched the "Wombat Data Fabric," the first commercial off-the-shelf software platform providing high-performance, low-latency RDMA-based messaging across an InfiniBand switch. == Key characteristics == The main advantages of fabrics are that massive concurrent processing combined with a huge, tightly coupled address space makes it possible to solve huge computing problems (such as those presented by delivery of cloud computing services), and that they are both scalable and able to be dynamically reconfigured. Challenges include maintaining security and a non-linearly degrading performance curve, whereby adding resources does not linearly increase performance (a common problem in parallel computing). == Companies == As of 2015, companies offering unified or fabric computing systems include Avaya, Brocade, Cisco, Dell, Egenera, HPE, IBM, Liquid Computing Corporation, TIBCO, Unisys, and Xsigo Systems.
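The non-linear performance curve noted under key characteristics can be made concrete with Amdahl's law, a standard model from parallel computing that the text itself does not name; it is used here only as an illustrative assumption. The function name `amdahl_speedup` and the 95% parallel fraction are hypothetical choices for the sketch.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Ideal speedup under Amdahl's law for a workload run on `nodes` nodes.

    parallel_fraction is the share of the work that can be spread across
    nodes; the remainder is assumed to run serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


# Assumed workload: 95% of the work parallelises across the fabric.
speedups = {n: amdahl_speedup(0.95, n) for n in (1, 8, 64, 512)}
for nodes, speedup in speedups.items():
    print(f"{nodes:4d} nodes -> speedup {speedup:.2f}")
```

Under this model the speedup can never exceed 1 / (1 - 0.95) = 20 however many nodes are added, which is one simple way to see why adding resources to a fabric does not increase performance linearly.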

    Read more →
  • Roni Rosenfeld

    Roni Rosenfeld

    Roni Rosenfeld (Hebrew: רוני רוזנפלד) is an Israeli-American computer scientist and computational epidemiologist, currently serving as the head of the Machine Learning Department at Carnegie Mellon University. He is an international expert in machine learning, infectious disease forecasting, statistical language modeling and artificial intelligence. == Education == Rosenfeld received his B.Sc. in mathematics and physics from Tel Aviv University in 1985. He received his Ph.D. in computer science from Carnegie Mellon University in 1994. While a graduate student, he developed and open-sourced a statistical language-modeling toolkit to allow anyone to create statistical language models from their own corpora and experiment with and extend the toolkit's capabilities. The toolkit has been used by more than 100 NLP laboratories in more than 20 countries. Rosenfeld's Ph.D. thesis, A Maximum Entropy Approach to Adaptive Statistical Language Modeling, was advised by Raj Reddy and Xuedong Huang and won the 2001 Computer, Speech and Language award for "Most Influential Paper in the Last 5 Years." == Career == Shortly after receiving his Ph.D., Rosenfeld joined the faculty of the Carnegie Mellon School of Computer Science as an assistant professor. He was promoted to the rank of associate professor in 1999 and received tenure in 2001. In 2005 he was promoted to professor of language technologies, machine learning, computer science and computational biology in the School of Computer Science at Carnegie Mellon University. Rosenfeld also holds an adjunct appointment in the department of computational and systems biology at the University of Pittsburgh School of Medicine. From 2002 to 2003, Rosenfeld was a visiting professor at the University of Hong Kong. Rosenfeld is the director of Carnegie Mellon's Machine Learning for Social Good (ML4SG) program. He has held educational leadership positions in a variety of programs, including the M.S. 
in computational finance (1997–1999), graduate computational and statistical learning (2001–2003), M.S. in machine learning (2017) and undergraduate minor in machine learning. Rosenfeld was appointed Head of Carnegie Mellon's Machine Learning Department in 2018. == Research == Rosenfeld's research interests include epidemiological forecasting, information and communication technologies for development (ICT4D), and machine learning for social good. === Epidemiological forecasting === Rosenfeld is a world expert in epidemiological forecasting. He founded and directs the Delphi research group, which has won most of the epidemiological forecasting challenges organized by the U.S. CDC and other U.S. government agencies. In December 2016, the CDC named his group the "Most Accurate Forecaster" for 2015–2016, and in October 2017, the Delphi group's two systems took the top two spots in the 2016-2017 flu forecasting challenge. The CDC recognized Rosenfeld's Delphi group at Carnegie Mellon University as having contributed the most accurate national-, regional-, and state-level influenza-like illness forecasts and national-level hospitalization forecasts to the site. In 2019, the CDC recognized forecasts provided by the Delphi group at Carnegie Mellon as having been the most accurate for five seasons in a row, and named the Delphi group an Influenza Forecasting Center of Excellence, a five-year designation that includes $3 million in research funding. Rosenfeld describes his forecasting research goal as "to make epidemiological forecasting as universally accepted and useful as weather forecasting is today." His recent work in the area has focused on selecting high value epidemiological forecasting targets (e.g. 
influenza and dengue); creating baseline forecasting methods for them; establishing metrics for measuring and tracking forecasting accuracy; estimating the limits of forecastability for each target; and identifying new sources of data that could be helpful to the forecasting goal. == Honors and awards == 2017 Joel and Ruth Spira Teaching Award; 2017 CDC Influenza Forecasting Challenge "Most Accurate Forecaster"; 1992 Allen Newell Medal for Research Excellence.

    Read more →
  • AI Essay Writers: Free vs Paid (2026)

    AI Essay Writers: Free vs Paid (2026)

    Looking for the best AI essay writer? An AI essay writer is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI essay writer slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • How to Choose an AI Website Builder

    How to Choose an AI Website Builder

    Shopping for the best AI website builder? An AI website builder is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI website builder slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Software diversity

    Software diversity

    Software diversity is a research field about the comprehension and engineering of diversity in the context of software. == Areas == The different areas of software diversity are discussed in surveys on diversity for fault-tolerance or for security. The main areas are: design diversity, n-version programming, and data diversity for fault tolerance; randomization; and software variability. == Techniques == === Code transformations === It is possible to amplify software diversity through automated transformation processes that create synthetic diversity. A "multicompiler" is a compiler embedding a diversification engine. A multi-variant execution environment (MVEE) is responsible for selecting the variant to execute and comparing the output. Fred Cohen was among the very early promoters of such an approach. He proposed a series of rewriting and code reordering transformations that aim at producing massive quantities of different versions of operating system functions. These ideas have been developed over the years and have led to the construction of integrated obfuscation schemes to protect key functions in large software systems. Another approach to increasing software diversity for protection consists in adding randomness to certain core processes, such as memory loading. Randomness implies that all versions of the same program run differently from each other, which in turn creates a diversity of program behaviors. This idea was initially proposed and experimented with by Stephanie Forrest and her colleagues. Recent work on automatic software diversity explores different forms of program transformations that slightly vary the behavior of programs. The goal is to evolve one program into a population of diverse programs that all provide similar services to users, but with different code. This diversity of code enhances the protection of users against a single attack that could crash all programs at the same time.
Transformation operators include: code layout randomization (reorder functions in code); globals layout randomization (reorder and pad globals); stack variable randomization (reorder variables in each stack frame); and heap layout randomization. === Natural software diversity === It is known that some functionalities are available in multiple interchangeable implementations. This natural diversity can be exploited; for example, it has been shown to be valuable for increasing security in cloud systems.
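The code layout randomization operator listed above, together with the output comparison attributed to a multi-variant execution environment, can be sketched in a few lines. This is a toy illustration, not a real multicompiler or MVEE: the function table, `make_variant`, `run_variant` and `run_all` are hypothetical names, and reordering Python source text only stands in for reordering functions in a compiled binary.

```python
import random

# Hypothetical program: three function definitions kept as source text
# so that their order in the generated module can be shuffled.
FUNCTIONS = {
    "square": "def square(x):\n    return x * x\n",
    "add_one": "def add_one(x):\n    return x + 1\n",
    "entry": "def entry(x):\n    return add_one(square(x))\n",
}


def make_variant(seed):
    """Return program source with its functions in a seed-dependent order."""
    names = sorted(FUNCTIONS)
    random.Random(seed).shuffle(names)
    return "\n".join(FUNCTIONS[name] for name in names)


def run_variant(source, arg):
    """Load one variant into a fresh namespace and call its entry point."""
    namespace = {}
    exec(source, namespace)
    return namespace["entry"](arg)


def run_all(seeds, arg):
    """Run every variant and insist that all of them agree on the output."""
    outputs = [run_variant(make_variant(seed), arg) for seed in seeds]
    if len(set(outputs)) != 1:
        raise RuntimeError(f"variants diverged: {outputs}")
    return outputs[0]


print(run_all(range(5), 3))
```

All variants compute the same result while differing in layout, so a disagreement between them would signal that something has interfered with one variant but not the others.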

    Read more →
  • AI Image Generators Reviews: What Actually Works in 2026

    AI Image Generators Reviews: What Actually Works in 2026

    Trying to pick the best AI image generator? An AI image generator is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI image generator slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →