AI Detector Similar To Turnitin

AI Detector Similar To Turnitin — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Python (programming language)

Python is a high-level, general-purpose programming language that emphasizes code readability, simplicity, and ease-of-writing with the use of significant indentation, "plain English" naming, an extensive ("batteries-included") standard library, and garbage collection. Python supports multiple programming paradigms but with an emphasis on object-oriented programming and dynamic typing. Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language. Python 3.0, released in 2008, was a major revision and not completely backward-compatible with earlier versions. Beginning with Python 3.5, capabilities and keywords for typing were added to the language, allowing optional static typing. As of 2026, the Python Software Foundation supports Python 3.10, 3.11, 3.12, 3.13, and 3.14, following the project's annual release cycle and five-year support policy. Python 3.15 is currently in the alpha development phase, and the stable release is expected to launch in October 2026. Earlier versions in the 3.x series have reached end-of-life and no longer receive security updates. Python has gained extensive use in the machine learning community. It is widely taught as an introductory programming language. Since 2003, Python has consistently ranked among the top ten most popular programming languages in the TIOBE Programming Community Index, which ranks programming languages based on searches across 24 platforms. == History == Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands. It was designed as a successor to the ABC programming language, which was inspired by SETL, capable of exception handling and interfacing with the Amoeba operating system. Python implementation began in December 1989. Van Rossum first released it in 1991 as Python 0.9.0. Van Rossum assumed sole responsibility for the project, as the lead developer, until 12 July 2018, when he announced his "permanent vacation" from responsibilities as Python's "benevolent dictator for life" (BDFL); this title was bestowed on him by the Python community to reflect his long-term commitment as the project's chief decision-maker. (He has since come out of retirement and is self-titled "BDFL-emeritus".) In January 2019, active Python core developers elected a five-member Steering Council to lead the project. The name Python derives from the British comedy series Monty Python's Flying Circus. (See § Naming.) Python 2.0 was released on 16 October 2000, featuring many new features such as list comprehensions, cycle-detecting garbage collection, reference counting, and Unicode support. Python 2.7's end-of-life was initially set for 2015, and then postponed to 2020 out of concern that a large body of existing code could not easily be forward-ported to Python 3. It no longer receives security patches or updates. While Python 2.7 and older versions are officially unsupported, a different unofficial Python implementation, PyPy, continues to support Python 2, i.e., "2.7.18+" (plus 3.11), with the plus signifying (at least some) "backported security updates". Python 3.0 was released on 3 December 2008, and was a major revision and not completely backward-compatible with earlier versions, with some new semantics and changed syntax. Python 2.7.18, released in 2020, was the last release of Python 2. Several releases in the Python 3.x series have added new syntax to the language, and made a few (considered very minor) backward-incompatible changes. As of May 2026, Python 3.14.5 is the latest stable release. All older 3.x versions had a security update down to Python 3.9.24 then again with 3.9.25, the final version in 3.9 series. Python 3.10 is, since November 2025, the oldest supported branch. Python 3.15 has an alpha released, and Android has an official downloadable executable available for Python 3.14. Releases receive two years of full support followed by three years of security support. == Design philosophy and features == Python is a multi-paradigm programming language. Object-oriented programming and structured programming are fully supported, and many of their features support functional programming and aspect-oriented programming – including metaprogramming and metaobjects. Many other paradigms are supported via extensions, including design by contract and logic programming. Python is often referred to as a 'glue language' because it is purposely designed to be able to integrate components written in other languages. Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. It uses dynamic name resolution (late binding), which binds method and variable names during program execution. Python's design offers some support for functional programming in the "Lisp tradition". It has filter, map, and reduce functions; list comprehensions, dictionaries, sets, and generator expressions. The standard library has two modules (itertools and functools) that implement functional tools borrowed from Haskell and Standard ML. Python's core philosophy is summarized in the Zen of Python (PEP 20) written by Tim Peters, which includes aphorisms such as these: Explicit is better than implicit. Simple is better than complex. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity, errors should never pass silently, unless explicitly silenced. There should be one-- and preferably only one --obvious way to do it. However, Python has received criticism for violating these principles and adding unnecessary language bloat. Responses to these criticisms note that the Zen of Python is a guideline rather than a rule. The addition of some new features had been controversial: Guido van Rossum resigned as Benevolent Dictator for Life after conflict about adding the assignment expression operator in Python 3.8. Nevertheless, rather than building all functionality into its core, Python was designed to be highly extensible through modules. This compact modularity has made it particularly popular as a means of adding programmable interfaces to existing applications. Van Rossum's vision of a small core language with a large standard library and an easily extensible interpreter stemmed from his frustrations with ABC, which represented the opposite approach. Python claims to strive for a simpler, less-cluttered syntax and grammar, while giving developers a choice in their coding methodology. Python lacks do .. while loops, which Rossum considered harmful. In contrast to Perl's motto "there is more than one way to do it", Python advocates an approach where "there should be one – and preferably only one – obvious way to do it". In practice, however, Python provides many ways to achieve a given goal. There are at least three ways to format a string literal, with no certainty as to which one a programmer should use. Alex Martelli is a Fellow at the Python Software Foundation and Python book author; he wrote that "To describe something as 'clever' is not considered a compliment in the Python culture." Python's developers typically prioritize readability over performance. For example, they reject patches to non-critical parts of the CPython reference implementation that would offer increases in speed that do not justify the cost of clarity and readability. Execution speed can be improved by moving speed-critical functions to extension modules written in languages such as C, or by using a just-in-time compiler like PyPy. Also, it is possible to transpile to other languages. However, this approach either fails to achieve the expected speed-up, since Python is a very dynamic language, or only a restricted subset of Python is compiled (with potential minor semantic changes). Python is meant to be a fun language to use. This goal is reflected in the name – a tribute to the British comedy group Monty Python – and in playful approaches to some tutorials and reference materials. For instance, some code examples use the terms "spam" and "eggs" (in reference to a Monty Python sketch), rather than the typical terms "foo" and "bar". A common neologism in the Python community is pythonic, which has a broad range of meanings related to program style: Pythonic code may use Python idioms well; be natural or show fluency in the language; or conform with Python's minimalist philosophy and emphasis on readability. === Enhancement Proposals === Python Enhancement Proposals are a design document for either providing information to the Python community, or proposal for new feature in Python. PEPs are intented to explain new processes in Python, provide naming conventions or document the processes in the language. PEPs are overseen by Python Steering Council. There are 3 kinds of PEPs, with those are being standards track PEP, Informational PEP and Process PEPs which has their own unique meanings. They were firstly introduced in 2000, in
Read more →
Edward Stabler

Edward Stabler is a Professor of Linguistics at the University of California, Los Angeles. His primary areas of research are (1) Natural Language Processing (NLP), (2) Parsing and formal language theory, and (3) Philosophy of Logic and Language. He was a member of the faculty at UCLA from 1984 to 2016. His work involves the production of software for minimalist grammars (MGs) and related systems. == Early life and education == Stabler received his Ph.D. from the Department of Linguistics and Philosophy at MIT in 1981. == Recent publications == Edward Stabler (2011) Computational perspectives on minimalism. Revised version in C. Boeckx, ed, Oxford Handbook of Linguistic Minimalism, pp. 617–642. Edward Stabler (2010) A defense of this perspective against the Evans&Levinson critique appears here, with revised version in Lingua 120(12): 2680-2685. Edward Stabler (2010) After GB. Revised version in J. van Benthem & A. ter Meulen, eds, Handbook of Logic and Language, pp. 395–414. Edward Stabler (2010) Recursion in grammar and performance. Presented at the 2009 UMass recursion conference. Edward Stabler (2009) Computational models of language universals. Revised version appears in M. H. Christiansen, C. Collins, and S. Edelman, eds., Language Universals, Oxford: Oxford University Press, pages 200-223. Edward Stabler (2008) Tupled pregroup grammars. Revised version appears in P. Casadio and J. Lambek, eds., Computational Algebraic Approaches to Natural Language, Milan: Polimetrica, pages 23–52. Edward Stabler (2006) Sidewards without copying. Proceedings of the 11th Conference on Formal Grammar, edited by P. Monachesi, G. Penn, G. Satta, and S. Wintner. Stanford: CSLI Publications, 2006, pages 133-146.
Read more →
Büchi automaton

In computer science and automata theory, a deterministic Büchi automaton is a theoretical machine which either accepts or rejects infinite inputs. Such a machine has a set of states and a transition function, which determines which state the machine should move to from its current state when it reads the next input character. Some states are accepting states and one state is the start state. The machine accepts an input if and only if it will pass through an accepting state infinitely many times as it reads the input. A non-deterministic Büchi automaton, later referred to just as a Büchi automaton, has a transition function which may have multiple outputs, leading to many possible paths for the same input; it accepts an infinite input if and only if some possible path is accepting. Deterministic and non-deterministic Büchi automata generalize deterministic finite automata and nondeterministic finite automata to infinite inputs. Each are types of ω-automata. Büchi automata recognize the ω-regular languages, the infinite word version of regular languages. They are named after the Swiss mathematician Julius Richard Büchi, who invented them in 1962. Büchi automata are often used in model checking as an automata-theoretic version of a formula in linear temporal logic. == Formal definition == Formally, a deterministic Büchi automaton is a tuple A = ( Q , Σ , δ , q 0 , F ) {\textstyle A=(Q,\Sigma ,\delta ,q_{0},\mathbf {F} )} that consists of the following components: Q {\textstyle Q} is a finite set. The elements of Q {\textstyle Q} are called the states of A {\textstyle A} . Σ {\textstyle \Sigma } is a finite set called the alphabet of A {\textstyle A} . δ : Q × Σ → Q {\textstyle \delta \colon Q\times \Sigma \to Q} is a function, called the transition function of A {\textstyle A} . q 0 {\textstyle q_{0}} is an element of Q {\textstyle Q} , called the initial state of A {\textstyle A} . F ⊆ Q {\textstyle \mathbf {F} \subseteq Q} is the acceptance condition. A run i _ = i 0 i 1 i 2 ⋯ ∈ Σ ω {\displaystyle {\underline {i}}=i_{0}i_{1}i_{2}\cdots \in \Sigma ^{\omega }} is an infinite string of inputs of A {\displaystyle A} . By calling δ {\displaystyle \delta } recursively, we can extend it to a function δ ω : Σ ω → Q ω {\displaystyle \delta ^{\omega }:\Sigma ^{\omega }\to Q^{\omega }} . A state q ∈ Q {\displaystyle q\in Q} is said to occur infinitely often for a run i _ {\displaystyle {\underline {i}}} when the set { n ∈ N ∣ δ ω ( i _ ) n = q } {\displaystyle \{n\in \mathbb {N} \mid \delta ^{\omega }({\underline {i}})_{n}=q\}} is infinite. Let I n f ( i _ ) {\displaystyle \mathrm {Inf} ({\underline {i}})} be the set of states occurring infinitely often for i _ {\displaystyle {\underline {i}}} . The language of A {\displaystyle A} is then the set of runs of A {\displaystyle A} in which at least one of the infinitely-often occurring states is in F {\textstyle \mathbf {F} } ; in symbols: L ( A ) = { i _ ∈ Σ ω ∣ I n f ( i _ ) ∩ F ≠ ∅ } . {\displaystyle L(A)=\{{\underline {i}}\in \Sigma ^{\omega }\mid \mathrm {Inf} ({\underline {i}})\cap \mathbf {F} \neq \varnothing \}.} In a (non-deterministic) Büchi automaton, the transition function δ {\textstyle \delta } is replaced with a transition relation Δ {\textstyle \Delta } that returns a set of states, and the single initial state q 0 {\textstyle q_{0}} is replaced by a set I {\textstyle I} of initial states. Generally, the term Büchi automaton without qualifier refers to non-deterministic Büchi automata. For more comprehensive formalism see also ω-automaton. == Closure properties == The set of Büchi automata is closed under the following operations. Let A = ( Q A , Σ , Δ A , I A , F A ) {\displaystyle A=(Q_{A},\Sigma ,\Delta _{A},I_{A},{F}_{A})} and B = ( Q B , Σ , Δ B , I B , F B ) {\displaystyle B=(Q_{B},\Sigma ,\Delta _{B},I_{B},{F}_{B})} be Büchi automata and C = ( Q C , Σ , Δ C , I C , F C ) {\displaystyle C=(Q_{C},\Sigma ,\Delta _{C},I_{C},{F}_{C})} be a finite automaton. Union: There is a Büchi automaton that recognizes the language L ( A ) ∪ L ( B ) . {\displaystyle L(A)\cup L(B).} Proof: If we assume, w.l.o.g., Q A ∩ Q B {\displaystyle Q_{A}\cap Q_{B}} is empty then L ( A ) ∪ L ( B ) {\displaystyle L(A)\cup L(B)} is recognized by the Büchi automaton ( Q A ∪ Q B , Σ ∪ Σ , Δ A ∪ Δ B , I A ∪ I B , F A ∪ F B ) . {\displaystyle (Q_{A}\cup Q_{B},\Sigma \cup \Sigma ,\Delta _{A}\cup \Delta _{B},I_{A}\cup I_{B},{F}_{A}\cup {F}_{B}).} Intersection: There is a Büchi automaton that recognizes the language L ( A ) ∩ L ( B ) . {\displaystyle L(A)\cap L(B).} Proof: The Büchi automaton A ′ = ( Q ′ , Σ , Δ ′ , I ′ , F ′ ) {\displaystyle A'=(Q',\Sigma ,\Delta ',I',F')} recognizes L ( A ) ∩ L ( B ) , {\displaystyle L(A)\cap L(B),} where Q ′ = Q A × Q B × { 1 , 2 } {\displaystyle Q'=Q_{A}\times Q_{B}\times \{1,2\}} Δ ′ = Δ 1 ∪ Δ 2 {\displaystyle \Delta '=\Delta _{1}\cup \Delta _{2}} Δ 1 = { ( ( q A , q B , 1 ) , a , ( q A ′ , q B ′ , i ) ) | ( q A , a , q A ′ ) ∈ Δ A and ( q B , a , q B ′ ) ∈ Δ B and if q A ∈ F A then i = 2 else i = 1 } {\displaystyle \Delta _{1}=\{((q_{A},q_{B},1),a,(q'_{A},q'_{B},i))|(q_{A},a,q'_{A})\in \Delta _{A}{\text{ and }}(q_{B},a,q'_{B})\in \Delta _{B}{\text{ and if }}q_{A}\in F_{A}{\text{ then }}i=2{\text{ else }}i=1\}} Δ 2 = { ( ( q A , q B , 2 ) , a , ( q A ′ , q B ′ , i ) ) | ( q A , a , q A ′ ) ∈ Δ A and ( q B , a , q B ′ ) ∈ Δ B and if q B ∈ F B then i = 1 else i = 2 } {\displaystyle \Delta _{2}=\{((q_{A},q_{B},2),a,(q'_{A},q'_{B},i))|(q_{A},a,q'_{A})\in \Delta _{A}{\text{ and }}(q_{B},a,q'_{B})\in \Delta _{B}{\text{ and if }}q_{B}\in F_{B}{\text{ then }}i=1{\text{ else }}i=2\}} I ′ = I A × I B × { 1 } {\displaystyle I'=I_{A}\times I_{B}\times \{1\}} F ′ = { ( q A , q B , 2 ) | q B ∈ F B } {\displaystyle F'=\{(q_{A},q_{B},2)|q_{B}\in F_{B}\}} By construction, r ′ = ( q A 0 , q B 0 , i 0 ) , ( q A 1 , q B 1 , i 1 ) , … {\displaystyle r'=(q_{A}^{0},q_{B}^{0},i^{0}),(q_{A}^{1},q_{B}^{1},i^{1}),\dots } is a run of automaton A' on input word w {\textstyle w} if r A = q A 0 , q A 1 , … {\displaystyle r_{A}=q_{A}^{0},q_{A}^{1},\dots } is run of A {\textstyle A} on w {\textstyle w} and r B = q B 0 , q B 1 , … {\displaystyle r_{B}=q_{B}^{0},q_{B}^{1},\dots } is run of B {\textstyle B} on w {\textstyle w} . r A {\textstyle r_{A}} is accepting and r B {\textstyle r_{B}} is accepting if r ′ {\textstyle r'} is concatenation of an infinite series of finite segments of 1-states (states with third component 1) and 2-states (states with third component 2) alternatively. There is such a series of segments of r ′ {\textstyle r'} if r ′ {\textstyle r'} is accepted by A ′ {\textstyle A'} . Concatenation: There is a Büchi automaton that recognizes the language L ( C ) ⋅ L ( A ) . {\displaystyle L(C)\cdot L(A).} Proof: If we assume, w.l.o.g., Q C ∩ Q A {\displaystyle Q_{C}\cap Q_{A}} is empty then the Büchi automaton A ′ = ( Q C ∪ Q A , Σ , Δ ′ , I ′ , F A ) {\displaystyle A'=(Q_{C}\cup Q_{A},\Sigma ,\Delta ',I',F_{A})} recognizes L ( C ) ⋅ L ( A ) {\displaystyle L(C)\cdot L(A)} , where Δ ′ = Δ A ∪ Δ C ∪ { ( q , a , q ′ ) | q ′ ∈ I A and ∃ f ∈ F C . ( q , a , f ) ∈ Δ C } {\displaystyle \Delta '=\Delta _{A}\cup \Delta _{C}\cup \{(q,a,q')|q'\in I_{A}{\text{ and }}\exists f\in F_{C}.(q,a,f)\in \Delta _{C}\}} if I C ∩ F C is empty then I ′ = I C otherwise I ′ = I C ∪ I A {\displaystyle {\text{ if }}I_{C}\cap F_{C}{\text{ is empty then }}I'=I_{C}{\text{ otherwise }}I'=I_{C}\cup I_{A}} ω-closure: If L ( C ) {\displaystyle L(C)} does not contain the empty word then there is a Büchi automaton that recognizes the language L ( C ) ω . {\displaystyle L(C)^{\omega }.} Proof: The Büchi automaton that recognizes L ( C ) ω {\displaystyle L(C)^{\omega }} is constructed in two stages. First, we construct a finite automaton A ′ {\textstyle A'} such that A ′ {\textstyle A'} also recognizes L ( C ) {\displaystyle L(C)} but there are no incoming transitions to initial states of A ′ {\textstyle A'} . So, A ′ = ( Q C ∪ { q new } , Σ , Δ ′ , { q new } , F C ) , {\displaystyle A'=(Q_{C}\cup \{q_{\text{new}}\},\Sigma ,\Delta ',\{q_{\text{new}}\},F_{C}),} where Δ ′ = Δ C ∪ { ( q new , a , q ′ ) | ∃ q ∈ I C . ( q , a , q ′ ) ∈ Δ C } . {\displaystyle \Delta '=\Delta _{C}\cup \{(q_{\text{new}},a,q')|\exists q\in I_{C}.(q,a,q')\in \Delta _{C}\}.} Note that L ( C ) = L ( A ′ ) {\displaystyle L(C)=L(A')} because L ( C ) {\displaystyle L(C)} does not contain the empty string. Second, we will construct the Büchi automaton A ″ {\textstyle A''} that recognize L ( C ) ω {\displaystyle L(C)^{\omega }} by adding a loop back to the initial state of A ′ {\textstyle A'} . So, A ″ = ( Q C ∪ { q new } , Σ , Δ ″ , { q new } , { q new } ) {\displaystyle A''=(Q_{C}\cup \{q_{\text{new}}\},\Sigma ,\Delta '',\{q_{\text{new}}\},\{q_{\text{new}}\})} , where Δ ″ = Δ ′ ∪ { ( q , a , q new ) | ∃ q ′ ∈ F C . ( q , a , q ′ ) ∈ Δ ′ } . {\displaystyle \Delta ''=\Delta '\cup \{(q,a,q_{\text{new}})|\exists q'\in F_{C}.(q,a,q')\in \Delta '\}.} Complementation:
Read more →
Hapax legomenon

In corpus linguistics, a hapax legomenon ( also or ; pl. hapax legomena; sometimes abbreviated to hapax, plural hapaxes) is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text. The term is also sometimes used to describe a word that occurs in just one of an author's works but more than once in that particular work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "said once". The related terms dis legomenon, tris legomenon, and tetrakis legomenon respectively (, , ) refer to double, triple, or quadruple occurrences, but are far less commonly used. Hapax legomena are quite common, as predicted by Zipf's law, which states that the frequency of any word in a corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words are hapax legomena, and another 10% to 15% are dis legomena. Thus, in the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus. Hapax legomenon refers to the appearance of a word or an expression in a body of text, not to either its origin or its prevalence in speech. It thus differs from a nonce word, which may never be recorded, may find currency and may be widely recorded, or may appear several times in the work which coins it, and so on. == Significance == Hapax legomena in ancient texts are usually difficult to decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical (particularly Hebrew; see § Hebrew) hapax legomena sometimes pose problems in translation. Hapax legomena also pose challenges in natural language processing. Some scholars consider Hapax legomena useful in determining the authorship of written works. P. N. Harrison, in The Problem of the Pastoral Epistles (1921) made hapax legomena popular among Bible scholars, when he argued that there are considerably more of them in the three Pastoral Epistles than in other Pauline Epistles. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual. Harrison's theory has faded in significance due to a number of problems raised by other scholars. For example, in 1896, W. P. Workman found the following numbers of hapax legomena in each Pauline Epistle: At first glance, the last three totals (for the Pastoral Epistles) are not out of line with the others. To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text, which ranged from 3.6 to 13, as summarized in the diagram on the right. Although the Pastoral Epistles have more hapax legomena per page, Workman found the differences to be moderate in comparison to the variation among other Epistles. This was reinforced when Workman looked at several plays by Shakespeare, which showed similar variations (from 3.4 to 10.4 per page of Irving's one-volume edition), as summarized in the second diagram on the right. Apart from author identity, there are several other factors that can explain the number of hapax legomena in a work: text length: this directly affects the expected number and percentage of hapax legomena; the brevity of the Pastoral Epistles also makes any statistical analysis problematic. text topic: if the author writes on different subjects, of course many subject-specific words will occur only in limited contexts. text audience: if the author is writing to a peer rather than a student, or their spouse rather than their employer, again quite different vocabulary will appear. time: over the course of years, both the language and an author's knowledge and use of language will change. In the particular case of the Pastoral Epistles, all of these variables are quite different from those in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as strong indicators of authorship; those who reject Pauline authorship of the Pastorals rely on other arguments. There are also subjective questions over whether two forms amount to "the same word": dog vs. dogs, clue vs. clueless, sign vs. signature; many other gray cases also arise. The Jewish Encyclopedia points out that, although there are 1,500 hapaxes in the Hebrew Bible, only about 400 are not obviously related to other attested word forms. A final difficulty with the use of hapax legomena for authorship determination is that there is considerable variation among works known to be by a single author, and disparate authors often show similar values. In other words, hapax legomena are not a reliable indicator. Authorship studies now usually use a wide range of measures to look for patterns rather than relying upon single measurements. == Computer science == In the fields of computational linguistics and natural language processing (NLP), esp. corpus linguistics and machine-learned NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques. This disregard has the added benefit of significantly reducing the memory use of an application, since, by Zipf's law, many words are hapax legomena. == Examples == The following are some examples of hapax legomena in languages or corpora. === Arabic === In the Qurʾān: The proper nouns Iram (Q 89:7, Iram of the Pillars), Bābil (Q 2:102, Babylon), Bakka(t) (Q 3:96, Bakkah), Jibt (Q 4:51), Ramaḍān (Q 2:185, Ramadan), ar-Rūm (Q 30:2, Byzantine Empire), Tasnīm (Q 83:27), Qurayš (Q 106:1, Quraysh), Majūs (Q 22:17, Magian/Zoroastrian), Mārūt (Q 2:102, Harut and Marut), Makka(t) (Q 48:24, Mecca), Nasr (Q 71:23), (Ḏū) an-Nūn (Q 21:87) and Hārūt (Q 2:102, Harut and Marut) occur only once. zanjabīl (زَنْجَبِيل – ginger) is a Qurʾānic hapax (Q 76:17). zamharīr (زَمْهَرِيرًۭ) is a Qurʾānic hapax (Q 76:13), usually glossed as referring to extreme cold. The epitheton ornans aṣ-ṣamad (الصَّمَد – the One besought) is a Qurʾānic hapax (Q 112:2). ṭūd (طُودْ - mountain) is a Qurʾānic hapax (Q 26:63). === Chinese and Japanese === Classical Chinese and Japanese literature contains many Chinese characters that feature only once in the corpus, and their meaning and pronunciation has often been lost. Known in Japanese as kogo (孤語), literally "lonely characters", these can be considered a type of hapax legomenon. For example, the Classic of Poetry (c. 1000 BC) uses the character 篪 exactly once in the verse 「伯氏吹塤, 仲氏吹篪」, and it was only through the discovery of a description by Guo Pu (276–324 AD) that the character could be associated with a specific type of ancient flute. === English === It is fairly common for authors to "coin" new words to convey a particular meaning or for the sake of entertainment, without any suggestion that they are "proper" words. For example, P.G. Wodehouse and Lewis Carroll frequently coined novel words. Indexy, below, appears to be an example of this. Flother, as a synonym for snowflake, is a hapax legomenon of written English found in a manuscript entitled The XI Pains of Hell (c. 1275). Honorificabilitudinitatibus is a hapax legomenon of Shakespeare's works, coming from Erasmus' Adagia Indexy, in Bram Stoker's Dracula, used as an adjective to describe a situational state with no other further use in the language: "If that man had been an ordinary lunatic I would have taken my chance of trusting him; but he seems so mixed up with the Count in an indexy kind of way that I am afraid of doing anything wrong by helping his fads." Manticratic, meaning "of the rule by the Prophet's family or clan", was apparently invented by T. E. Lawrence and appears once in Seven Pillars of Wisdom. Nortelrye, a word for "education", occurs only once in Chaucer's The Reeve's Tale. Sassigassity, perhaps with the meaning of "audacity", occurs only once in Dickens's short story "A Christmas Tree". Slæpwerigne, "sleep-weary", occurs exactly once in the Old English corpus, in the Exeter Book. There is debate over whether it means "weary with sleep" or "weary for sleep". === German === The name of the 9th-century poem Muspilli is a back-formation from "muspille", Old High German hapax legomenon of unclear meaning only found in this text (see Muspilli § Etymology for discussion). === Ancient Greek === According to classical scholar Clyde Pharr, "the Iliad has 1,097 hapax legomena, while the Odyssey has 868". Others have defined the term differently, however, and count as few as 303 in the Iliad and 191 in the Odyssey. panaōrios (παναώριος), ancient Greek for "very untimely", is one of many words that occur only once in the Iliad. The Greek New Testament contains 686 local hapax legomena, which are sometimes called "New Testament hapaxes". 62 of these occur in 1 Peter and 54 occur in 2 Peter
Read more →
Deblurring

Deblurring is the process of removing blurring artifacts from images. Deblurring recovers a sharp image S from a blurred image B, where S is convolved with K (the blur kernel) to generate B. Mathematically, this can be represented as B = S ∗ K {\displaystyle B=SK} (where represents convolution). While this process is sometimes known as unblurring, deblurring is the correct technical word. The blur K is typically modeled as point spread function and is convolved with a hypothetical sharp image S to get B, where both the S (which is to be recovered) and the point spread function K are unknown. This is an example of an inverse problem. In almost all cases, there is insufficient information in the blurred image to uniquely determine a plausible original image, making it an ill-posed problem. In addition the blurred image contains additional noise which complicates the task of determining the original image. This is generally solved by the use of a regularization term to attempt to eliminate implausible solutions. This problem is analogous to echo removal in the signal processing domain. Nevertheless, when coherent beam is used for imaging, the point spread function can be modeled mathematically. By proper deconvolution of the point spread function K and the blurred image B, the blurred image B can be deblurred (unblur) and the sharp image S can be recovered.
Read more →
Mealy machine

In the theory of computation, a Mealy machine is a finite-state machine whose output values are determined both by its current state and the current inputs. This is in contrast to a Moore machine, whose output values are determined solely by its current state. A Mealy machine is a deterministic finite-state transducer: for each state and input, at most one transition is possible. == History == The Mealy machine is named after George H. Mealy, who presented the concept in a 1955 paper, "A Method for Synthesizing Sequential Circuits". == Formal definition == A Mealy machine is a 6-tuple ( S , S 0 , Σ , Λ , T , G ) {\displaystyle (S,S_{0},\Sigma ,\Lambda ,T,G)} consisting of the following: a finite set of states S {\displaystyle S} a start state (also called initial state) S 0 {\displaystyle S_{0}} which is an element of S {\displaystyle S} a finite set called the input alphabet Σ {\displaystyle \Sigma } a finite set called the output alphabet Λ {\displaystyle \Lambda } a transition function T : S × Σ → S {\displaystyle T:S\times \Sigma \rightarrow S} mapping pairs of a state and an input symbol to the corresponding next state. an output function G : S × Σ → Λ {\displaystyle G:S\times \Sigma \rightarrow \Lambda } mapping pairs of a state and an input symbol to the corresponding output symbol. In some formulations, the transition and output functions are coalesced into a single function T : S × Σ → S × Λ {\displaystyle T:S\times \Sigma \rightarrow S\times \Lambda } . "Evolution across time" is realized in this abstraction by having the state machine consult the time-changing input symbol at discrete "timer ticks" t 0 , t 1 , t 2 , . . . {\displaystyle t_{0},t_{1},t_{2},...} and react according to its internal configuration at those idealized instants, or else having the state machine wait for a next input symbol (as on a FIFO) and react whenever it arrives. == Comparison of Mealy machines and Moore machines == Mealy machines tend to have fewer states: Different outputs on arcs (n2) rather than states (n). When implemented as electronic circuits (rather than as mathematical abstractions or code): Moore machines are safer to use than Mealy machines: Outputs change at the clock edge (always one cycle later). In Mealy machines, input change can cause output change as soon as logic is done — a big problem when two machines are interconnected – asynchronous feedback may occur if one isn't careful. Mealy machines react faster to inputs: React in the same cycle—they don't need to wait for the clock. In Moore machines, more logic may be necessary to decode state into outputs—more gate delays after clock edge. == Diagram == The state diagram for a Mealy machine associates an output value with each transition edge, in contrast to the state diagram for a Moore machine, which associates an output value with each state. When the input and output alphabet are both Σ, one can also associate to a Mealy automata a Helix directed graph (S × Σ, (x, i) → (T(x, i), G(x, i))). This graph has as vertices the couples of state and letters, each node is of out-degree one, and the successor of (x, i) is the next state of the automata and the letter that the automata output when it is instate x and it reads letter i. This graph is a union of disjoint cycles if the automaton is bireversible. == Examples == === Simple === A simple Mealy machine has one input and one output. Each transition edge is labeled with the value of the input (shown in red) and the value of the output (shown in blue). The machine starts in state Si. (In this example, the output is the exclusive-or of the two most-recent input values; thus, the machine implements an edge detector, outputting a 1 every time the input flips and a 0 otherwise.) === Complex === More complex Mealy machines can have multiple inputs as well as multiple outputs. == Applications == Mealy machines provide a rudimentary mathematical model for cipher machines. Considering the input and output alphabet the Latin alphabet, for example, then a Mealy machine can be designed that given a string of letters (a sequence of inputs) can process it into a ciphered string (a sequence of outputs). However, although a Mealy model could be used to describe the Enigma, the state diagram would be too complex to provide feasible means of designing complex ciphering machines. Moore/Mealy machines are DFAs that have also output at any tick of the clock. Modern CPUs, computers, cell phones, digital clocks and basic electronic devices/machines have some kind of finite state machine to control it. Simple software systems, particularly ones that can be represented using regular expressions, can be modeled as finite state machines. There are many such simple systems, such as vending machines or basic electronics. By finding the intersection of two finite state machines, one can design in a very simple manner concurrent systems that exchange messages for instance. For example, a traffic light is a system that consists of multiple subsystems, such as the different traffic lights, that work concurrently.
Read more →
Hanna Hajishirzi

Hannaneh Hajishirzi is an Iranian-American computer scientist specializing in natural language processing. She is Torode Family Professor in Computer Science & Engineering in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, head of the H2Lab in the Allen School, and a senior director of natural language processing in the Allen Institute for AI. == Education and career == After a bachelor's degree from the Sharif University of Technology, Hajishirzi completed her Ph.D. in computer science in 2011, at the University of Illinois Urbana-Champaign. Her dissertation, Action-Centered Reasoning for Probabilistic Dynamic Systems, was supervised by Eyal Amir. After postdoctoral research at Disney Research in Pittsburgh, Hajishirzi joined the University of Washington in 2012, as a research scientist in electrical engineering. In 2015 she became a research assistant professor in electrical engineering. She obtained a regular-rank assistant professorship in 2018, at the same time becoming an AI Fellow in the Allen Institute for AI, where she became a senior director of research in 2021. She was promoted to associate professor in 2022 and to full professor in 2025. == Recognition == Hajishirzi was named as a Fellow of the Association for Computational Linguistics in 2025, "for significant contributions to question answering, scientific applications, multimodal artificial intelligence, and fully open language models". == Personal life == Hajishirzi is married to Ali Farhadi, the CEO of the Allen Institute for AI.
Read more →
Mehryar Mohri

Mehryar Mohri is a professor and theoretical computer scientist at the Courant Institute of Mathematical Sciences. He is also heading the Machine Learning Theory (ML Theory) team at Google Research. == Career == Prior to joining the Courant Institute, Mohri was a research department head and later technology leader at AT&T Bell Labs, where he was a member of the technical staff for about ten years. Mohri has also taught as an assistant professor at the University of Paris 7 (1992-1993) and Ecole Polytechnique (1992-1994). == Research == Mohri's main area of research is machine learning, in particular learning theory. He is also an expert in automata theory and algorithms. He is the author of several core algorithms that have served as the foundation for the design of many deployed speech recognition and natural language processing systems. == Publications == Mohri is the author of the reference book Foundations of Machine Learning used as a textbook in many graduate-level machine learning courses. Mohri is also a member of the Lothaire group of mathematicians with the pseudonym M. Lothaire and contributed to the book on Applied Combinatorics on Words. He is the author of more than 250 conference and journal publications. == Organizational affiliations == Mohri is currently the President of the Association for Algorithmic Learning Theory (AALT) and the Steering Committee Chair for the ALT conference. He is also Editorial Board member of Machine Learning and TheoretiCS, Action Editor of the Journal of Machine Learning Research (JMLR) and a member of the advisory board for the Journal of Automata, Languages and Combinatorics.
Read more →
Geometric primitive

In vector computer graphics, CAD systems, and geographic information systems, a geometric primitive (or prim) is the simplest (i.e. 'atomic' or irreducible) geometric shape that the system can handle (draw, store). Sometimes the subroutines that draw the corresponding objects are called "geometric primitives" as well. The most "primitive" primitives are point and straight line segments, which were all that early vector graphics systems had. In constructive solid geometry, primitives are simple geometric shapes such as a cube, cylinder, sphere, cone, pyramid, torus. Modern 2D computer graphics systems may operate with primitives which are curves (segments of straight lines, circles and more complicated curves), as well as shapes (boxes, arbitrary polygons, circles). A common set of two-dimensional primitives includes lines, points, and polygons, although some people prefer to consider triangles primitives, because every polygon can be constructed from triangles (polygon triangulation). All other graphic elements are built up from these primitives. In three dimensions, triangles or polygons positioned in three-dimensional space can be used as primitives to model more complex 3D forms. In some cases, curves (such as Bézier curves, circles, etc.) may be considered primitives; in other cases, curves are complex forms created from many straight, primitive shapes. == Common primitives == The set of geometric primitives is based on the dimension of the region being represented: Point (0-dimensional), a single location with no height, width, or depth. Line or curve (1-dimensional), having length but no width, although a linear feature may curve through a higher-dimensional space. Planar surface or curved surface (2-dimensional), having length and width. Volumetric region or solid (3-dimensional), having length, width, and depth. In GIS, the terrain surface is often spoken of colloquially as "2 1/2 dimensional," because only the upper surface needs to be represented. Thus, elevation can be conceptualized as a scalar field property or function of two-dimensional space, affording it a number of data modeling efficiencies over true 3-dimensional objects. A shape of any of these dimensions greater than zero consists of an infinite number of distinct points. Because digital systems are finite, only a sample set of the points in a shape can be stored. Thus, vector data structures typically represent geometric primitives using a strategic sample, organized in structures that facilitate the software interpolating the remainder of the shape at the time of analysis or display, using the algorithms of Computational geometry. A Point is a single coordinate in a Cartesian coordinate system. Some data models allow for Multipoint features consisting of several disconnected points. A Polygonal chain or Polyline is an ordered list of points (termed vertices in this context). The software is expected to interpolate the intervening shape of the line between adjacent points in the list as a parametric curve, most commonly a straight line, but other types of curves are frequently available, including circular arcs, cubic splines, and Bézier curves. Some of these curves require additional points to be defined that are not on the line itself, but are used for parametric control. A Polygon is a polyline that closes at its endpoints, representing the boundary of a two-dimensional region. The software is expected to use this boundary to partition 2-dimensional space into an interior and exterior. Some data models allow for a single feature to consist of multiple polylines, which could collectively connect to form a single closed boundary, could represent a set of disjoint regions (e.g., the state of Hawaii), or could represent a region with holes (e.g., a lake with an island). A Parametric shape is a standardized two-dimensional or three-dimensional shape defined by a minimal set of parameters, such as an ellipse defined by two points at its foci, or three points at its center, vertex, and co-vertex. A Polyhedron or Polygon mesh is a set of polygon faces in three-dimensional space that are connected at their edges to completely enclose a volumetric region. In some applications, closure may not be required or may be implied, such as modeling terrain. The software is expected to use this surface to partition 3-dimensional space into an interior and exterior. A triangle mesh is a subtype of polyhedron in which all faces must be triangles, the only polygon that will always be planar, including the Triangulated irregular network (TIN) commonly used in GIS. A parametric mesh represents a three-dimensional surface by a connected set of parametric functions, similar to a spline or Bézier curve in two dimensions. The most common structure is the Non-uniform rational B-spline (NURBS), supported by most CAD and animation software. == Application in GIS == A wide variety of vector data structures and formats have been developed during the history of Geographic information systems, but they share a fundamental basis of storing a core set of geometric primitives to represent the location and extent of geographic phenomena. Locations of points are almost always measured within a standard Earth-based coordinate system, whether the spherical Geographic coordinate system (latitude/longitude), or a planar coordinate system, such as the Universal Transverse Mercator. They also share the need to store a set of attributes of each geographic feature alongside its shape; traditionally, this has been accomplished using the data models, data formats, and even software of relational databases. Early vector formats, such as POLYVRT, the ARC/INFO Coverage, and the Esri shapefile support a basic set of geometric primitives: points, polylines, and polygons, only in two dimensional space and the latter two with only straight line interpolation. TIN data structures for representing terrain surfaces as triangle meshes were also added. Since the mid 1990s, new formats have been developed that extend the range of available primitives, generally standardized by the Open Geospatial Consortium's Simple Features specification. Common geometric primitive extensions include: three-dimensional coordinates for points, lines, and polygons; a fourth "dimension" to represent a measured attribute or time; curved segments in lines and polygons; text annotation as a form of geometry; and polygon meshes for three-dimensional objects. Frequently, a representation of the shape of a real-world phenomenon may have a different (usually lower) dimension than the phenomenon being represented. For example, a city (a two-dimensional region) may be represented as a point, or a road (a three-dimensional volume of material) may be represented as a line. This dimensional generalization correlates with tendencies in spatial cognition. For example, asking the distance between two cities presumes a conceptual model of the cities as points, while giving directions involving travel "up," "down," or "along" a road imply a one-dimensional conceptual model. This is frequently done for purposes of data efficiency, visual simplicity, or cognitive efficiency, and is acceptable if the distinction between the representation and the represented is understood, but can cause confusion if information users assume that the digital shape is a perfect representation of reality (i.e., believing that roads really are lines). == In 3D modelling == In CAD software or 3D modelling, the interface may present the user with the ability to create primitives which may be further modified by edits. For example, in the practice of box modelling the user will start with a cuboid, then use extrusion and other operations to create the model. In this use the primitive is just a convenient starting point, rather than the fundamental unit of modelling. A 3D package may also include a list of extended primitives which are more complex shapes that come with the package. For example, a teapot is listed as a primitive in 3D Studio Max. == In graphics hardware == Various graphics accelerators exist with hardware acceleration for rendering specific primitives such as lines or triangles, frequently with texture mapping and shaders. Modern 3D accelerators typically accept sequences of triangles as triangle strips.
Read more →
Bixby (software)

Bixby ( ) is a virtual assistant developed by Samsung Electronics that runs on various Samsung-branded appliances, primarily mobile devices but also some refrigerators televisions and PCs. It uses voice commands and a natural-language user interface to answer questions and perform tasks, while adapting to the users' preferences and behavior. Samsung first launched Bixby in 2017. Along with Bixby voice assistant, its other main component currently is Bixby Vision, a contextual and visual search augmented reality camera app. Formerly, the Bixby suite consisted of a number of other tools, but these have since been renamed, such as Bixby Routines (now Modes and Routines). == History == On 20 March 2017, Samsung announced the voice-powered digital assistant named "Bixby" as a replacement of the S Voice assistant. It was introduced alongside the Galaxy S8 and S8+ and the Galaxy Tab A (2017) during the Galaxy Unpacked 2017 event. Although released for these devices, it could also be sideloaded on older Galaxy devices running Android Nougat. Before the phone's release, the Bixby Button was reprogrammable and could be set to open other applications or assistants, such as Google Assistant. However, near the phone's release, this ability was removed with a firmware update. Remapping remained possible through third-party apps. Bixby was launched in Korean on 1 May 2017 (KST). Bixby Voice was intended to be made available in the US later that spring. However, Samsung postponed the release, as Bixby had issues understanding English. The English version was finally rolled out in July 2017, followed by a Chinese language version later that year. In October 2017, Samsung announced the release of Bixby 2.0 during its annual developer conference in San Francisco. The new version was rolled out across the company's line of connected products, including smartphones, TVs, and refrigerators. Third parties were allowed to develop applications for Bixby using the Samsung Developer Kit. In August 2018, Samsung announced the Bixby-integrated Galaxy Home smart speaker. In 2019, UX developers at Samsung stated that they intended to use AR Emoji avatars as a personified Bixby assistant. At SDC19, Samsung displayed the Galaxy Home Mini speaker, which also supported Bixby. Bixby 3.0 was released with One UI 3 at the start of 2021. With version 3.0, Home and Reminders features were separated from Bixby. In June 2021, screenshots surfaced for what some thought as a replacement for Bixby. The three-dimensional virtual assistant, Sam, was popular on social media, though it was not intended as a replacement for Bixby. Bixby launched for Microsoft Windows in October 2021, with distribution through the Microsoft Store. This version of Bixby was optimized for Samsung's Galaxy Book computers. Samsung launched an AI Bixby custom voice creator in 2023, allowing users to record their own voice commands. Most recently, in July 2024, Samsung confirmed that it plans to launch an upgraded version of Bixby later that year. This new Bixby would be powered by Samsung's proprietary large language model (LLM) technology, promising a significant boost to Bixby's capabilities with the help of generative AI. In January 2025, with the announcement of Galaxy S25 and the One UI 7 update, Bixby was no longer the default voice assistant, having been replaced by Google Gemini. Despite this, Bixby still continued to be developed and expanded by Samsung and was revamped at the same time with new AI capabilities. Samsung brought the "smarter" Bixby to Samsung televisions, allowing users to speak to their TV sets and control their homes with it. A visual refresh was planned for One UI 8.5. == Functionality == Bixby is a voice assistant developed by Samsung that provides device control, information retrieval, and task automation using voice input and artificial intelligence. It can answer contextual queries, adjust system settings, perform searches, and manage reminders or schedules. The service also personalizes responses by recognizing individual user voices. Bixby itself was also formerly called Bixby Voice to differentiate from other Bixby tools in the suite. === Bixby Vision === Bixby Vision is a visual recognition feature that analyzes images captured through the device camera and provides context-specific information or actions. It combines on-device processing with cloud-based AI resources to identify objects, detect text, and interpret scenes within supported applications. It comes pre-installed on Samsung Galaxy phones. It is considered to be the imaging component of Bixby. ==== Translate ==== Detects foreign text in the camera view and provides real-time translation by overlaying translated text on the preview. ==== Text ==== Uses optical character recognition(OCR) to extract printed or handwritten text for copying, searching, or sharing. ==== Discover ==== Identifies consumer products, fashion items, or furniture and retrieves visually similar items or related online information. ==== Wine ==== Recognizes wine labels and provides information such as variety, region of origin, average price, and reviews. ==== Scene Describer ==== Generates written and spoken descriptions of captured scenes, supporting accessibility for users with visual impairments. ==== Object Identifier ==== Identifies plants, animals, food items, or landmarks and displays corresponding names or classification details. ==== Text Reader ==== Converts detected text into spoken audio using text-to-speech functionality. ==== Color Detector ==== Identifies and names colors within the frame, displaying or reading the recognized color aloud. === Former Bixby tools === Bixby Home was a vertically scrolling home screen displaying cards of information such as weather, fitness activity, and smart home controls. It was renamed Samsung Daily with the release of One UI 2.1 in 2020, then replaced by Samsung Free in One UI 3.0. Samsung Free was eventually discontinued in some markets. Its successor, Samsung News, now functions as a news aggregation service with optional home-screen integration similar to Bixby Home. Bixby Routines was an automation feature that allowed users to create custom rules based on triggers such as time, location, or device conditions. Beginning with One UI 5.0, it was renamed Modes and Routines. Bixby Text Call, introduced in One UI 5.0 (2022) in select regions, enabled users to handle incoming calls via speech-to-text conversion and vice versa. It is now named simply Text Call and can be found in the Phone app settings. Bixby Touch allowed users to trigger context-aware actions by touching on-screen content. It analyzed images, text, and other visual elements displayed on the device and provided related options such as translation, image search, product lookup, or other content-based information. Several of its capabilities overlapped with, or were later superseded by, features offered through Bixby Vision. Other legacy components including Bixby Touch, Bixby Global Action, Bixby Dictation, and Bixby Wakeup, formed part of the early Bixby suite and have since been phased out, though exact discontinuation details vary by region. == Regions and languages == As of April 2018, Bixby is available in over 195 countries, but only in Korean, English (American), and Chinese (Mandarin). The limitation is that the models not intended for the Japanese market, like S10e, are not allowed to login to Bixby services from Japan; therefore Bixby becomes blocked. The choice of languages has since expanded: Samsung has deployed Bixby's voice command function in French, and on 20 February 2019 Samsung announced the addition of further languages: English (British), German, Italian and Spanish (Spain). On 22 February 2020, Samsung announced the addition of Portuguese (Brazil), for Galaxy S10 & Note10, in Beta, and later for other models. == Compatible devices == === Flagship series === Galaxy S series: All models since Galaxy S7 Galaxy Tab S: All models since Galaxy Tab S4 Galaxy Note: All models since Galaxy Note FE and Galaxy Note 8 Galaxy Z series: All models === Other series === Galaxy A Galaxy A6/A6+ (Bixby Home, Reminder and Vision) Galaxy A7 (2017) (available to users in South Korea only; Bixby Home and Reminder only) Galaxy A7 (2018) (Bixby Home, Reminder and Vision only) Galaxy A8 (2018) (including A8 Star; Bixby Home, Reminder and Vision only; S Voice used instead) Galaxy A8s (Bixby Home, Reminder and Vision only) Galaxy A9 (2018)/A9s/A9 Star Pro (including A9 Star and A9 Star Lite; Bixby Home, Reminder and Vision only; S Voice used instead) Galaxy A9 Pro (2019) (Bixby Home, Reminder and Vision only) Galaxy A20 (Bixby Home and Service) Galaxy A21s Galaxy A30s (Bixby Home, Vision, Reminder and Routines) Galaxy A40 (Bixby Home and Reminder) Galaxy A41 (Bixby Home, Vision, Routines and Reminder) Galaxy A50 (Bixby Home, Voice, Vision, Reminder and Routines) Galaxy A50s (Bixby Home, Voice, Vision, Reminder and Routines) G
Read more →
Paola Velardi

Paola Velardi (born in Rome, April 26, 1955) is a full professor of computer science at Sapienza University in Rome, Italy. Her research encompasses Artificial Intelligence and specifically, natural language processing, machine learning business intelligence and semantic web. Velardi is one of the hundred female scientists included in the database "100esperte.it" (translated from Italian with "100 female experts"). This online, open database champions the recognition of top-rated female scientists in Science, Technology, Engineering and Mathematics (STEM) areas. Among her prestigious appointments and honors, her inclusion stands out —alongside 45 other international female scientists from the past, present, and future— in the Women in Science pavilion of UNESCO’s Virtual Science Museum. == Research == Paola Velardi's research activity has focused, since the early 1980s, on Artificial Intelligence, with a particular emphasis on natural language processing (NLP), Machine learning, and data mining. Her scientific contributions have evolved over time, following the sector's primary paradigms: Semantic Web and Ontologies: She is known for her pioneering work on semantic disambiguation and automated ontology learning, collaborating on the development of systems such as OntoLearn. Social Computing and Predictive Analysis: She has conducted research on extracting information from social media for epidemiological monitoring (syndromic surveillance) and for the identification of opinion leaders. In the educational field, she has developed machine learning models to predict the risk of student dropout. AI for Health and Elder Monitoring: She has coordinated projects to support frailty in the elderly, developing systems based on ambient intelligence and wearables to detect clinical and behavioral anomalies. She has also contributed to models for analyzing behavioral changes through dynamic clustering. Generative AI and Finance: More recently, her research has expanded into the use of generative AI and deep learning for finance, including benchmark studies on price trend prediction based on Limit Order Books (LOB) and the development of diffusion models for realistic market simulation (the TRADES project). According to Google Scholar bibliometrics updated until December 2025, Velardi's scientific publications have been cited more than 8100 times. Her h-index was 42. She has published more than 200 papers in international journals and conference proceedings. Some of her publications have been published in top rated journals such as Artificial Intelligence, Computational Linguistics, Knowledge-Based Systems, IEEE Transactions on Data and Knowledge Engineering , IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Computers, IEEE Transactions on Software Engineering , Data Mining and Knowledge Discovery, and Journal of Web Semantics. == Education and previous employments == Velardi graduated in electronic engineering from Sapienza University in 1978. From 1978 to 1983, she worked for the Ugo Bordoni Foundation, a research institution focusing on ICT and working under the supervision of the Italian Ministry of Economic Development. In 1983, she was a visiting scholar at Stanford University. During this period she became passionate about Artificial Intelligence, which will remain her area of research throughout her career. From 1984 to 1986, she came back to her natal city and worked as a researcher for IBM. From 1986 to 1996 she was an associate professor in the engineering faculty of Polytechnic University of the Marches (Ancona, Italy). Starting in November 1996, she taught in and did research for the Department of Computer Science at the Sapienza University. Velardi was the head of Bachelor and Master Programs in Computer Science at Sapienza University from 2010 to 2013 and from 2015 to 2016. == Current employment == Since November 2001, Velardi has been a full professor in the department of computer science ("Dipartimento di Informatica" in Italian) at Sapienza University in Rome, Italy. Since 2013, she has been the coordinator of the Distance Learning Degree in Computer Science at Sapienza University. As of today, Velardi is a Senior Associate at the Institute of Cognitive Sciences and Technologies (ISTC) of the CNR. == Recognition == Velardi is one of the hundred female scientists included in the database "100esperte.it" (translated from Italian with "100 female experts"). This database lists top Italian female STEM scientists. Six out of one hundred scientists in the 100esperte's database are computer scientists like Velardi. Velardi is in the list of the top Italian scientists. A top scientist appearing in the Top-Italian-Scientists database is a scientist whose h-index is greater than 30. In March 2017, she was given an IBM Faculty Award for her research on social recommender systems. In December 2018, Velardi was included in the list of the 50 most influential Italian women in science and technology by Inspiring Fifty, a non-profit that aims to increase diversity in STEM by making female role models in tech more visible. In September 2019 she was the local co-organizer and Program Chair of the 6th ACM Celebration of Women in Computing. In November 2019 Velardi received the Standout Woman Award International at the seat of the Italian Parliament in Montecitorio. == Causes == Velardi aims at debunking the myth of computer science as a man-oriented and "inflexible" discipline. She is the founder of the project "NERD? Non e' roba per donne?" (translated from Italian: "NERD? Is it not stuff for women?"). This project was launched by Velardi in 2012 in the Department of Computer Science at Sapienza University. Since 2013 the project has been carried out in partnership with IBM Italy, which later created a spin-off of the project. The goal of the project is two-fold: (1) conveying computer science as creative, interdisciplinary and problem-solving-oriented science, and (2) encouraging young female students in studying computer science by, for instance, developing apps for smartphones. She has been the program chair of the 19th ACM celebration of Women in Computing. She is the creator and coordinator of the G4GRETA, an educational project that involves students of the third and fourth grades of Rome and Lazio. The project combines the development of IT skills with the themes of environmental sustainability and soft skills (teambuilding, pitching, social networking, etc.) Velardi is also involved in scientific dissemination. In 2020 and 2021 she cooperated with RaiCultura, the cultural division of RAI, the national broadcasting company.
Read more →
Sparse dictionary learning

Sparse dictionary learning (also known as sparse coding or SDL) is a representation learning method which aims to find a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms, and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than any one of the signals being observed. These two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal, but also provide an improvement in sparsity and flexibility of the representation. One of the most important applications of sparse dictionary learning is in the field of compressed sensing or signal recovery. In compressed sensing, a high-dimensional signal can be recovered with only a few linear measurements, provided that the signal is sparse or near-sparse. Since not all signals satisfy this condition, it is crucial to find a sparse representation of that signal such as the wavelet transform or the directional gradient of a rasterized matrix. Once a matrix or a high-dimensional vector is transferred to a sparse space, different recovery algorithms like basis pursuit, CoSaMP, or fast non-iterative algorithms can be used to recover the signal. One of the key principles of dictionary learning is that the dictionary has to be inferred from the input data. The emergence of sparse dictionary learning methods was stimulated by the fact that in signal processing, one typically wants to represent the input data using a minimal amount of components. Before this approach, the general practice was to use predefined dictionaries such as Fourier or wavelet transforms. However, in certain cases, a dictionary that is trained to fit the input data can significantly improve the sparsity, which has applications in data decomposition, compression, and analysis, and has been used in the fields of image denoising and classification, and video and audio processing. Sparsity and overcomplete dictionaries have immense applications in image compression, image fusion, and inpainting. == Problem statement == Given the input dataset X = [ x 1 , . . . , x K ] , x i ∈ R d {\displaystyle X=[x_{1},...,x_{K}],x_{i}\in \mathbb {R} ^{d}} we wish to find a dictionary D ∈ R d × n : D = [ d 1 , . . . , d n ] {\displaystyle \mathbf {D} \in \mathbb {R} ^{d\times n}:D=[d_{1},...,d_{n}]} and a representation R = [ r 1 , . . . , r K ] , r i ∈ R n {\displaystyle R=[r_{1},...,r_{K}],r_{i}\in \mathbb {R} ^{n}} such that both ‖ X − D R ‖ F 2 {\displaystyle \|X-\mathbf {D} R\|_{F}^{2}} is minimized and the representations r i {\displaystyle r_{i}} are sparse enough. This can be formulated as the following optimization problem: argmin D ∈ C , r i ∈ R n ∑ i = 1 K ‖ x i − D r i ‖ 2 2 + λ ‖ r i ‖ 0 {\displaystyle {\underset {\mathbf {D} \in {\mathcal {C}},r_{i}\in \mathbb {R} ^{n}}{\text{argmin}}}\sum _{i=1}^{K}\|x_{i}-\mathbf {D} r_{i}\|_{2}^{2}+\lambda \|r_{i}\|_{0}} , where C ≡ { D ∈ R d × n : ‖ d i ‖ 2 ≤ 1 ∀ i = 1 , . . . , n } {\displaystyle {\mathcal {C}}\equiv \{\mathbf {D} \in \mathbb {R} ^{d\times n}:\|d_{i}\|_{2}\leq 1\,\,\forall i=1,...,n\}} , λ > 0 {\displaystyle \lambda >0} C {\displaystyle {\mathcal {C}}} is required to constrain D {\displaystyle \mathbf {D} } so that its atoms would not reach arbitrarily high values allowing for arbitrarily low (but non-zero) values of r i {\displaystyle r_{i}} . λ {\displaystyle \lambda } controls the trade off between the sparsity and the minimization error. The minimization problem above is not convex because of the ℓ0-"norm" and solving this problem is NP-hard. In some cases L1-norm is known to ensure sparsity and so the above becomes a convex optimization problem with respect to each of the variables D {\displaystyle \mathbf {D} } and R {\displaystyle \mathbf {R} } when the other one is fixed, but it is not jointly convex in ( D , R ) {\displaystyle (\mathbf {D} ,\mathbf {R} )} . === Properties of the dictionary === The dictionary D {\displaystyle \mathbf {D} } defined above can be "undercomplete" if n < d {\displaystyle n d {\displaystyle n>d} with the latter being a typical assumption for a sparse dictionary learning problem. The case of a complete dictionary does not provide any improvement from a representational point of view and thus isn't considered. Undercomplete dictionaries represent the setup in which the actual input data lies in a lower-dimensional space. This case is strongly related to dimensionality reduction and techniques like principal component analysis which require atoms d 1 , . . . , d n {\displaystyle d_{1},...,d_{n}} to be orthogonal. The choice of these subspaces is crucial for efficient dimensionality reduction, but it is not trivial. And dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification. However, their main downside is limiting the choice of atoms. Overcomplete dictionaries, however, do not require the atoms to be orthogonal (they will never have a basis anyway) thus allowing for more flexible dictionaries and richer data representations. An overcomplete dictionary which allows for sparse representation of signal can be a famous transform matrix (wavelets transform, fourier transform) or it can be formulated so that its elements are changed in such a way that it sparsely represents the given signal in a best way. Learned dictionaries are capable of giving sparser solutions as compared to predefined transform matrices. == Algorithms == As the optimization problem described above can be solved as a convex problem with respect to either dictionary or sparse coding while the other one of the two is fixed, most of the algorithms are based on the idea of iteratively updating one and then the other. The problem of finding an optimal sparse coding R {\displaystyle R} with a given dictionary D {\displaystyle \mathbf {D} } is known as sparse approximation (or sometimes just sparse coding problem). A number of algorithms have been developed to solve it (such as matching pursuit and LASSO) and are incorporated in the algorithms described below. === Method of optimal directions (MOD) === The method of optimal directions (or MOD) was one of the first methods introduced to tackle the sparse dictionary learning problem. The core idea of it is to solve the minimization problem subject to the limited number of non-zero components of the representation vector: min D , R { ‖ X − D R ‖ F 2 } s.t. ∀ i ‖ r i ‖ 0 ≤ T {\displaystyle \min _{\mathbf {D} ,R}\{\|X-\mathbf {D} R\|_{F}^{2}\}\,\,{\text{s.t.}}\,\,\forall i\,\,\|r_{i}\|_{0}\leq T} Here, F {\displaystyle F} denotes the Frobenius norm. MOD alternates between getting the sparse coding using a method such as matching pursuit and updating the dictionary by computing the analytical solution of the problem given by D = X R + {\displaystyle \mathbf {D} =XR^{+}} where R + {\displaystyle R^{+}} is a Moore-Penrose pseudoinverse. After this update D {\displaystyle \mathbf {D} } is renormalized to fit the constraints and the new sparse coding is obtained again. The process is repeated until convergence (or until a sufficiently small residue). MOD has proved to be a very efficient method for low-dimensional input data X {\displaystyle X} requiring just a few iterations to converge. However, due to the high complexity of the matrix-inversion operation, computing the pseudoinverse in high-dimensional cases is in many cases intractable. This shortcoming has inspired the development of other dictionary learning methods. === K-SVD === K-SVD is an algorithm that performs SVD at its core to update the atoms of the dictionary one by one and basically is a generalization of K-means. It enforces that each element of the input data x i {\displaystyle x_{i}} is encoded by a linear combination of not more than T 0 {\displaystyle T_{0}} elements in a way identical to the MOD approach: min D , R { ‖ X − D R ‖ F 2 } s.t. ∀ i ‖ r i ‖ 0 ≤ T 0 {\displaystyle \min _{\mathbf {D} ,R}\{\|X-\mathbf {D} R\|_{F}^{2}\}\,\,{\text{s.t.}}\,\,\forall i\,\,\|r_{i}\|_{0}\leq T_{0}} This algorithm's essence is to first fix the dictionary, find the best possible R {\displaystyle R} under the above constraint (using Orthogonal Matching Pursuit) and then iteratively update the atoms of dictionary D {\displaystyle \mathbf {D} } in the following manner: ‖ X − D R ‖ F 2 = | X − ∑ i = 1 K d i x T i | F 2 = ‖ E k − d k x T k ‖ F 2 {\displaystyle \|X-\mathbf {D} R\|_{F}^{2}=\left|X-\sum _{i=1}^{K}d_{i}x_{T}^{i}\right|_{F}^{2}=\|E_{k}-d_{k}x_{T}^{k}\|_{F}^{2}} The next steps of the algorithm include rank-1 approximation of the residual matrix E k {\displaystyle E_{k}} , updating d k {\displaystyle d_{k}} and enforcing the s
Read more →
Retrieval-augmented generation

Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this enables LLM-based chatbots to access internal company data or generate responses based on authoritative sources. RAG improves LLMs by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have caused chatbots to describe policies that don't exist, or recommend nonexistent legal cases to lawyers that are looking for citations to support their arguments. RAG also reduces the need to retrain LLMs with new data, saving on computational and financial costs. Beyond efficiency gains, RAG also allows LLMs to include sources in their responses, so users can verify the cited sources. This provides greater transparency, as users can cross-check retrieved content to ensure accuracy and relevance. The term retrieval-augmented generation (RAG) was introduced in a 2020 paper that described combining a parametric language model with a non-parametric external memory accessed through retrieval at inference time. == RAG and LLM limitations == LLMs can provide incorrect information. For example, when Google first demonstrated its LLM tool "Google Bard" (later re-branded to Gemini), the LLM provided incorrect information about the James Webb Space Telescope. This error contributed to a $100 billion decline in Google's stock value. RAG is used to prevent these errors, but it does not solve all the problems. For example, LLMs can generate misinformation even when pulling from factually correct sources if they misinterpret the context. MIT Technology Review gives the example of an AI-generated response stating, "The United States has had one Muslim president, Barack Hussein Obama." The model retrieved this from an academic book rhetorically titled Barack Hussein Obama: America's First Muslim President? The LLM did not "know" or "understand" the context of the title, generating a false statement. LLMs with RAG are programmed to prioritize new information. This technique has been called "prompt stuffing." Without prompt stuffing, the LLM's input is generated by a user; with prompt stuffing, additional relevant context is added to this input to guide the model's response. This approach provides the LLM with key information early in the prompt, encouraging it to prioritize the supplied data over pre-existing training knowledge. == Process == Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating an information-retrieval mechanism that allows models to access and utilize additional data beyond their original training set. Ars Technica notes that "when new information becomes available, rather than having to retrain the model, all that's needed is to augment the model's external knowledge base with the updated information" ("augmentation"). IBM states that "in the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize" an answer. === RAG key stages === Typically, the data to be referenced is converted into LLM embeddings, numerical representations in the form of a large vector space. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs). These embeddings are then stored in a vector database to allow for document retrieval. Given a user query, a document retriever is first called to select the most relevant documents that will be used to augment the query. This comparison can be done using a variety of methods, which depend in part on the type of indexing used. The model feeds this relevant retrieved information into the LLM via prompt engineering of the user's original query. Newer implementations (as of 2023) can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains and using memory and self-improvement to learn from previous retrievals. Finally, the LLM can generate output based on both the query and the retrieved documents. Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning. == Applications == Retrieval-augmented generation is used in applications where generated responses need to be grounded in external or frequently updated information. Commonly cited use cases include search engines, question-answering systems, customer support chatbots, enterprise knowledge assistants, content generation, recommendation systems, retail and e-commerce, and industrial or manufacturing workflows. In healthcare, RAG has been studied as a way to ground large language model outputs in external medical knowledge sources, although reviews have noted continuing challenges around evaluation, ethics, and clinical reliability. == Improvements == Improvements to the basic process above can be applied at different stages in the RAG flow. === Encoder === These methods focus on the encoding of text as either dense or sparse vectors. Sparse vectors, which encode the identity of a word, are typically dictionary-length and contain mostly zeros. Dense vectors, which encode meaning, are more compact and contain fewer zeros. Various enhancements can improve the way similarities are calculated in the vector stores (databases). Performance improves by optimizing how vector similarities are calculated. Dot products enhance similarity scoring, while approximate nearest neighbor (ANN) searches improve retrieval efficiency over K-nearest neighbors (KNN) searches. Accuracy may be improved with Late Interactions, which allow the system to compare words more precisely after retrieval. This helps refine document ranking and improve search relevance. Hybrid vector approaches may be used to combine dense vector representations with sparse one-hot vectors, taking advantage of the computational efficiency of sparse dot products over dense vector operations. Other retrieval techniques focus on improving accuracy by refining how documents are selected. Some retrieval methods combine sparse representations, such as SPLADE, with query expansion strategies to improve search accuracy and recall. === Retriever-centric methods === These methods aim to enhance the quality of document retrieval in vector databases: Pre-training the retriever using the Inverse Cloze Task (ICT), a technique that helps the model learn retrieval patterns by predicting masked text within documents. Supervised retriever optimization aligns retrieval probabilities with the generator model's likelihood distribution. This involves retrieving the top-k vectors for a given prompt, scoring the generated response's perplexity, and minimizing KL divergence between the retriever's selections and the model's likelihoods to refine retrieval. Reranking techniques can refine retriever performance by prioritizing the most relevant retrieved documents during training. === Language model === By redesigning the language model with the retriever in mind, a 25-time smaller network can get comparable perplexity as its much larger counterparts. Because it is trained from scratch, this method (Retro) incurs the high cost of training runs that the original RAG scheme avoided. The hypothesis is that by giving domain knowledge during training, Retro needs less focus on the domain and can devote its smaller weight resources only to language semantics. The redesigned language model is shown here. It has been reported that Retro is not reproducible, so modifications were made to make it so. The more reproducible version is called Retro++ and includes in-context RAG. === Chunking === Chunking involves various strategies for breaking up the data into vectors so the retriever can find details in it. Three types of chunking strategies are: Fixed length with overlap. This is fast and easy. Overlapping consecutive chunks helps to maintain semantic context across chunks. Syntax-based chunks can break the document up into sentences. Libraries such as spaCy or NLTK can also help. File format-based chunking. Certain file types have natural chunks built in, and it's best to respect them. For example, code files are best chunked and vectorized as whole functions or classes. HTML files should leave
or base64 encoded elements
Read more →

How to Choose an AI Writing Assistant

Comparing the best AI writing assistant? An AI writing assistant is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI writing assistant slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.
Read more →

AI Sales Assistants: Free vs Paid (2026)

Trying to pick the best AI sales assistant? An AI sales assistant is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI sales assistant slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.
Read more →

1 2 3 4 5 6 7 8 9 10
All Categories

AI News and Guides
AI for Business
AI Image Generators
AI Writing Tools
AI Video Tools
AI Chatbots and Assistants
AI Coding Tools
Trending Guides

Convolution
DeepL Translator
Pascale Fung
Stefano Soatto
Kuki AI
Writer invariant
Top 10 AI Video Generators Compared (2026)
Stephen Muggleton
Eugene Goostman
Hanna Hajishirzi
Aizhi
Hand-picked AI tools, generators and practical how-to guides — independent reviews, updated for 2026.

Categories

AI Coding Tools
AI News and Guides
AI Chatbots and Assistants
AI for Business
AI Writing Tools
AI Image Generators
AI Video Tools

Site

Home

XML Sitemap

© Aizhi. All rights reserved.