T-norm fuzzy logics

T-norm fuzzy logics are a family of non-classical logics, informally delimited by having a semantics that takes the real unit interval [0, 1] for the system of truth values and functions called t-norms for permissible interpretations of conjunction. They are mainly used in applied fuzzy logic and fuzzy set theory as a theoretical basis for approximate reasoning. T-norm fuzzy logics belong in broader classes of fuzzy logics and many-valued logics. In order to generate a well-behaved implication, the t-norms are usually required to be left-continuous; logics of left-continuous t-norms further belong in the class of substructural logics, among which they are marked with the validity of the law of prelinearity, (A → B) ∨ (B → A). Both propositional and first-order (or higher-order) t-norm fuzzy logics, as well as their expansions by modal and other operators, are studied. Logics that restrict the t-norm semantics to a subset of the real unit interval (for example, finitely valued Łukasiewicz logics) are usually included in the class as well. Important examples of t-norm fuzzy logics are monoidal t-norm logic (MTL) of all left-continuous t-norms, basic logic (BL) of all continuous t-norms, product fuzzy logic of the product t-norm, or the nilpotent minimum logic of the nilpotent minimum t-norm. Some independently motivated logics belong among t-norm fuzzy logics, too, for example Łukasiewicz logic (which is the logic of the Łukasiewicz t-norm) or Gödel–Dummett logic (which is the logic of the minimum t-norm). == Motivation == As members of the family of fuzzy logics, t-norm fuzzy logics primarily aim at generalizing classical two-valued logic by admitting intermediary truth values between 1 (truth) and 0 (falsity) representing degrees of truth of propositions. The degrees are assumed to be real numbers from the unit interval [0, 1]. In propositional t-norm fuzzy logics, propositional connectives are stipulated to be truth-functional, that is, the truth value of a complex proposition formed by a propositional connective from some constituent propositions is a function (called the truth function of the connective) of the truth values of the constituent propositions. The truth functions operate on the set of truth degrees (in the standard semantics, on the [0, 1] interval); thus the truth function of an n-ary propositional connective c is a function Fc: [0, 1]n → [0, 1]. Truth functions generalize truth tables of propositional connectives known from classical logic to operate on the larger system of truth values. T-norm fuzzy logics impose certain natural constraints on the truth function of conjunction. The truth function ∗ : [ 0 , 1 ] 2 → [ 0 , 1 ] {\displaystyle \colon [0,1]^{2}\to [0,1]} of conjunction is assumed to satisfy the following conditions: Commutativity, that is, x ∗ y = y ∗ x {\displaystyle xy=yx} for all x and y in [0, 1]. This expresses the assumption that the order of fuzzy propositions is immaterial in conjunction, even if intermediary truth degrees are admitted. Associativity, that is, ( x ∗ y ) ∗ z = x ∗ ( y ∗ z ) {\displaystyle (xy)z=x(yz)} for all x, y, and z in [0, 1]. This expresses the assumption that the order of performing conjunction is immaterial, even if intermediary truth degrees are admitted. Monotony, that is, if x ≤ y {\displaystyle x\leq y} then x ∗ z ≤ y ∗ z {\displaystyle xz\leq yz} for all x, y, and z in [0, 1]. This expresses the assumption that increasing the truth degree of a conjunct should not decrease the truth degree of the conjunction. Neutrality of 1, that is, 1 ∗ x = x {\displaystyle 1x=x} for all x in [0, 1]. This assumption corresponds to regarding the truth degree 1 as full truth, conjunction with which does not decrease the truth value of the other conjunct. Together with the previous conditions this condition ensures that also 0 ∗ x = 0 {\displaystyle 0x=0} for all x in [0, 1], which corresponds to regarding the truth degree 0 as full falsity, conjunction with which is always fully false. Continuity of the function ∗ {\displaystyle } (the previous conditions reduce this requirement to the continuity in either argument). Informally this expresses the assumption that microscopic changes of the truth degrees of conjuncts should not result in a macroscopic change of the truth degree of their conjunction. This condition, among other things, ensures a good behavior of (residual) implication derived from conjunction; to ensure the good behavior, however, left-continuity (in either argument) of the function ∗ {\displaystyle } is sufficient. In general t-norm fuzzy logics, therefore, only left-continuity of ∗ {\displaystyle } is required, which expresses the assumption that a microscopic decrease of the truth degree of a conjunct should not macroscopically decrease the truth degree of conjunction. These assumptions make the truth function of conjunction a left-continuous t-norm, which explains the name of the family of fuzzy logics (t-norm based). Particular logics of the family can make further assumptions about the behavior of conjunction (for example, Gödel–Dummett logic requires its idempotence) or other connectives (for example, the logic IMTL (involutive monoidal t-norm logic) requires the involutiveness of negation). All left-continuous t-norms ∗ {\displaystyle } have a unique residuum, that is, a binary function ⇒ {\displaystyle \Rightarrow } such that for all x, y, and z in [0, 1], x ∗ y ≤ z {\displaystyle xy\leq z} if and only if x ≤ y ⇒ z . {\displaystyle x\leq y\Rightarrow z.} The residuum of a left-continuous t-norm can explicitly be defined as ( x ⇒ y ) = sup { z ∣ z ∗ x ≤ y } . {\displaystyle (x\Rightarrow y)=\sup\{z\mid zx\leq y\}.} This ensures that the residuum is the pointwise largest function such that for all x and y, x ∗ ( x ⇒ y ) ≤ y . {\displaystyle x(x\Rightarrow y)\leq y.} The latter can be interpreted as a fuzzy version of the modus ponens rule of inference. The residuum of a left-continuous t-norm thus can be characterized as the weakest function that makes the fuzzy modus ponens valid, which makes it a suitable truth function for implication in fuzzy logic. Left-continuity of the t-norm is the necessary and sufficient condition for this relationship between a t-norm conjunction and its residual implication to hold. Truth functions of further propositional connectives can be defined by means of the t-norm and its residuum, for instance the residual negation ¬ x = ( x ⇒ 0 ) {\displaystyle \neg x=(x\Rightarrow 0)} or bi-residual equivalence x ⇔ y = ( x ⇒ y ) ∗ ( y ⇒ x ) . {\displaystyle x\Leftrightarrow y=(x\Rightarrow y)(y\Rightarrow x).} Truth functions of propositional connectives may also be introduced by additional definitions: the most usual ones are the minimum (which plays a role of another conjunctive connective), the maximum (which plays a role of a disjunctive connective), or the Baaz Delta operator, defined in [0, 1] as Δ x = 1 {\displaystyle \Delta x=1} if x = 1 {\displaystyle x=1} and Δ x = 0 {\displaystyle \Delta x=0} otherwise. In this way, a left-continuous t-norm, its residuum, and the truth functions of additional propositional connectives determine the truth values of complex propositional formulae in [0, 1]. Formulae that always evaluate to 1 are called tautologies with respect to the given left-continuous t-norm ∗ , {\displaystyle ,} or ∗ - {\displaystyle {\mbox{-}}} tautologies. The set of all ∗ - {\displaystyle {\mbox{-}}} tautologies is called the logic of the t-norm ∗ , {\displaystyle ,} as these formulae represent the laws of fuzzy logic (determined by the t-norm) that hold (to degree 1) regardless of the truth degrees of atomic formulae. Some formulae are tautologies with respect to a larger class of left-continuous t-norms; the set of such formulae is called the logic of the class. Important t-norm logics are the logics of particular t-norms or classes of t-norms, for example: Łukasiewicz logic is the logic of the Łukasiewicz t-norm x ∗ y = max ( x + y − 1 , 0 ) {\displaystyle xy=\max(x+y-1,0)} Gödel–Dummett logic is the logic of the minimum t-norm x ∗ y = min ( x , y ) {\displaystyle xy=\min(x,y)} Product fuzzy logic is the logic of the product t-norm x ∗ y = x ⋅ y {\displaystyle xy=x\cdot y} Monoidal t-norm logic MTL is the logic of (the class of) all left-continuous t-norms Basic fuzzy logic BL is the logic of (the class of) all continuous t-norms It turns out that many logics of particular t-norms and classes of t-norms are axiomatizable. The completeness theorem of the axiomatic system with respect to the corresponding t-norm semantics on [0, 1] is then called the standard completeness of the logic. Besides the standard real-valued semantics on [0, 1], the logics are sound and complete with respect to general algebraic semantics, formed by suitable classes of prelinear commutative bounded integral residuated lattices. == History == Some particular t-norm fuzzy logics have been introduced and investigated long before the family was re

PropBank

PropBank is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced by Martha Palmer et al., the term propbank is also coming to be used as a common noun referring to any corpus that has been annotated with propositions and their arguments. The PropBank project has played a role in research in natural language processing, and has been used in semantic role labelling. == Comparison == PropBank differs from FrameNet, the resource to which it is most frequently compared, in several ways. PropBank is a verb-oriented resource, while FrameNet is centered on the more abstract notion of frames, which generalizes descriptions across similar verbs (e.g. "describe" and "characterize") as well as nouns and other words (e.g. "description"). PropBank does not annotate events or states of affairs described using nouns. PropBank commits to annotating all verbs in a corpus, whereas the FrameNet project chooses sets of example sentences from a large corpus and only in a few cases has annotated longer continuous stretches of text. PropBank-style annotations often remain close to the syntactic level, while FrameNet-style annotations are sometimes more semantically motivated. From the start, PropBank was developed with the idea of serving as training data for machine learning-based semantic role labeling systems in mind. It requires that all arguments to a verb be syntactic constituents and different senses of a word are only distinguished if the differences bear on the arguments. Due to such differences, semantic role labeling with respect to PropBank is often a somewhat easier task than producing FrameNet-style annotations.

Corpus linguistics

Corpus linguistics is an empirical method for the study of language by text corpus (plural corpora). Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference. Large collections of text, though corpora may also be small in terms of running words, allow linguists to run quantitative analyses on linguistic concepts that may be difficult to test in a qualitative manner. The text-corpus method uses the body of texts in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated. Corpora have not only been used for linguistics research, they have been increasingly used to compile dictionaries (starting with The American Heritage Dictionary of the English Language in 1969) and reference grammars, with A Comprehensive Grammar of the English Language, published in 1985, as a first. Experts in the field have differing views about the annotation of a corpus. These views range from John McHardy Sinclair, who advocates minimal annotation so texts speak for themselves, to the Survey of English Usage team (University College, London), who advocate annotation as allowing greater linguistic understanding through rigorous recording. == History == Some of the earliest efforts at grammatical description were based at least in part on corpora of particular religious or cultural significance. For example, Prātiśākhya literature described the sound patterns of Sanskrit as found in the Vedas, and Pāṇini's grammar of classical Sanskrit was based at least in part on analysis of that same corpus. Similarly, the early Arabic grammarians paid particular attention to the language of the Quran. In the Western European tradition, scholars prepared concordances to allow detailed study of the language of the Bible and other canonical texts. === English corpora === A landmark in modern corpus linguistics was the publication of Computational Analysis of Present-Day American English in 1967. Written by Henry Kučera and W. Nelson Francis, the work was based on an analysis of the Brown Corpus, which is a structured and balanced corpus of one million words of American English from the year 1961. The corpus comprises 2000 text samples, from a variety of genres. The Brown Corpus was the first computerized corpus designed for linguistic research. Kučera and Francis subjected the Brown Corpus to a variety of computational analyses and then combined elements of linguistics, language teaching, psychology, statistics, and sociology to create a rich and variegated opus. A further key publication was Randolph Quirk's "Towards a description of English Usage" in 1960 in which he introduced the Survey of English Usage. Quirk's corpus was the first modern corpus to be built with the purpose of representing the whole language. Shortly thereafter, Boston publisher Houghton-Mifflin approached Kučera to supply a million-word, three-line citation base for its new American Heritage Dictionary, the first dictionary compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used). Other publishers followed suit. The British publisher Collins' COBUILD monolingual learner's dictionary, designed for users learning English as a foreign language, was compiled using the Bank of English. The Survey of English Usage Corpus was used in the development of one of the most important Corpus-based Grammars, which was written by Quirk et al. and published in 1985 as A Comprehensive Grammar of the English Language. The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), Australian Corpus of English (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes, and include the International Corpus of English, and the British National Corpus, a 100 million word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library. For contemporary American English, work has stalled on the American National Corpus, but the 400+ million word Corpus of Contemporary American English (1990–present) is now available through a web interface. The first computerized corpus of transcribed spoken language was constructed in 1971 by the Montreal French Project, containing one million words, which inspired Shana Poplack's much larger corpus of spoken French in the Ottawa-Hull area. === Multilingual corpora === In the 1990s, many of the notable early successes on statistical methods in natural-language programming (NLP) occurred in the field of machine translation, due especially to work at IBM Research. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. There are corpora in non-European languages as well. For example, the National Institute for Japanese Language and Linguistics in Japan has built a number of corpora of spoken and written Japanese. Sign language corpora have also been created using video data. === Ancient languages corpora === Besides these corpora of living languages, computerized corpora have also been made of collections of texts in ancient languages. An example is the Andersen-Forbes database of the Hebrew Bible, developed since the 1970s, in which every clause is parsed using graphs representing up to seven levels of syntax, and every segment tagged with seven fields of information. The Quranic Arabic Corpus is an annotated corpus for the Classical Arabic language of the Quran. This is a recent project with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The Digital Corpus of Sanskrit (DCS) is a "Sandhi-split corpus of Sanskrit texts with full morphological and lexical analysis... designed for text-historical research in Sanskrit linguistics and philology." === Corpora from specific fields === Besides pure linguistic inquiry, researchers had begun to apply corpus linguistics to other academic and professional fields, such as the emerging sub-discipline of Law and Corpus Linguistics, which seeks to understand legal texts using corpus data and tools. The DBLP Discovery Dataset concentrates on computer science, containing relevant computer science publications with sentient metadata such as author affiliations, citations, or study fields. A more focused dataset was introduced by NLP Scholar, a combination of papers of the ACL Anthology and Google Scholar metadata. Corpora can also aid in translation efforts or in teaching foreign languages. == Methods == Corpus linguistics has generated a number of research methods, which attempt to trace a path from data to theory. Wallis and Nelson (2001) first introduced what they called the 3A perspective: Annotation, Abstraction and Analysis. Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous other representations. Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically includes linguist-directed search but may include e.g., rule-learning for parsers. Analysis consists of statistically probing, manipulating and generalising from the dataset. Analysis might include statistical evaluations, optimisation of rule-bases or knowledge discovery methods. Most lexical corpora today are part-of-speech-tagged (POS-tagged). However even corpus linguists who work with 'unannotated plain text' inevitably apply some method to isolate salient terms. In such situations annotation and abstraction are combined in a lexical search. The advantage of publishing an annotated corpus is that other users can then perform experiments on the corpus (through corpus managers). Linguists with other interests and differing perspectives than the originators' can exploit this work. By sharing data

Struc2vec

struc2vec is a framework to generate node vector representations on a graph that preserve the structural identity. In contrast to node2vec, that optimizes node embeddings so that nearby nodes in the graph have similar embedding, struc2vec captures the roles of nodes in a graph, even if structurally similar nodes are far apart in the graph. It learns low-dimensional representations for nodes in a graph, generating random walks through a constructed multi-layer graph starting at each graph node. It is useful for machine learning applications where the downstream application is more related with the structural equivalence of the nodes (e.g., it can be used to detect nodes in networks with similar functions, such as interns in the social network of a corporation). struc2vec identifies nodes that play a similar role based solely on the structure of the graph, for example computing the structural identity of individuals in social networks. In particular, struc2vec employs a degree-based method to measure the pairwise structural role similarity, which is then adopted to build the multi-layer graph. Moreover, the distance between the latent representation of nodes is strongly correlated to their structural similarity. The framework contains three optimizations: reducing the length of degree sequences considered, reducing the number of pairwise similarity calculations, and reducing the number of layers in the generated graph. struc2vec follows the intuition that random walks through a graph can be treated as sentences in a corpus. Each node in a graph is treated as an individual word, and short random walk is treated as a sentence. In its final phase, the algorithm employs Gensim's word2vec algorithm to learn embeddings based on biased random walks. Sequences of nodes are fed into a skip-gram or continuous bag of words model and traditional machine-learning techniques for classification can be used. It is considered a useful framework to learn node embeddings based on structural equivalence.

Electronic sell-through

Electronic sell-through (EST) is a method of media distribution whereby consumers pay a one-time fee to download a media file for storage on a hard drive. Although EST is often described as a transaction that grants content "ownership" to the consumer, the content may become unusable after a certain period and may not be viewable using competing platforms. EST is used by a wide array of digital media products, including movies, television, music, games, and mobile applications. The term is sometimes used interchangeably with download to own (DTO). == Film and television == The film and television industry's $18.8 billion home entertainment market consists of rental and sell-through segments, the latter of which includes the electronic sell-through of digital content. In 2010, EST generated $683 million of total home entertainment revenues, putting it behind the more lucrative revenue streams of cable video-on-demand (VOD) and internet video-on-demand (iVOD), which brought in a combined $1.8 billion in the same period. In 2010, Apple's iTunes Store accounted for three quarters of the U.S. EST business. The rest of the EST market was captured by Microsoft (via its Zune Video platform), Sony, Amazon VOD (now Amazon Video), and Walmart (via its VUDU service). A number of industry trends indicate the future expansion of EST's share of digital distribution revenues. David Bishop, worldwide president of Sony Pictures Home Entertainment, describes the following outlook: "With the launch of UltraViolet (the cloud-based digital copy locker system) establishing a common digital distribution platform later this year, prices potentially coming down on digital sales, more marketing devoted to digital sellthrough, and studios adding more value to the sellthrough product by making HD available and building in smarter extra features, we see the balance tilting even more toward owning and collecting digital movies."

Tree transducer

In theoretical computer science and formal language theory, a tree transducer (TT) is an abstract machine taking as input a tree, and generating output – generally other trees, but models producing words or other structures exist. Roughly speaking, tree transducers extend tree automata in the same way that word transducers extend word automata. Manipulating tree structures instead of words enable TT to model syntax-directed transformations of formal or natural languages. However, TT are not as well-behaved as their word counterparts in terms of algorithmic complexity, closure properties, etcetera. In particular, most of the main classes are not closed under composition. The main classes of tree transducers are: == Top-Down Tree Transducers (TOP) == A TOP T is a tuple (Q, Σ, Γ, I, δ) such that: Q is a finite set, the set of states; Σ is a finite ranked alphabet, called the input alphabet; Γ is a finite ranked alphabet, called the output alphabet; I is a subset of Q, the set of initial states; and δ is a set of rules of the form q ( f ( x 1 , … , x n ) ) → u {\displaystyle q(f(x_{1},\dots ,x_{n}))\to u} , where f is a symbol of Σ, n is the arity of f, q is a state, and u is a tree on Γ and Q × 1.. n {\displaystyle Q\times 1..n} , such pairs being nullary. === Examples of rules and intuitions on semantics === For instance, q ( f ( x 1 , … , x 3 ) ) → g ( a , q ′ ( x 1 ) , h ( q ″ ( x 3 ) ) ) {\displaystyle q(f(x_{1},\dots ,x_{3}))\to g(a,q'(x_{1}),h(q''(x_{3})))} is a rule – one customarily writes q ( x i ) {\displaystyle q(x_{i})} instead of the pair ( q , x i ) {\displaystyle (q,x_{i})} – and its intuitive semantics is that, under the action of q, a tree with f at the root and three children is transformed into g ( a , q ′ ( x 1 ) , h ( q ″ ( x 3 ) ) ) {\displaystyle g(a,q'(x_{1}),h(q''(x_{3})))} where, recursively, q ′ ( x 1 ) {\displaystyle q'(x_{1})} and q ″ ( x 3 ) {\displaystyle q''(x_{3})} are replaced, respectively, with the application of q ′ {\displaystyle q'} on the first child and with the application of q ″ {\displaystyle q''} on the third. === Semantics as term rewriting === The semantics of each state of the transducer T, and of T itself, is a binary relation between input trees (on Σ) and output trees (on Γ). A way of defining the semantics formally is to see δ {\displaystyle \delta } as a term rewriting system, provided that in the right-hand sides the calls are written in the form q ( x i ) {\displaystyle q(x_{i})} , where states q are unary symbols. Then the semantics [ [ q ] ] {\displaystyle [\![q]\!]} of a state q is given by [ [ q ] ] = { u ↦ v ∣ u is a tree on Σ , v is a tree on Γ , and q ( u ) → δ ∗ v } . {\displaystyle [\![q]\!]=\{u\mapsto v\mid u{\text{ is a tree on }}\Sigma ,\ v{\text{ is a tree on }}\Gamma {\text{, and }}q(u)\to _{\delta }^{}v\}.} The semantics of T is then defined as the union of the semantics of its initial states: [ [ T ] ] = ⋃ q ∈ I [ [ q ] ] . {\displaystyle [\![T]\!]=\bigcup _{q\in I}[\![q]\!].} === Determinism and domain === As with tree automata, a TOP is said to be deterministic (abbreviated DTOP) if no two rules of δ share the same left-hand side, and there is at most one initial state. In that case, the semantics of the DTOP is a partial function from input trees (on Σ) to output trees (on Γ), as are the semantics of each of the DTOP's states. The domain of a transducer is the domain of its semantics. Likewise, the image of a transducer is the image of its semantics. === Properties of DTOP === DTOP are not closed under union: this is already the case for deterministic word transducers. The domain of a DTOP is a regular tree language. Furthermore, the domain is recognisable by a deterministic top-down tree automaton (DTTA) of size at most exponential in that of the initial DTOP. That the domain is DTTA-recognizable is not surprising, considering that the left-hand sides of DTOP rules are the same as for DTTA. As for the reason for the exponential explosion in the worst case (that does not exist in the word case), consider the rule q ( f ( x 1 , x 2 ) ) → g ( p 1 ( x 1 ) , p 2 ( x 1 ) , p 3 ( x 2 ) ) {\displaystyle q(f(x_{1},x_{2}))\to g(p_{1}(x_{1}),p_{2}(x_{1}),p_{3}(x_{2}))} . In order for the computation to succeed, it must succeed for both children. That means that the right child must be in the domain of p 3 {\displaystyle p_{3}} . As for the left child, it must be in the domain of both p 1 {\displaystyle p_{1}} and p 2 {\displaystyle p_{2}} . Generally, since subtrees can be copied, a single subtree can be evaluated by multiple states during a run, despite the determinism, and unlike DTTA. Thus the construction of the DTTA recognising the domain of a DTOP must account for sets of states and compute the intersections of their domains, hence the exponential. In the special case of linear DTOP, that is to say DTOP where each x i {\displaystyle x_{i}} appears at most once in the right-hand side of each rule, the construction is linear in time and space. The image of a DTOP is not a regular tree language. Consider the transducer coding the transformation f ( x ) → g ( x , x ) {\displaystyle f(x)\to g(x,x)} ; that is, duplicate the child of the input. This is easily done by a rule q ( f ( x 1 ) ) → g ( p ( x 1 ) , p ( x 1 ) ) {\displaystyle q(f(x_{1}))\to g(p(x_{1}),p(x_{1}))} , where p encodes the identity. Then, absent any restrictions on the first child of the input, the image is a classical non-regular tree language. However, the domain of a DTOP cannot be restricted to a regular tree language. That is to say, given a DTOP T and a language L, one cannot in general build a DTOP T ′ {\displaystyle T'} such that the semantics of T ′ {\displaystyle T'} is that of T, restricted to L. This property is linked to the reason deterministic top-down tree automata are less expressive than bottom-up automata: once you go down a given path, information from other paths is inaccessible. Consider the transducer coding the transformation f ( x , y ) → y {\displaystyle f(x,y)\to y} ; that is, output the right child of the input. This is easily done by a rule q ( f ( x 1 , x 2 ) ) → p ( x 2 ) {\displaystyle q(f(x_{1},x_{2}))\to p(x_{2})} , where p encodes the identity. Now let's say we want to restrict this transducer to the finite (and thus, in particular, regular) domain { f ( c , a ) , f ( c , b ) } {\displaystyle \{f(c,a),\ f(c,b)\}} . We must use the rules q ( f ( x 1 , x 2 ) ) → p ( x 2 ) , p ( a ) → a , p ( b ) → b {\displaystyle q(f(x_{1},x_{2}))\to p(x_{2}),\ p(a)\to a,\ p(b)\to b} . But in the first rule, x 1 {\displaystyle x_{1}} does not appear at all, since nothing is produced from the left child. Thus, it is not possible to test that the left child is c. In contrast, since we produce from the right child, we can test that it is a or b. In general, the criterion is that DTOP cannot test properties of subtrees from which they do not produce output. DTOP are not closed under composition. However this problem can be solved by the addition of a lookahead: a tree automaton, coupled to the transducer, that can perform tests on the domain which the transducer is incapable of. This follows from the point about domain restriction: composing the DTOP encoding identity on { f ( c , a ) , f ( c , b ) } {\displaystyle \{f(c,a),\ f(c,b)\}} with the one encoding f ( x , y ) → y {\displaystyle f(x,y)\to y} must yield a transducer with the semantics { f ( c , a ) ↦ a , f ( c , b ) ↦ b } {\displaystyle \{f(c,a)\mapsto a,\ f(c,b)\mapsto b\}} , which we know is not expressible by a DTOP. The typechecking problem—testing whether the image of a regular tree language is included in another regular tree language—is decidable. The equivalence problem—testing whether two DTOP define the same functions—is decidable. == Bottom-Up Tree Transducers (BOT) == As in the simpler case of tree automata, bottom-up tree transducers are defined similarly to their top-down counterparts, but proceed from the leaves of the tree to the root, instead of from the root to the leaves. Thus the main difference is in the form of the rules, which are of the form f ( q 1 ( x 1 ) , … , q n ( x n ) ) → q ( u ) {\displaystyle f(q_{1}(x_{1}),\dots ,q_{n}(x_{n}))\to q(u)} .

PropBank

Top 10 AI Video Editors Compared (2026)

Corpus linguistics

Struc2vec

Electronic sell-through

Tree transducer

Related Articles