Toric grammars and communication models 

A roadmap towards a communication model

We will use toric grammars as intermediate steps to define the transition probabilities of our communication model on texts. To this end, we first introduce some general types of transformations on toric grammars (reminding the reader that, in our formalism, texts are a special subset of toric grammars). It will turn out that two types of expressions, global expressions and local expressions, play different roles. Let us define them respectively as $E_g = E \cap [_0 S^+$ and $E_\ell = E \cap [_+ S^+$, where we recall that $[_+ = \bigl\{ [_i,\ i \in \mathbb{N} \setminus \{0\} \bigr\}$ and $S^+ = \bigcup_{j=1}^{\infty} S^j$. Roughly speaking, global expressions correspond to full sentences, whereas local expressions correspond to partial constituents. Any toric grammar $G \in \mathcal{G}$ can accordingly be decomposed as $G = G_g + G_\ell$, where $G_g(A) = G(A \cap E_g)$ and $G_\ell(A) = G(A \cap E_\ell)$ for any subset $A \subset E$. The transitions of the communication chain with kernel $q(T, T')$ will be defined in two steps. The first step consists in learning from the text $T$ a toric grammar $G$. To do so, we will split the sentences of $T$ into syntactic constituents.
The second step consists in merging the constituents again to produce a random new text $T'$. The parameter $\theta = R$ of the communication kernel $q_\theta$ will also be a toric grammar. The role of this reference grammar $R$ will be to provide a stock of local expressions to be used when computing $G$ from $T$. We will discuss later the question of the estimation of $R$ itself (and bring satisfactory statistical answers to this issue only in the forthcoming chapters). For the time being, we will assume that the reference grammar $R$ is a parameter of the communication chain, known to all involved speakers.
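To fix ideas, here is a minimal sketch in Python of the global/local decomposition, under a hypothetical encoding that is not taken from the text: an expression is a tuple headed by an integer label (0 for a global expression, i >= 1 for a local one) followed by words, and a toric grammar is a multiset of expressions.

```python
from collections import Counter

def decompose(grammar):
    """Split a grammar G into its global part G_g (expressions headed by
    label 0) and its local part G_l (expressions headed by a label i >= 1),
    so that G = G_g + G_l as multisets."""
    g_global, g_local = Counter(), Counter()
    for expression, count in grammar.items():
        target = g_global if expression[0] == 0 else g_local
        target[expression] += count
    return g_global, g_local

# A text is the special case of a toric grammar whose local part is empty.
text = Counter({(0, "the", "dog", "sleeps"): 2, (0, "a", "cat", "sleeps"): 1})
g_g, g_l = decompose(text)
assert g_g + g_l == text and not g_l
```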
We could have defined a communication kernel $q_{\widehat{R}(T)}(T, T')$, where the reference grammar $\widehat{R}(T)$ itself is estimated at each step from the current text $T$, but we would have obtained a model with weaker properties, in which, in particular, not all states are necessarily recurrent. On the other hand, the proof that the set reachable from any starting point is finite still holds for this modified model, so that it does provide an alternative way of defining a language model as described in the introduction. We will still need an estimator $\widehat{R}(T)$ of the reference grammar, in order to provide a language estimator $\widehat{Q}_{\widehat{R}(T),\, T}$, where we are using the notations of eq. (1.1.1) on page 18. We will propose a first tentative method for the estimation $\widehat{R}(T)$ of the reference grammar in this chapter, and study this question in more detail in the two following ones.

Non stochastic syntax splitting and merging

Let us now describe the model, starting with the description of some non-random grammar transformations. We already introduced a model for grammars that includes texts as a special case. We now have to describe how to generate a toric grammar from a text, with or without the help of a reference grammar to learn the local component of the grammar. The mechanism producing a grammar from a text will be some sort of random parse algorithm (or rather a tentative parse algorithm).
All of this will be achieved by two transformations on toric grammars that respectively split and merge expressions (syntagms) of a toric grammar into smaller or bigger ones. We will first describe the sets of possible splits and merges from a given grammar. This will serve as a basis to define random transitions from one grammar to another in subsequent sections.
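As an illustration of these two transformations, the following sketch (same hypothetical encoding as above, with labels as integers and words as strings) performs a single split, which replaces a span of an expression by a label and records the removed span as a new local expression, and a single merge, which substitutes a local expression back in place of an occurrence of its head label.

```python
from collections import Counter

def _remove_one(grammar, expression):
    """Remove one occurrence of an expression from the multiset."""
    grammar[expression] -= 1
    if grammar[expression] <= 0:
        del grammar[expression]

def split(grammar, expression, start, end, label):
    """Replace expression[start:end] by `label` and add the removed span,
    headed by `label`, as a new local expression."""
    assert start >= 1  # the head label of the expression is never split off
    g = Counter(grammar)
    _remove_one(g, expression)
    g[expression[:start] + (label,) + expression[end:]] += 1
    g[(label,) + expression[start:end]] += 1
    return g

def merge(grammar, expression, position, constituent):
    """Substitute `constituent` (a local expression) for the occurrence of
    its head label found at `position` in `expression`."""
    assert expression[position] == constituent[0]
    g = Counter(grammar)
    _remove_one(g, expression)
    _remove_one(g, constituent)
    g[expression[:position] + constituent[1:] + expression[position + 1:]] += 1
    return g
```

With this encoding, merging undoes splitting: splitting a sentence and merging the resulting constituent back into the shortened expression returns the original text.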

Random split and merge processes

The grammars we described so far are obtained using splitting rules. Texts can be reconstructed using merge transformations. The splitting rules as well as the merge rules allow for multiple choices at each step. We will account for this by introducing random processes where these choices are made at random. We will describe two types of random grammar transformations. Each of these will appear as a finite length Markov chain, where the length of the chain is given by a uniformly bounded stopping time.
− The learning process (or splitting process) will start with a text and build a grammar through iterated splits.
− The production process will start with a grammar and produce a text through iterated merge operations.
These two types of processes may be combined into a split and merge process, going back and forth between texts and toric grammars.
Let us give more formal definitions. Learning and parsing processes will be some special kinds of splitting processes, to be defined hereafter.
Definition 1.3.4 (Splitting process) Given some restricted splitting rule $r : \mathcal{G} \to 2^{\mathcal{G}}$ from the set of grammars to the set of subsets of $\mathcal{G}$, such that for any $G \in \mathcal{G}$, $r(G)$ is contained in the set of possible splits of $G$, a splitting process is a time homogeneous stopped Markov chain $(S_t,\ 0 \le t \le \tau)$ defined on $\mathcal{G}$ such that $\tau = \inf \bigl\{ t \in \mathbb{N} : r(S_t) = \varnothing \bigr\}$ and $\mathbb{P}\bigl( S_t = G' \,\big|\, S_{t-1} = G \bigr) > 0 \iff G' \in r(G)$.
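Definition 1.3.4 translates directly into a simulation loop. Here is a minimal sketch, assuming a user-supplied restricted splitting rule `r` that maps a grammar to the (possibly empty) collection of grammars reachable by one allowed split; the uniform choice made below is only one admissible transition kernel, since the definition merely requires each element of r(G) to have positive probability.

```python
import random

def run_splitting_process(grammar, r, rng=None):
    """Simulate a splitting process: starting from `grammar`, repeatedly jump
    to a grammar chosen in r(G) (uniformly here), and stop at the first time
    tau at which r(S_t) is empty.  Returns the trajectory S_0, ..., S_tau."""
    rng = rng or random.Random()
    trajectory = [grammar]
    while True:
        candidates = list(r(trajectory[-1]))
        if not candidates:  # r(S_t) is empty: the stopping time tau is reached
            return trajectory
        trajectory.append(rng.choice(candidates))
```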

Expectation of a random toric grammar

In section 1.4 on page 35, given some text $T \in \mathcal{T}$, we defined a random distribution $G_T$ on toric grammars that we would like to use to learn a grammar from a text. The most obvious way to use it is to draw a toric grammar at random according to the distribution $G_T$, and we already saw an algorithm, described by a Markov chain and a stopping time, to do this.
The distribution $G_T$ will in general be spread over many grammars. This is a kind of instability that we would like to avoid, if possible. A natural way to get rid of this instability would be to simulate the expectation of $G_T$. In doing so, we face a problem: the usual definition of the expectation of $G_T$, that is $\int G \,\mathrm{d}G_T(G)$, although well defined from a mathematical point of view, is a meaningless toric grammar, due to the possible fluctuations of the label mapping. To get a meaningful notion of expectation, we need to define in a meaningful way the sum of two toric grammars. We will achieve this in two steps.
Let us introduce first the disjoint sum of two toric grammars. We will do this with the help of two disjoint label maps. Let us define the even and odd label maps $f_e$ and $f_o$ as $f_e(i) = 2i$ and $f_o(i) = \max\{0, 2i - 1\}$, for $i \in \mathbb{N}$.
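Note that both maps fix the label 0 (since $f_o(0) = \max\{0, -1\} = 0$), so sentence heads are preserved, while the non-zero labels of the two grammars are sent to disjoint even and odd ranges. A sketch of the resulting disjoint sum, under the same hypothetical integer-label encoding used in the earlier snippets:

```python
from collections import Counter

def f_even(i):
    """Even label map: f_e(i) = 2 i."""
    return 2 * i

def f_odd(i):
    """Odd label map: f_o(i) = max(0, 2 i - 1)."""
    return max(0, 2 * i - 1)

def relabel(grammar, label_map):
    """Apply a label map to every label in every expression; with the
    hypothetical encoding used here, labels are exactly the integer items."""
    relabelled = Counter()
    for expression, count in grammar.items():
        new_expression = tuple(
            label_map(item) if isinstance(item, int) else item
            for item in expression
        )
        relabelled[new_expression] += count
    return relabelled

def disjoint_sum(g1, g2):
    """Disjoint sum: relabel g1 with the even map and g2 with the odd map
    (both fix the sentence label 0), then add the two multisets."""
    return relabel(g1, f_even) + relabel(g2, f_odd)
```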

Basic properties of Markov substitute sets

Any one-point set $\{y\}$, where $y \in D^+$ satisfies eq. (2.1.1) on page 64, is a Markov substitute set, whose substitute measure is the Dirac mass at $y$. A subset of a Markov substitute set is itself a Markov substitute set. If $B$ and $C$ are Markov substitute sets such that $B \cap C \neq \varnothing$, then $B \cup C$ is also a Markov substitute set.
As a consequence, a set $B$ is a Markov substitute set if and only if, for any $y, y' \in B$, the pair $\{y, y'\}$ is a Markov substitute pair.
Less restrictively, a set $B$ is a Markov substitute set if and only if there is a connected undirected spanning graph $\mathcal{G} \subset B^2$ such that for any $(y, y') \in \mathcal{G}$, $\{y, y'\}$ is a Markov substitute pair.
This is obvious from the characterization by eq. (2.1.2) on page 64 for the first point and eq. (2.1.3) on page 64 for the others. These properties lead us to define the relation $y \mathrel{S} y' \iff \{y, y'\}$ is a Markov substitute pair.
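The spanning-graph characterization suggests a simple way of grouping strings once a collection of Markov substitute pairs has been accepted: take the connected components of the corresponding graph, each component being a Markov substitute set. A sketch with a plain union-find structure (the input pairs are assumed to have already been tested, and the example strings are purely illustrative):

```python
def substitute_classes(pairs):
    """Group strings into classes of the relation S, given a collection of
    accepted Markov substitute pairs.  By the spanning graph characterization,
    each connected component is a Markov substitute set."""
    parent = {}

    def find(y):
        parent.setdefault(y, y)
        while parent[y] != y:
            parent[y] = parent[parent[y]]  # path halving
            y = parent[y]
        return y

    for y1, y2 in pairs:
        parent[find(y1)] = find(y2)

    classes = {}
    for y in parent:
        classes.setdefault(find(y), set()).add(y)
    return list(classes.values())

# Purely illustrative: if ("the dog", "the cat") and ("the cat", "a horse")
# are accepted pairs, the three strings form a single substitute set.
print(substitute_classes([("the dog", "the cat"), ("the cat", "a horse")]))
```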

Metropolis invariant dynamics on sentences

Here we make a first attempt at describing invariant dynamics on sentences related to the Markov substitute property.
Let us consider a family $B_j$, $1 \le j \le t$, of Markov substitute sets, and let us assume that we know, for each $B_i$, the substitute measure $q_{\overline{B}_i}$ of some Markov substitute set $\overline{B}_i$ containing $B_i$. We do not necessarily need to know $q_{B_i}$ itself, since we can deduce it from the relation $q_{B_i}(y) = q_{\overline{B}_i}(y) / q_{\overline{B}_i}(B_i)\, \mathbf{1}(y \in B_i)$. We will locate a member of a Markov substitute set in the sentence and replace it by another element of the same Markov substitute set, drawn according to the relevant substitute measure. Let us remark that we will accordingly only use the substitute measures $q_{B_i}$ of those sets $B_i$ whose members have a positive expected number of occurrences, in some context $x \in (D^*)^2$, in the sentence $S$.
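To illustrate, the sketch below restricts a known substitute measure to a smaller set, as in the relation above, and performs one substitution move on a sentence represented as a tuple of words: it locates the occurrences of members of B, picks one, and redraws it from q_B. All names and the data layout are illustrative rather than taken from the text, and the Metropolis acceptance correction needed in general is deliberately not shown.

```python
import random

def restrict(q_larger, B):
    """Deduce the substitute measure of B from that of a larger Markov
    substitute set containing it: q_B(y) = q_larger(y) / q_larger(B), y in B."""
    mass = sum(q_larger[y] for y in B)
    return {y: q_larger[y] / mass for y in B}

def substitution_move(sentence, B, q_B, rng=None):
    """Locate the occurrences in `sentence` of members of the Markov
    substitute set B (each member a tuple of words), pick one uniformly,
    and replace it by a draw from the substitute measure q_B."""
    rng = rng or random.Random()
    occurrences = []
    for member in B:
        k = len(member)
        for i in range(len(sentence) - k + 1):
            if sentence[i:i + k] == member:
                occurrences.append((i, k))
    if not occurrences:  # B does not occur in the sentence: nothing to do
        return sentence
    i, k = rng.choice(occurrences)
    members, weights = zip(*q_B.items())
    replacement = rng.choices(members, weights=weights)[0]
    return sentence[:i] + replacement + sentence[i + k:]

B = {("the", "dog"), ("the", "cat")}
q_B = restrict({("the", "dog"): 0.3, ("the", "cat"): 0.2, ("a", "horse"): 0.5}, B)
print(substitution_move(("the", "dog", "sleeps"), B, q_B))
```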

Table of contents:

Introduction 
0.1 Context
0.1.1 Natural language processing
0.1.2 Remarks on the shortcomings of standard models
0.1.3 Syntactic analysis
0.1.4 Markov substitute processes and toric grammars in a few words
0.2 Overview of the main results
0.2.1 Toric grammars
0.2.2 Markov substitute sets and processes
0.2.3 Model selection: finding a collection of Markov substitute sets
0.2.4 Parameter estimation
0.2.5 Invariant dynamics
1 Toric grammars and communication models 
1.1 Introduction to a new communication model
1.2 First definitions
1.2.1 Toric grammars
1.2.2 A roadmap towards a communication model
1.3 Operations on toric grammars
1.3.1 Non stochastic syntax splitting and merging
1.3.2 Random split and merge processes
1.3.3 Splitting rules and label identification
1.4 Parsing and generalization
1.5 Expectation of a random toric grammar
1.6 Language models
1.6.1 Communication model
1.6.2 Comparison with other models
1.7 A small experiment
1.8 The story so far…
1.A Proofs
1.A.1 Bound on the length of splitting and production processes
1.A.2 Parsing Relations
1.A.3 Convergence to the expectation of a random toric grammar
1.B Language produced by a toric grammar
2 Markov substitute sets
2.1 Presentation of the model
2.1.1 Motivation
2.1.2 What is a Markov substitute set
2.1.3 Weak Markov substitute sets
2.1.4 Basic properties of Markov substitute sets
2.1.5 Interpretation in terms of random parsing
2.2 Invariant dynamics
2.2.1 Metropolis invariant dynamics on sentences
2.2.2 Reflecting a reversible dynamics on the boundary of a finite domain
2.2.3 Compound dynamics
2.2.4 A simple example of recursive structure
2.2.5 Crossing-over reversible dynamics on texts
2.3 Exponential families of Markov substitute processes
2.4 Testing Markov substitute sets
2.4.1 Alternative construction of a test function
2.5 Weakening the Markov substitute assumption
2.6 Testing using a parse process
2.6.1 Probability of false rejection
2.6.2 Probability of false acceptance of the hypothesis
2.7 Testing Markov substitute sets without parsing
2.7.1 Definition of the test and probability of false rejection
2.7.2 Testing for the weak Markov substitute property
2.7.3 Computation of the test
2.7.4 Some numerical examples
2.7.5 Probability of false acceptance of the test
2.8 Estimation of the substitute measure
3 Markov substitute sets and language 
3.1 Production rules and Markov substitute sets
3.1.1 Markov grammars
3.1.2 Parse trees
3.1.3 General introduction to parsing
3.2 Substitute measures and reversible dynamics
3.2.1 Parametrization of substitute measures
3.2.2 Estimating substitute measures
3.2.3 Simulating substitute measures
3.2.4 Reversible dynamics for the language distribution
3.2.5 Crossing-over dynamics
3.3 Crossing-over dynamics and the maximum likelihood estimator
3.4 Building a Markov ruleset
3.4.1 Adding new rules
3.4.2 Reducing the context space
3.4.3 Saturation
3.4.4 Identification of labels
3.5 Toric grammars
3.5.1 Split and merge processes using parsing
3.5.2 Reversible split and merge process
3.6 Estimating the language distribution
3.A Support of the split and merge process and substitutions
3.B Parsing using a ruleset
3.B.1 Parsing general syntagms
3.B.2 Parsing inside syntagms of certain type
3.C Building crossing-over tree kernels
3.D Building general Metropolis reversible dynamics
4 Conclusion 
4.1 Possible uses of Markov substitute sets
4.2 Further considerations on the scope of the model
Bibliography 
