Model: non-homogeneous Markov processes associated to a linear birth-death model
On a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, let us consider a continuous-time birth and death process $(I_t)_{t \ge 0}$ with constant rates $\lambda$ and $\mu$, starting from $I_0 = 1$. From the biological point of view, this model represents a population where individuals evolve independently of each other and reproduce or die at exponential times with rates $\lambda$ and $\mu$ respectively.
Then $I_t$ represents the number of individuals alive at time $t \ge 0$. As we mentioned in the introduction, we have in mind applications of this model to the field of epidemiology. Hence, the birth and death process represents the transmission process of a pathogen in a sufficiently large susceptible population (no density dependence is considered here).
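The dynamics described above can be sketched with a standard Gillespie-type simulation. This is only an illustrative sketch: the function names `simulate_birth_death` and `size_at`, and the parameter names `lam` and `mu` for the birth and death rates, are assumptions made for this example and do not come from the text.

```python
import random


def simulate_birth_death(lam, mu, T, rng=None):
    """Simulate the population size on [0, T], starting from one individual.

    Returns the list of (jump time, population size) pairs, beginning with (0.0, 1).
    lam and mu are the per-individual birth and death rates (illustrative names).
    """
    rng = rng or random.Random()
    t, n = 0.0, 1
    path = [(t, n)]
    while n > 0:
        # With n individuals alive, the next event occurs at total rate n * (lam + mu).
        t += rng.expovariate(n * (lam + mu))
        if t > T:
            break
        # The event is a birth with probability lam / (lam + mu), otherwise a death.
        n += 1 if rng.random() < lam / (lam + mu) else -1
        path.append((t, n))
    return path


def size_at(path, t):
    """Read the population size at time t from the piecewise-constant path."""
    size = path[0][1]
    for time, n in path:
        if time > t:
            break
        size = n
    return size
```

Calling `size_at(simulate_birth_death(1.5, 1.0, 3.0), 3.0)` returns one realisation of the population size at the present time; the value 0 corresponds to extinction before time $T$.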
Consider a fixed time $T$ corresponding to the present time. We now define a non-homogeneous Markov process on $\mathbb{N}^2$, denoted $\{Y_t, Z_t\}_{t \ge 0}$, associated to $I$, which is defined for $0 \le t \le T$ as follows (to lighten notation, the dependence on $T$ is omitted):
• $Y_t$ is the number of individuals at time $t$ having extant descendants at time $T$.
• $Z_t = I_t - Y_t$.
According to this definition, $(Y_0, Z_0) = (1, 0)$ conditional on $I_T > 0$. Denote by $p_0(t) = \mathbb{P}_1(I_t = 0)$, $0 \le t \le T$, the probability that the population is extinct by time $t$, and by $q(t)$ the probability of survival up to time $t$ (clearly $p_0(t) + q(t) = 1$ for all $0 \le t \le T$). The process $(Y, Z)$ then has the following transition dynamics, conditional on $(Y_t, Z_t) = (n, m)$.
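For the linear birth-death process, the extinction probability has a classical closed form, which can be evaluated numerically as in the sketch below. The function names and the parameter names `lam` and `mu` (birth and death rates) are assumptions made for this illustration; the formula itself is the standard one for a linear birth-death process started from a single individual.

```python
import math


def extinction_probability(lam, mu, t):
    """Probability that a linear birth-death process started from one
    individual is extinct by time t (birth rate lam, death rate mu)."""
    if lam == mu:
        # Critical case: the general formula degenerates to lam * t / (1 + lam * t).
        return lam * t / (1.0 + lam * t)
    r = lam - mu
    e = math.exp(-r * t)
    return mu * (1.0 - e) / (lam - mu * e)


def survival_probability(lam, mu, t):
    """Complementary probability: the population is still alive at time t."""
    return 1.0 - extinction_probability(lam, mu, t)
```

By the branching property and time-homogeneity of the rates, an individual alive at time $t$ leaves extant descendants at time $T$ with probability $q(T - t)$, so `survival_probability(lam, mu, T - t)` is the quantity that separates individuals counted in $Y_t$ from those counted in $Z_t$.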
Reconstructed phylogenetic tree
The situation depicted in the previous section concerns the dynamics of birth and death processes, but some of the results we obtain here are established in a more general branching model, a binary homogeneous Crump-Mode-Jagers (CMJ) process [Lam10, Lam11]. A CMJ process describes a population where individuals reproduce independently of each other, have i.i.d. lifetime durations with arbitrary distribution (not necessarily exponential), and give birth at constant rate during their lifetime, giving rise to a single offspring at each birth event. A particular aspect of this model is that, since no assumption is made on the distribution of the lifetime durations of individuals, the process is not necessarily Markovian (unless the lifetime distribution is exponential or a Dirac mass at $\{\infty\}$). The tree associated to the genealogy of a CMJ process is a splitting tree. We refer to the introduction and to [Lam10] for a more complete description of these processes. As pointed out before, our aim is to derive an analytic formula for the likelihood of the reconstructed transmission tree under these dynamics, jointly with the population size process. To be more precise, we are interested in computing the likelihood of the reconstructed tree from $N$ individuals alive at time $T$ which coalesced at times $t_1, t_2, \dots, t_{N-1}$, derived from a CMJ process that started with a single individual at time 0 ($I_0 = 1$), as in Fig. I.3. It should be noted that we are not interested in the topology of the coalescence process, since the likelihood does not depend on the topology of the reconstructed phylogeny; see for instance [Tho75] or [EHS+11, LS13]. We will come back to this fact later.
Some results about the coalescent point processes
As we have explained in the introduction, in [Lam10] the author establishes that the genealogy of a splitting tree conditioned to be extant at a fixed time $T$ ($I_T \neq 0$) is given by a coalescent point process, that is, a sequence of i.i.d. random variables $H_i$, $i \ge 1$, killed at its first value greater than $T$. In other words, for $1 \le i < j \le I_T - 1$, the time elapsed back into the past until the $i$-th and $j$-th individuals alive at time $T$ find their most recent common ancestor (TMRCA) is distributed as the maximum of $j - i$ i.i.d. random variables with the same law as $H$, for instance $H_{i+1}, \dots, H_j$. In particular, conditional on $I_T \neq 0$, $I_T$ follows a geometric distribution with parameter $\mathbb{P}(H < T)$.
In everything that follows, we suppose that $H$ has a distribution that is absolutely continuous w.r.t. Lebesgue measure, and its probability density function will be denoted by $f$. The common law of these so-called node depths distributed as $H$ is [Lam10]: $\mathbb{P}(H > s) = \frac{1}{W(s)}$.
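The coalescent point process can be sampled directly once $W$ is known. The sketch below is an illustration under explicit assumptions: it takes the linear birth-death special case with birth rate `lam` strictly larger than death rate `mu`, for which $W(s) = (\lambda e^{(\lambda - \mu)s} - \mu)/(\lambda - \mu)$, and draws node depths by inverting $\mathbb{P}(H > s) = 1/W(s)$. All function names are hypothetical, and individuals are indexed from 0 in the helper `tmrca`, a convention chosen only for this example.

```python
import math
import random


def scale_function(lam, mu, s):
    """W(s) in the linear birth-death case (assumes lam != mu)."""
    r = lam - mu
    return (lam * math.exp(r * s) - mu) / r


def sample_node_depth(lam, mu, rng):
    """Draw one node depth H by inverse transform (assumes lam > mu)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    r = lam - mu
    # Solve W(s) = 1 / u for s.
    return math.log((r / u + mu) / lam) / r


def sample_coalescent_point_process(lam, mu, T, rng=None):
    """Draw i.i.d. node depths until the first one exceeds T.

    Returns the depths smaller than T; the population size at T,
    conditional on survival, is len(depths) + 1.
    """
    rng = rng or random.Random()
    depths = []
    while True:
        h = sample_node_depth(lam, mu, rng)
        if h > T:
            return depths
        depths.append(h)


def tmrca(depths, i, j):
    """Coalescence time of individuals i < j (0-indexed here): the
    maximum of the node depths separating them."""
    return max(depths[i:j])
```

Under these assumptions, the number of draws up to and including the first depth exceeding $T$ is geometric with mean $W(T)$, which gives a simple way to check the sampler against `scale_function(lam, mu, T)`.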
Sampled reconstructed tree from the total population at present time
Our goal in this section is to characterize the probability distribution of the total population size at $T$, conditionally on the coalescence times of sampled individuals, following the sampling model presented before. This is a delicate and complex issue, even in the linear birth-death model, since it requires integrating over all the possible extinct (unobserved) subtrees between 0 and $T$, so it is not possible to compute the likelihood directly. Therefore, we will rather look at the probability generating function of $I_T$ on the event that the TMRCAs between sampled individuals at $T$ are smaller than the times in the vector $T_{\#-1} = (t_1, \dots, t_{\#-1})$ and that the number of sampled individuals is $\# = K$. We will exploit the previously stated fact that the genealogy of a splitting tree conditioned to be extant at a fixed time $T$ is given by a coalescent point process.
We recall the labeling order introduced in Subsection 1.3 for the total population of the splitting tree at a given level, say $T$. Individual labels are denoted by $(x_i, 1 \le i \le I_T)$ and we set $x_i = i$ for $i \ge 1$. Now, let $(\tilde{x}_i, 1 \le i \le \#)$ be the labels of sampled individuals at $T$, that is, a subsequence of $[1, I_T]$. Let us define the function $G : [0, 1] \to \mathbb{R}_+$ as follows:
$$G(u) = \mathbb{E}\left[ u^{I_T} \, \mathbf{1}_{\{\tilde{H}_1 < t_1, \dots, \tilde{H}_{\#-1} < t_{\#-1}\}} \, \mathbf{1}_{\{\# = K\}} \,\middle|\, I_T \neq 0 \right].$$
Table of contents :
Introduction
1 Preliminaries
2 Statement of results
I Inferring population dynamics from virus phylogenies
Introduction
1 Preliminaries
2 Likelihood computation
Appendix
I.A Remaining proofs
I.B Some useful formulas and calculations
II Time reversal dualities for some random forests
1 Introduction
2 Preliminaries
3 Results
4 Epidemiology
Appendix
III Branching processes seen from their extinction time
1 Introduction
2 Preliminaries
3 Main results
4 Applications
5 Remaining proofs
Bibliography