MPPP description of gene expression


Central Dogma: fifty years of molecular biology

The central dogma of molecular biology, first stated by Crick in 1958 [11], was restated by the same author in 1970 as follows:
“The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid.”
The two main concepts that emerged in the late 1950s were those of sequential information and of defined alphabets. At the time it was already known that proteins have a specific three-dimensional configuration, which affects the activity of the protein itself. Researchers decoupled the problem by supposing that the amino acid chain was able to fold by itself, which reduced the problem to a one-dimensional one and allowed them to focus on the assembly of the polypeptide chain alone. It was well established that the alphabet of proteins is composed of twenty amino acids, but the mechanisms that lead to their encoding were unknown.
At that time it was well known that DNA, RNA and proteins play a leading role in gene expression, and the central dogma is a possible solution to the problem of formulating general rules for the transfer of information from a polymer with a defined alphabet to another one.
Crick represents the flow of detailed sequence information from one chain to another using arrows, in a schema such as that of Figure 1.1a, where all possible transfers are plotted. Following the general opinion of the late fifties, the transfers can be divided into three groups: those that seemed to exist because of direct or indirect evidence, those that had neither experimental evidence nor a strong theoretical need, and those that were unlikely to exist. Crick progressively simplifies this scheme, first excluding the processes in the last class and then validating those more likely to happen.
Using the classification made by Crick in 1970 [12], we can draw the schema shown in figure 1.1b. Here the solid arrows represent the “general transfers” (first class), while the dotted arrows are the “special transfers” (second class). The absent arrows are the undetected transfers.
The central dogma has to be read as a negative statement: there is no transfer of information out of protein. It also singles out the most likely transfers (solid lines) and the probable ones (dotted lines). Nevertheless, the central dogma does not say anything about the machinery involved or the control mechanisms. It was an attempt to give theoretical insight into the main principles that lead to the expression of a gene, using the partial information available at the time, and it represents the very foundation of molecular biology.
Experiments have confirmed the correctness of the main principles stated by Crick, and new technologies have considerably increased knowledge of the subject and have given a detailed description of the underlying biochemical reactions. This descriptive approach seems to have no end: finer mechanisms appear whenever more accurate techniques become available and take their place in the already complex scenario of gene expression.
Despite extensive research in the field and the knowledge acquired, little is known about the fundamental mechanisms and strategies underlying protein production, because of the extreme complexity of the whole process and the stochastic nature of the elementary biochemical reactions. For these reasons, mathematical models are a tool of investigation that makes it possible to isolate mechanisms and to test hypotheses based on the acquired knowledge.

Gene expression: main biological mechanisms

This section gives a short general introduction to the main steps of gene expression and to the main biological mechanisms involved in this complex process. It is not intended to be exhaustive, but to introduce the basic terminology used in the following chapters. Specific biological mechanisms will be introduced throughout the manuscript when needed. Although the Central Dogma gives the fundamental principles of information transfer in gene expression for any cell type, a description of the process via mathematical modeling should take into account the specificity of each cell type. In particular, models need to distinguish between prokaryotic and eukaryotic cells, at least for specific mechanisms and for their different geometric organization. This PhD work focuses on prokaryotes, and the subsequent modeling is affected accordingly. However, we will make clear when a modeling choice is strictly connected with prokaryotes; all other choices must be understood as common to both cell types.

Gene activation

Gene activation is the process which allows a gene to be expressed at a specific time. The way this activation may occur varies a lot from gene to gene and from organism to organism. The main mechanisms causing gene activation are the dissociation of a repressor and the association of an activator.
A repressor is a DNA-binding protein that regulates the expression of a specific gene by binding to the operator, a segment of DNA to which a regulator binds. The binding of the repressor blocks the attachment of RNA polymerase to the promoter and prevents the transcription of the genes. If an inducer molecule is present, it can interact with the repressor and inhibit its action by detaching it from the operator or by preventing its binding.
An activator is a DNA-binding protein that regulates one or more genes by increasing the transcription rate. RNA polymerase binds to the promoter region of the gene, forming a complex which sometimes proceeds to gene transcription. An activator recruits the RNA polymerase to its promoter region.
While the two previous mechanisms are shared between prokaryotic and eukaryotic cells, chromatin remodeling is specific to eukaryotes. Chromatin is the complex of DNA and the histone proteins with which it associates. Histones are highly alkaline proteins found in eukaryotic cell nuclei that package and order the DNA into structural units called nucleosomes. Chromatin serves on the one hand as a way to condense DNA within the cell nucleus and, on the other hand, as a control of gene expression. Raser and O’Shea [62] hypothesize that chromatin remodeling is the key regulation mechanism for certain eukaryotic promoters.
Gene activation is a complex process resulting from different mechanisms, and it is gene and organism specific. Although genes may show several different states, to a first approximation activation can be described as a two-state process, i.e. the gene has only two possible states, active or inactive.
The number of copies of a gene within a bacterium is a fundamental factor and should be considered in a model describing the expression of a specific gene. When bacteria are growing they duplicate their DNA, which leads to at least two copies per gene, since the genetic information has to be split between daughter cells.
Remark. Bacteria are often obliged to have more than two copies of their DNA, since the duration of replication (∼ 40 minutes) is sometimes longer than the cell cycle time, which is ∼ 20 minutes in Escherichia coli in fast growth conditions. For this reason, in “normal” growth conditions we observe DNA regions with one, two or four copies of genes, while in the “regeneration” regime, where the cell division cycle takes about 20 minutes, we have up to eight copies of the genes located closer to the origin of replication.

Transcription

The transcription process can be described through the following fundamental steps:
1. initiation: the polymerase binds to one of the specificity factors σ to form a “holoenzyme”, which can attach to a specific promoter in the DNA. The more similar a promoter sequence is to the “consensus sequence”, the stronger the binding to the DNA. After the first bond has been synthesized, the RNA polymerase must clear the promoter (this phase is called promoter clearance). During this phase a truncated transcript may be released, an event called abortive initiation;
2. elongation: after promoter clearance, the polymerase assembles the mRNA chain in a controlled fashion;
3. termination: this occurs through either ρ-independent or ρ-dependent transcription termination. The former involves terminator sequences within the RNA that signal the RNA polymerase to stop. The latter uses the ρ termination factor to stop RNA synthesis.
Transcription regulation controls the frequency and the number of messengers produced. Gene transcription is subject to many control mechanisms, and we recall only the most common. Specificity factors, e.g. sigma factors in prokaryotic transcription, alter the specificity of RNA polymerase for a given promoter or set of promoters, making it more or less likely to bind to them. Other regulations act at the gene level and have been enumerated in the previous section. In the post-transcriptional phase the regulatory machinery controls the number of mRNAs that are translated into proteins. The stability and distribution of the different transcripts are regulated (post-transcriptional regulation) by means of RNA-binding proteins (RBPs), which control the various steps and rates of the transcripts.
Prokaryotic and eukaryotic transcription show distinct characteristics. Since there is no precise spatial organization in prokaryotes, translation can start while the polymerase is still building the messenger. This is not possible in eukaryotes: transcription occurs in the nucleus, so the messenger first needs to be exported out of the nucleus before translation can take place.

Translation

Schematically prokaryotic translation consists of the following steps (see Figure 1.4 for schematic representation):
1. initiation: this involves the assembly of components such as the ribosomal subunits (50S and 30S), the mRNA, the first aminoacyl tRNA, GTP (energy) and the initiation factors (IF1, IF2, IF3). The tRNA (transfer RNA) serves as the physical link between the nucleotide sequence of the mRNA and the amino acid sequence of proteins. In particular, the aminoacyl tRNA (or charged tRNA) carries an amino acid to the ribosome as directed by the three-nucleotide sequence (codon) read by the ribosome. The ribosome has three sites: the A, P and E sites. The A site is the entry point for the aminoacyl tRNA, except for the first one, which binds directly to the P site. In the P site the peptidyl tRNA is formed, i.e. a tRNA bound to the peptide being synthesized, and in the E site the uncharged tRNA detaches from the ribosome;
2. elongation: a controlled process in which the polypeptide chain is elongated by the addition of amino acids to the carboxyl end of the growing chain. Elongation involves several elongation factors, a conformational change, bond formations, etc. The aminoacyl tRNA attaches to the A site, then moves to the P site, where the amino acid it carries is attached to the growing chain, and the uncharged tRNA is moved to the E site, where it exits from the complex;
3. termination: this occurs when one of the three terminating codons moves to the A site. These codons are not recognized by any tRNA but by the so-called release factors. These factors trigger hydrolysis of the ester bond and release the newly produced protein into the cytoplasm. The ribosome recycling step is responsible for disassembling the ribosome so that it is ready to start the translation of other messengers.
Translation is carried out by more than one ribosome simultaneously. Because of the relatively large size of ribosomes, they can only attach to sites on the mRNA that are at least 35 nucleotides apart. The so-called polysome is the complex formed by one mRNA and the ribosomes attached to it.
The translation of mRNA can also be controlled by a number of mechanisms, mostly at the level of initiation. Recruitment of the small ribosomal subunit can be modulated by mRNA secondary structure, antisense RNA binding or protein binding. In both prokaryotes and eukaryotes there is a large number of RNA-binding proteins, which are often directed to their target sequence by the secondary structure of the transcript. This structure may change depending on certain conditions, such as temperature or the presence of a ligand. Moreover, some transcripts act as ribozymes and self-regulate their expression.

mRNA degradation

The process of messenger degradation is essential for recycling nucleotides and for regulating the level of gene expression, and it is performed by RNases. The decay process occurs on short time scales: the typical half-life of a messenger is about two minutes at 37 °C in most cases. This rapid decay makes it possible to continuously adjust the number of specific messengers to the needs of the population, depending on the specific environmental conditions.
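As a quick worked example, the two-minute half-life can be turned into a decay rate. The sketch below assumes first-order (exponential) decay of the messenger number m(t) with rate constant k; this kinetic form and the symbols m, k and t_{1/2} are assumptions introduced here for illustration and are not stated in the text.

```latex
% Assumption: first-order decay of the messenger number m(t) with rate constant k.
\frac{dm}{dt} = -k\,m
\quad\Longrightarrow\quad
m(t) = m(0)\,e^{-kt},
\qquad
k = \frac{\ln 2}{t_{1/2}} = \frac{\ln 2}{2\ \text{min}} \approx 0.35\ \text{min}^{-1},
\qquad
\frac{1}{k} \approx 2.9\ \text{min}.
```

Under this assumption a messenger survives on average slightly less than three minutes, which is short compared with the cell cycle times quoted earlier.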
The decay process consists of two main steps [14]:
1. initiation: primarily due to endonucleolytic attack mediated by the RNase E enzyme in E. coli [41];
2. break-down: the initial endonucleolytic cleavage, which is thought to inactivate the message for translation, is followed by additional cleavages that break the mRNA down into fragments.
Experiments have shown that prokaryotic mRNAs are more unstable and have shorter lifetimes than eukaryotic ones. This is probably connected to the absence of a physical separation between the sites of RNA synthesis and RNA function; decay is possibly the major form of post-transcriptional control in these organisms. The stability of the mRNA is also connected to the competition between RNases and ribosomes for binding to messengers [58]: genes whose mRNAs have a weak affinity for ribosomes show higher levels of mRNA degradation, since ribosome binding protects the messenger from decay [3, 9, 74].

Protein decay: proteolysis and volume dilution

Two main mechanisms of profoundly different nature are responsible for the decay of proteins: proteolysis and volume dilution.
The first mechanism is analogous to the degradation mechanism of messengers, but it is mainly used to destroy possibly dangerous proteins, such as misfolded proteins, since a protein’s structure determines not only its specific cellular function but also its intracellular stability. The degradation machinery differs between eukaryotes and prokaryotes, as shown in the review article of Goldberg [21]. Prokaryotes, in particular, have developed an elaborate proteolytic machinery to quickly destroy misfolded proteins. Although the protease is the enzyme that carries out proteolysis, the machinery is much more complex, since if proteases were free to act in the cytosol, “they would quickly convert the cell into a bag of amino-acids” [21]. In any case, proteolysis appears as a control mechanism that prevents the release or survival of malfunctioning proteins and removes damaged proteins. Indeed, proteins are continuously subjected to stress, such as temperature, which eventually causes their denaturation. A denatured protein needs to be removed since its functioning has been compromised. This aging phenomenon of proteins occurs on long timescales: the average protein lifetime is usually longer than the cell cycle.
The second mechanism, protein dilution, is of a completely different nature. Both prokaryotic and eukaryotic cells double their internal components in order to give rise to two daughter cells. The volume growth associated with this doubling affects the concentration of each cellular component through volume dilution. Intuitively, if we stop the production of proteins at some point, their concentration will drop as a consequence of the increase in cell volume. This mechanism is therefore very different from the biochemical interactions which lead to proteolysis. Dilution is strictly connected with the cell growth rate and turns out to be continuous and deterministic, since the growth rate is fixed as long as the environmental conditions are kept unchanged.
In normal conditions and for stable proteins, dilution is the leading degradation mechanism.
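
The effect of dilution described above can be sketched numerically. The snippet is a minimal illustration under two assumptions that are not stated in the text: the cell volume grows exponentially with a fixed doubling time, and proteolysis (when included) is a first-order process. The function names and the numbers used are hypothetical.

```python
import math


def dilution_rate(doubling_time_min):
    """Rate (per minute) at which exponential volume growth dilutes a concentration."""
    return math.log(2) / doubling_time_min


def concentration(c0, t_min, doubling_time_min, proteolysis_rate=0.0):
    """Concentration at time t_min after protein production has stopped.

    Assumes exponential volume growth and first-order proteolysis, so the
    two loss channels simply add up in the exponent.
    """
    total_rate = dilution_rate(doubling_time_min) + proteolysis_rate
    return c0 * math.exp(-total_rate * t_min)


if __name__ == "__main__":
    # Hypothetical numbers: 20-minute doubling time, stable protein (no proteolysis).
    for t in (0, 10, 20, 40):
        print(t, round(concentration(100.0, t, 20.0), 1))
```

With no proteolysis the concentration halves once per doubling time, which is why dilution dominates for stable proteins: any proteolysis rate much smaller than ln 2 divided by the doubling time barely changes the curve.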

Stochasticity in gene expression: experiments

Randomness and determinism are constantly present in the development, growth and life of cells: random biochemical reactions have to be reconciled with the precise development of organisms. The biological implications of stochastic fluctuations in gene expression have stimulated research in the field, multiplying both theoretical and experimental works.
Although stochastic fluctuations were often set aside by considering statistics on large numbers and reducing the analysis to deterministic models, experimental scientists have become more and more aware of the inherently stochastic nature of gene expression. Researchers have found that variability among cells in a genetically identical population is strongly connected with fluctuations in the expression of single genes. Stochasticity in protein production is often considered only as a danger for the normal development of organisms. Nevertheless, some living organisms may exploit stochastic fluctuations in the expression of genes to introduce phenotypic diversity in genetically identical cells. This variability can be advantageous in specific cases, such as in the face of drastic variations in the environment or of stress conditions, but it can also be very dangerous when it turns out to be an obstacle to the realization of the cell program.
We focus on experimental results concerning stochasticity in gene expression, from experimental evidence of the stochastic nature of the phenomenon to the negative or positive consequences of fluctuations. Few models are considered here, and we refer to Sections 1.3 and 2.A for a detailed description.
In the late 1950s Novick and Weiner [50] showed that the production of beta-galactosidase (β-galactosidase or β-gal) was variable and random in individual cells, but those studies were hindered by the lack of reliable measurements and were not considered conclusive proof of stochasticity in gene expression. One of the first studies to use an expression reporter in single cells was the work of Ko et al. [40] in the early 1990s. In this work the researchers examined the effect of different doses of glucocorticoid on the expression of a transgene encoding β-gal and found a large cell-to-cell variability by directly measuring the amount of protein in different cells. Moreover, as in Novick’s work, increasing the dose does not uniformly increase the expression in every cell, but increases the frequency of cells displaying a high level of expression. The authors interpreted the dose dependence as a change in the probability that an individual cell expresses the gene at a high level, and concluded that gene expression is a stochastic process.
In 2002, Ozbudak et al. [51] studied fluctuations in the expression of a gene encoding green fluorescent protein (GFP) driven by an inducible promoter in Bacillus subtilis. The authors tuned the rate of transcription by varying the level of induction of the promoter. The translation rate was modulated by introducing mutations in the ribosome binding site (RBS). They found that both the transcription and the translation rates affect the protein fluctuations, and interpreted the results using the theoretical model proposed by Thattai and van Oudenaarden [71]. This model predicts that the relative variance of the protein number depends on the transcription rate but remains unchanged under variations of the translation rate.
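
The dependence on transcription and translation rates can be explored with a short numerical sketch. It assumes the standard two-stage (mRNA and protein) birth-death model with constant rates and uses the well-known stationary moments of that model; these formulas, the parameter names (`k_tx`, `k_tl`, `gamma_m`, `gamma_p`) and the numerical values are assumptions made here for illustration and are not quoted from [71] or from the text.

```python
def two_stage_noise(k_tx, k_tl, gamma_m, gamma_p):
    """Stationary moments of a two-stage gene expression model.

    k_tx    : transcription rate (mRNAs per minute)
    k_tl    : translation rate (proteins per mRNA per minute)
    gamma_m : mRNA decay rate (per minute)
    gamma_p : protein decay rate (per minute)
    Returns (mean mRNA, mean protein, protein Fano factor, protein CV^2).
    """
    mean_m = k_tx / gamma_m
    mean_p = k_tl * mean_m / gamma_p
    # Variance over mean of the protein number.
    fano = 1.0 + k_tl / (gamma_m + gamma_p)
    # Relative variance (squared coefficient of variation).
    cv2 = fano / mean_p
    return mean_m, mean_p, fano, cv2


def mrna_contribution(k_tx, gamma_m, gamma_p):
    """Part of the protein CV^2 inherited from mRNA fluctuations."""
    return (gamma_m / k_tx) * gamma_p / (gamma_m + gamma_p)


if __name__ == "__main__":
    # Hypothetical rates, in units of 1/min.
    for k_tl in (2.0, 4.0):
        mean_m, mean_p, fano, cv2 = two_stage_noise(0.5, k_tl, 0.35, 0.035)
        print(f"k_tl={k_tl}: <p>={mean_p:.1f}, Fano={fano:.2f}, CV^2={cv2:.4f}")
```

In this form the relative variance splits into 1/⟨p⟩ plus a term that depends only on the transcription rate and on the two decay rates. Changing the translation rate therefore affects the relative variance only through 1/⟨p⟩, which is small for abundant proteins, while the Fano factor grows with the translation rate.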
Elowitz et al. [16] introduced the dual-reporter technique to measure the stochastic fluctuations of proteins in Escherichia coli. This technique makes it possible to express two different fluorescent proteins, CFP and YFP, from identical promoters. Since the two proteins share the same regulatory control, the differences between their expression levels can be attributed to the “intrinsic” stochasticity of the gene expression process, which arises from the random microscopic events that govern each reaction. On the other hand, the “extrinsic” noise, which derives from cellular heterogeneity in factors such as regulatory proteins, ribosomes and polymerases, or from stochastic events in upstream signal transduction, affects both proteins. We refer to Section 1.3 for a deeper analysis of these concepts. The authors showed that extrinsic noise represents a non-negligible portion of the overall fluctuations and stressed the need to take both sources of noise into account when controlling or minimizing the fluctuations of a system.
Jonathan M. Raser and Erin K. O’Shea [62] used the same technique to study gene expression in yeast. The authors analysed three different promoters in the budding yeast Saccharomyces cerevisiae: the PHO5, PHO84 and GAL1 promoters. The total noise of the three promoters was found to be dominated by the contribution of external factors, such as cell shape and size, cell cycle stage or gene-specific signaling. The authors reduced these possible sources of heterogeneity by using experimental techniques such as flow cytometry to isolate sub-populations with homogeneous sizes. The extrinsic noise was diminished, but not dramatically. Moreover, the extrinsic noise was found not to be promoter specific, since it remained correlated when the two fluorescent proteins were associated with promoters that are distinctly regulated. This leads to the hypothesis that extrinsic noise causes proteins to be maintained in constant relative concentrations. In order to analyze the noise in eukaryotes, the authors use a three-stage model similar to the model presented by Paulsson [54]; see Section 2.A.2 for details. The authors claim that the three-stage model is applicable to both prokaryotes and eukaryotes, the main difference being the specific mechanisms of gene regulation. Relative differences in the parameters can lead to different scenarios which can be interpreted biologically. It can easily be shown that two promoters can produce the same average number of mRNAs with different fluctuation characteristics: a promoter that undergoes frequent activation followed by inefficient transcription shows smaller variability than a promoter that has rare activation events followed by a stable active state. The authors find three characteristic regimes of gene regulation, defined in terms of the rates of the three-stage model, which result in different noise profiles.
In this paper, extrinsic noise seems to be predominant with respect to intrinsic noise and seems to be of a global nature, i.e. it affects the expression of any gene.
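
The statement that two promoters can share the same mean mRNA number while differing in variability can be illustrated with a small calculation. The sketch assumes a two-state (active/inactive) promoter with constant switching rates, transcription only in the active state and first-order mRNA decay, and it uses the stationary mean and Fano factor that follow from the moment equations of that model. It covers only the gene and mRNA stages, not the full three-stage model of [62], and all names and rate values are hypothetical.

```python
def telegraph_mrna_stats(k_on, k_off, k_tx, gamma_m):
    """Stationary mean and Fano factor of the mRNA number for a two-state gene.

    k_on, k_off : activation and inactivation rates of the promoter
    k_tx        : transcription rate while the promoter is active
    gamma_m     : mRNA decay rate
    """
    switching = k_on + k_off
    p_on = k_on / switching
    mean = k_tx * p_on / gamma_m
    fano = 1.0 + k_tx * k_off / (switching * (switching + gamma_m))
    return mean, fano


def k_tx_for_mean(target_mean, k_on, k_off, gamma_m):
    """Transcription rate in the active state that yields the target mean."""
    p_on = k_on / (k_on + k_off)
    return target_mean * gamma_m / p_on


GAMMA_M = 0.35     # per minute, hypothetical (about a two-minute half-life)
TARGET_MEAN = 2.0  # mRNAs per cell, hypothetical

# Promoter A: frequent activation, inefficient transcription.
FREQUENT = dict(k_on=5.0, k_off=0.5)
# Promoter B: rare activation, long-lived active state.
RARE = dict(k_on=0.01, k_off=0.01)


def compare_promoters():
    """Return {name: (mean, Fano factor)} for the two hypothetical promoters."""
    results = {}
    for name, rates in (("frequent", FREQUENT), ("rare", RARE)):
        k_tx = k_tx_for_mean(TARGET_MEAN, gamma_m=GAMMA_M, **rates)
        results[name] = telegraph_mrna_stats(k_tx=k_tx, gamma_m=GAMMA_M, **rates)
    return results


if __name__ == "__main__":
    for name, (mean, fano) in compare_promoters().items():
        print(f"{name}: mean = {mean:.2f}, Fano factor = {fano:.2f}")
```

Both promoters are tuned to the same mean, yet the slowly switching one has the larger Fano factor, because long active and inactive periods produce bursts of messengers rather than a steady trickle.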
A global analysis of the production of proteins in Saccharomyces cerevisiae was conducted independently by Bar et al. [2] and by Newman et al. [48] in 2006. Newman and collaborators [48] studied fluctuations in more than 2500 proteins, pairing high-throughput flow cytometry with a library of GFP-tagged yeast strains to monitor protein levels at single-cell resolution. This new strategy for large-scale protein abundance measurements allowed the scientists to deduce that abundance is the major factor governing protein variation, which most likely originates from the stochastic production and destruction of mRNAs. Bar and collaborators [2] studied 43 different proteins under 11 experimental conditions, finding that the variance is roughly proportional to the mean, as predicted by models of stochastic gene expression. Highly expressed genes seem to deviate from this trend, since their variance appears uncorrelated with abundance, as also shown in [48]. The researchers point to low-copy mRNA fluctuations and gene regulation as mainly responsible for protein fluctuations, which is consistent with the scaling property observed. Moreover, using a dual-reporter diploid strain in a similar fashion to Elowitz [16], they show that intrinsic noise contributes substantially to the overall protein noise for proteins with an intermediate expression level, while it is much smaller than extrinsic fluctuations for highly produced proteins. Both works [48] and [2] stress that proteasome genes are characterized by low noise levels, while stress proteins are very noisy, which indicates a precise structure in protein-specific variation and suggests that noise levels have been selected to reflect costs and potential benefits.

The works of Yu et al. [75] and Cai et al. [8] have instead focused on the development of techniques allowing real-time observations with single-cell sensitivity, in order to analyze genes expressed at low levels. β-galactosidase is the protein studied in both works, since it is the standard reporter for gene expression in both prokaryotes and eukaryotes. A single molecule of β-gal can produce a large number of fluorescent product molecules by hydrolysing a synthetic fluorogenic substrate, which makes β-gal a high-sensitivity cellular reporter. However, the drawback of its use is the fast diffusion of the fluorescent products, which are quickly dispersed. Cai and collaborators [8] propose to trap the cells in closed microfluidic chambers, so that the fluorescent products expelled from the cells can accumulate in the small volume of each chamber. Yu and collaborators [75] suggest another technique: they designed a fusion protein consisting of YFP (yellow fluorescent protein) and a membrane protein (tsr), which slows down the dispersion of fluorescent material and makes measurements possible. Both works were performed on Escherichia coli cells with a target polypeptide expressed under repressed conditions.

Thanks to the use of single-molecule fluorescence microscopy on mRNA [61, 22, 42] and on proteins [75, 8], Taniguchi et al. [70] have performed a quantitative system-wide analysis of mRNA and protein expression in individual Escherichia coli cells, see Figure 1.5. After normalization to account for cell size and for gene copy number variation due to the cell cycle, the authors measured protein abundances ranging between 10^-1 and 10^4 copies per cell. They found that while the noise scales with protein abundance for low-expressed proteins (< 10 copies per cell), as in [2, 48], this is not the case for proteins produced in higher quantities, where the noise reaches a plateau, suggesting that each protein has at least 30% of variation. They made striking real-time measurements of mRNAs, using the FISH technique, and of proteins at the same time on 137 strains with highly expressed proteins, analyzing both mRNA production and the mRNA-protein correlation.
Noise and, in particular, intrinsic noise is an obstacle to the genetic program, since the stochasticity of biochemical reactions leads to uncertainty in the resulting amount of proteins, which could be deleterious to the achievement of the cell program. On the other hand, in specific cases these fluctuations are positively exploited by cells, as a source of heterogeneity or as a fundamental tool for decision making.
Starting with the positive effects, fluctuations in gene expression are pointed to as a major mechanism for obtaining different phenotypes in an identical population. This differentiation can lead to the emergence of sub-populations which are committed to different responses to environmental changes. Cell variability can be boosted in the presence of networks that can produce mutually exclusive profiles, such as ON and OFF expression of a gene: small variations in gene expression cannot cause the switch from one state to the other, but rare and large fluctuations can lead to a transition. This is the case, for example, of the lysis-lysogeny decision in lambda phage-infected E. coli [43, 29], of the lac operon in E. coli [52, 45] or of the galactose utilization network in yeast [33]. In particular, for the lysis-lysogeny decision the stochastic effects in the expression of some regulatory factors could explain the “decision” of cells to take the lytic or the lysogenic pathway. The reason for choosing a stochasticity-based decision network can be connected with the performance of the resulting strategy. For example, with respect to food, cells can adopt two different strategies: they can sense food in the environment and then activate the metabolic machinery, or they can stochastically decide to activate the metabolic networks in some sub-populations in anticipation of a possible arrival of food. The first strategy is more effective but can be slow, while the second sacrifices a few resources for a quick response. Researchers have shown that the stochastic switching strategy can be a good alternative to the sensing machinery in cases where the stochastic fluctuations are more or less synchronized with the environmental fluctuations [1]. Cellular stress, such as lack of food or exposure to antibiotics, is another case where stochastic decisions could explain the behavior observed in bacterial populations, as shown in the case of competence in Bacillus subtilis [42, 68].
In particular, it is shown that the reduction of fluctuations results in a lower percentage of competent cells, reducing the chances of survival of the population under stress conditions. Despite the use of noise for specific mechanisms, noise in gene expression has to be thought of as deleterious for organisms and for gene expression in particular, which shows robustness with respect to fluctuations. The genome-wide works of Newman et al. [48] and Bar et al. [2] point out that the variability is gene specific: in particular, stress genes, which are not essential for cell functioning, show high fluctuations, while proteasome genes are much less variable. This allocation of noise indicates that different production strategies have been selected and are possibly the result of a tradeoff between a low level of noise in protein production and the cost, in terms of resources, of producing a large number of proteins at any time.


Intrinsic and extrinsic noise

Biological systems are made up of individuals interacting in changing environments. In particular, fluctuations in gene expression are due to the probabilistic nature of the underlying biochemical reactions (“intrinsic noise”) as well as to the effect of the environment on this production (“extrinsic noise”). The measured fluctuations are therefore the result of the combined effect of these two sources of randomness, which makes the system hard to treat. Decomposing noise into separate terms, even if it does not provide information on the latent mechanisms, makes it possible to evaluate models without having to specify extrinsic and intrinsic mechanisms simultaneously. The concepts of intrinsic and extrinsic noise were introduced by Michael B. Elowitz et al. [16] and Swain et al. [69] in 2002. The stochasticity inherent in the biochemical processes underlying gene expression, such as transcription and translation, is referred to as intrinsic noise, while fluctuations in the local environment or in the state of any other cellular factor that affects gene expression result in extrinsic noise. We make these definitions clearer by describing the simple and ingenious experimental approach designed by Elowitz et al. [16] to perform this separation. The researchers used two equivalent, independent gene reporters placed in the same cell and observed the two copies simultaneously. Correlations between the outcomes of the two reporters reflect the influence of the common environment (extrinsic, see Figure 1.6a), while differences in their expression are the consequence of the random microscopic events governing each reaction (intrinsic, see Figure 1.6b).
If X denotes the number of proteins of interest in a given cell, we can always write the cell-to-cell variability $\sigma_X^2$ by conditioning the data on the state Z of the extrinsic variables, i.e. number of polymerases, ribosomes, etc. Therefore, the cell-to-cell variability can be decomposed as
$$
\sigma_X^2 = \underbrace{\langle \sigma_{X|Z}^2 \rangle}_{\text{unexplained by } Z} + \underbrace{\sigma_{\langle X|Z \rangle}^2}_{\text{explained by } Z}, \tag{1.3.1}
$$
where we used the notation of [27] and the law of total variance. In particular, $\sigma_{X|Z}^2$ is the variance of the random variable X in the subpopulation characterized by the extrinsic variables Z, and the angular brackets denote averages over all such subpopulations. The term $\sigma_{\langle X|Z \rangle}^2$ is the variance of the conditional expectation of X given Z. The decomposition (1.3.1) is equivalent to the decomposition in the original theoretical paper of Swain et al. [69]. However, conditioning on the state of the environment captures the correct contributions only in the case of slow environmental fluctuations, and it is not well suited to the case of a dynamic environment.
The main issue with the decomposition (1.3.1) is that it looks at the environment at a single point in time, whereas the whole history matters. To keep track of this history, Hilfinger et al. [27] propose a new decomposition
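$$\sigma_X^2 = \underbrace{\langle \sigma_{X_t|Z[0,t]}^2 \rangle}_{\sigma_{\mathrm{int}}^2} + \underbrace{\sigma_{\langle X_t|Z[0,t] \rangle}^2}_{\sigma_{\mathrm{ext}}^2}, \tag{1.3.2}$$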
where the first term on the right-hand side is the variance of $X_t$ in a subpopulation sharing an environmental history Z[0, t], averaged over all possible histories, and the second term is the variance of the conditional expectation of $X_t$ given a history Z[0, t]; here t = 0 corresponds to the infinite past. In ergodic systems, the term $\sigma_{\mathrm{ext}}^2$ can be interpreted as the time variation of the average, while $\sigma_{\mathrm{int}}^2$ captures the fluctuations around the average. Applying the decomposition (1.3.2) to the two-promoter reporter system (see Figure 1.6), we obtain Cov(X, Y) ≡ $\sigma_{\mathrm{ext}}^2$, where X and Y are the numbers of the two identical and independent reporters in the same cell. This recovers the original idea of Elowitz et al. [16], who interpreted the extrinsic contribution as the correlation between the two reporters and the intrinsic noise as the uncorrelated fluctuations of the reporters under investigation.
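The identity Cov(X, Y) ≡ $\sigma_{\mathrm{ext}}^2$ can be checked on a toy dual-reporter simulation. The sketch below is only an illustration under assumptions that are not in the text: a static environment summarized by a single gamma-distributed rate Z per cell, and two reporters drawn independently from a Poisson law with mean Z. Since the reporters are conditionally independent and identically distributed given Z, the law of total covariance gives Cov(X, Y) = Var(⟨X|Z⟩), and half the mean squared difference of the reporters estimates ⟨Var(X|Z)⟩. The latter is the usual dual-reporter estimator of the intrinsic term and is used here as an assumption about how one would measure it.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) sample with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


rng = random.Random(1)
n_cells = 40000
xs, ys = [], []
for _ in range(n_cells):
    # Hypothetical extrinsic variable: gamma-distributed expression rate,
    # shape 4 and scale 5, hence mean 20 and variance 100.
    z = rng.gammavariate(4.0, 5.0)
    # Two identical reporters, independent given the shared environment z.
    xs.append(poisson(rng, z))
    ys.append(poisson(rng, z))

mean_x = sum(xs) / n_cells
mean_y = sum(ys) / n_cells

# Extrinsic term: covariance between the two reporters.
ext_estimate = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n_cells
# Intrinsic term: half the mean squared difference between the reporters.
int_estimate = sum((x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n_cells)
# Total variance of one reporter, for comparison with the sum of the two terms.
var_x = sum((x - mean_x) ** 2 for x in xs) / n_cells

print(f"Cov(X, Y) (extrinsic estimate): {ext_estimate:.2f}  (theory: Var(Z) = 100)")
print(f"E[(X - Y)^2] / 2 (intrinsic)  : {int_estimate:.2f}  (theory: E[Z] = 20)")
print(f"Var(X)                        : {var_x:.2f}  (theory: 120)")
```

In this toy model the environment does not change in time, so conditioning on the current state and conditioning on the whole history coincide; the sketch therefore does not illustrate the difference between (1.3.1) and (1.3.2).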

Table of contents:

1 Introduction
1.1 Gene expression
1.1.1 Central Dogma: fifty years of molecular biology
1.1.2 Gene expression: main biological mechanisms
1.1.3 Translation
1.2 Stochasticity: experiments
1.3 Intrinsic and extrinsic noise
1.4 Stochasticity: models
1.4.1 Limits of classic models
2 MPPP description of gene expression
2.1 Biology and mathematical assumptions
2.1.1 Biological context
2.1.2 Mathematical model of gene expression: three-stage model
2.1.3 Limits of classic models: the exponential assumption
2.2 MPPP Description of Gene Expression
2.3 General results
2.3.1 Gene state
2.3.2 Messengers
2.3.3 Proteins
2.4 Results: explicit formulas and numerical analysis
2.A Appendix: classic models
2.A.1 Rigney’s model
2.A.2 Paulsson’s model survey
2.A.3 Swain’s model
3 Realistic model of gene expression
3.1 Four-Stage Model
3.1.1 Model and general results
3.1.2 Realistic assumptions
3.1.3 Explicit formulas under realistic assumptions
3.2 Qualitative and quantitative analysis
3.2.1 Biological data and model parameters
3.2.2 Estimation of fluctuations: deterministic elongation
3.2.3 Four-Stage Model: a counter-intuitive result
3.2.4 Proteolysis vs. dilution
3.2.5 Impact of different steps on protein fluctuations
3.A Reference parameters
4 Multi-protein model
4.1 Stochastic model
4.2 Asymptotic Behavior
4.3 Analysis of fixed point equation
4.3.1 The underloaded case
4.3.2 The Case of Overloaded mRNAs
A Mathematical tools
A.0.3 Marked Poisson Point Processes
B Biology
B.1 Biological Mechanisms
B.1.1 Gene activation
B.1.2 Transcription
B.1.3 Translation
B.1.4 mRNA degradation
B.1.5 Protein degradation
B.2 Biological glossary
B.2.1 16S ribosomal RNA
B.2.2 β-galactosidase
B.2.3 DNA
B.2.4 Gene
B.2.5 Inducer
B.2.6 Operon
B.2.7 Promoter
B.2.8 Ribosomal Binding Site (RBS)
B.2.9 Ribosome
B.2.10 RNA
B.2.11 Messenger RNA (mRNA)
B.2.12 Ribosomal RNA (rRNA)
B.2.13 Transfer RNA (tRNA)
B.2.14 Shine-Dalgarno sequence