Clone sequence data
CTV diversity according to the clone sequence data is represented by the six dendrograms in Figure 1. The non-collapsed phylogenetic trees of the cloned sequences from each of these samples are shown in Figures 1-6 in Appendix C.
Sample 13-3062 from Hoedspruit appeared homogeneous for an AT-1-like strain, with the 34 clones grouping most closely with this reference. Sample 13-3309 from Malelane showed apparently high levels of diversity: 5 clones fell into a unique branch close to the NZRB references, 1 clone fell within the Kpg3/SP/T3-like group, and a further 5 clones grouped in a unique branch close to the VT group. Thirty clones grouped within the AT-1 reference branch, with another 2 clones falling within a unique branch close to AT-1. The majority of clones (31) from sample 13-3410 from Swaziland grouped within the NZRB-TH30 branch, 3 grouped with Taiwan-Pum/SP/T1 (which forms part of the RB TH-28 group in the collapsed tree) and 1 clone fell within the T36 branch. The sample from the Northern Cape (13-3534) yielded 14 clones within the Kpg3/SP/T3 group, 1 within the HA 16-5 group, 1 clone forming a unique group between Taiwan-Pum/M/T5 and T30, and 5 clones grouping with the RB group. Sample 13-3642 from Sundays River Valley indicated a mixed infection, with 7 clones grouping with Taiwan-Pum/SP/T1 and 30 clones grouping within the NZRB group. Sample 13-3719 also indicated a mixed infection, with clones grouping with Taiwan-Pum/SP/T1, while the remaining 18 clones fell within a unique group closest to the T36 branch. These clones could represent a previously undescribed strain or genotype.
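The clone counts reported above can be tallied per sample to give the share of clones in each phylogenetic group. The following is a minimal Python sketch that uses only the counts stated in the text; the dictionary layout, the shortened group labels and the function name `group_proportions` are illustrative assumptions and not part of the original analysis. Sample 13-3719 is omitted because the number of its clones grouping with Taiwan-Pum/SP/T1 is not stated.

```python
# Clone counts per phylogenetic group, transcribed from the text above.
# Group labels are shortened descriptions, not formal strain names.
clone_counts = {
    "13-3062": {"AT-1": 34},
    "13-3309": {
        "unique branch near NZRB": 5,
        "Kpg3/SP/T3": 1,
        "unique branch near VT": 5,
        "AT-1": 30,
        "unique branch near AT-1": 2,
    },
    "13-3410": {"NZRB-TH30": 31, "Taiwan-Pum/SP/T1": 3, "T36": 1},
    "13-3534": {
        "Kpg3/SP/T3": 14,
        "HA 16-5": 1,
        "unique group between Taiwan-Pum/M/T5 and T30": 1,
        "RB": 5,
    },
    "13-3642": {"Taiwan-Pum/SP/T1": 7, "NZRB": 30},
}


def group_proportions(counts):
    """Return each group's share of a sample's clones as a percentage (1 decimal)."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


for sample, counts in clone_counts.items():
    total = sum(counts.values())
    dominant = max(counts, key=counts.get)  # group with the most clones
    share = group_proportions(counts)[dominant]
    print(f"{sample}: {total} clones, dominant group {dominant} ({share}%)")
```

Expressed this way, the clone data can be set alongside the percentage of mapped reads reported for the Illumina MiSeq datasets, although the two measures are not directly equivalent.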
Illumina MiSeq sequencing data
The Illumina MiSeq data mapped to the reference sequences are presented in Table 3. The RB sequence type was the most highly represented and was present in the dataset of every sample analysed, at between 1.6% and 100% of mapped reads. Reads of the Kpg3/SP/T3 sequence type were the second most numerous and were present in all datasets except for 8 samples from each of Hoedspruit 2 and Northern Cape and 5 samples from Sundays River Valley. Kpg3/SP/T3 reads represented between 0.1% and 93% of CTV mapped reads in samples, and this sequence type never occurred on its own. HA 16-5 was detected at 0.1–45.7% of total CTV mapped reads and was present at all collection sites except Hoedspruit 2. VT was detected in samples from all collection sites except Malelane 1 and represented low percentages of the total mapped reads, with the exception of one of the Swaziland samples, in which VT accounted for 47.7% of mapped reads. The remaining sequence types appeared occasionally, in a few samples across many of the collection sites. AT-1, T36, Taiwan-Pum/M/T5 and T30 were present at 0.1–81%, 0.2–45.6%, 0.1–2.3% and 0.1–0.4% of total mapped reads per sample, respectively. Illustrative analyses of these populations are shown in Figures 1-8 in Appendix D, while the mapping data for each sample are shown in Tables 1.1-8.15 in “Supplementary data” Section A (supplied electronically).
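The percentages above are shares of the reads in a sample that mapped to each CTV reference sequence. A minimal sketch of that calculation follows, assuming per-reference read counts have already been obtained from the mapping step. The counts shown are invented for illustration and are not taken from Table 3, and the function name `percent_of_mapped_reads` is hypothetical.

```python
def percent_of_mapped_reads(read_counts):
    """Convert per-reference mapped read counts to percentages of all mapped reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads in this sample")
    return {ref: 100 * n / total for ref, n in read_counts.items()}


# Hypothetical counts for one sample (illustrative only, not study data).
example_counts = {"RB": 8000, "Kpg3/SP/T3": 1500, "HA 16-5": 500}
shares = percent_of_mapped_reads(example_counts)

# List sequence types from most to least represented.
for ref, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{ref}: {pct:.1f}% of mapped reads")
```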
Table 6 compares the dominant sequence type determined for each population by direct Sanger sequencing with that determined by Illumina MiSeq sequencing. According to this comparison, 25 of the 52 samples showed agreement between the two datasets where the dominant sequence type was represented at more than 90% of the population according to the Illumina MiSeq data. A further 21 (40%) populations showed agreement between the direct and Illumina sequencing where the dominant sequence type, according to the Illumina dataset, was present at less than 90% of the population. The two sets of sequence data did not agree for six of the populations.
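The three categories used in this comparison can be expressed as a simple rule. The sketch below is an assumption about how such a classification could be coded, not the procedure used in the study; the function name `classify_agreement`, the example profiles and the treatment of a dominant type at exactly 90% are illustrative choices. The final lines check the arithmetic using the counts reported above.

```python
def classify_agreement(sanger_type, illumina_percentages, threshold=90.0):
    """Compare the Sanger-derived type with the Illumina-dominant type.

    illumina_percentages maps sequence type -> % of mapped reads.
    """
    dominant = max(illumina_percentages, key=illumina_percentages.get)
    if dominant != sanger_type:
        return "no agreement"
    if illumina_percentages[dominant] > threshold:
        return "agreement, dominant > 90%"
    # Assumption: a dominant type at exactly 90% falls in this category.
    return "agreement, dominant < 90%"


# Hypothetical example profiles (illustrative only, not study data).
print(classify_agreement("RB", {"RB": 95.0, "VT": 5.0}))
print(classify_agreement("VT", {"VT": 70.0, "AT-1": 30.0}))
print(classify_agreement("AT-1", {"RB": 60.0, "AT-1": 40.0}))

# Counts reported in the text for the 52 compared populations.
agree_high, agree_low, disagree = 25, 21, 6
total = agree_high + agree_low + disagree
pct_agree_low = round(100 * agree_low / total)
pct_agree_all = round(100 * (agree_high + agree_low) / total)
print(total, pct_agree_low, pct_agree_all)
```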
The sequencing data of the greenhouse-maintained original GFMS 12 sources (Table 4) showed that the levels of the various CTV components varied significantly between these three populations. The original Nartia A (GFMS 12) source (14-6000) maintained at the ARC-ITSC was dominated by VT-like reads, at 81%. AT-1 reads were present at 12.3%, while the remaining components, namely RB, A18 and CT14A, were represented at levels below 10%. The population collected 10 years after the GFMS 12 population, but from the same source (14-6001) and also maintained at the ARC-ITSC, was dominated by RB-like components at 97.2% of mapped reads. The remaining components, namely Kpg3/SP/T3, VT, AT-1 and CT14A, were minor components represented at levels below 2% of mapped reads. The original GFMS 12 source maintained at the CRI facility showed an almost equal representation of VT-like and AT-1-like components, at 45.8% and 48.7% of mapped reads respectively. CT14A was a minor component at ~5% of total mapped reads.
The Illumina MiSeq datasets generated for the non-pre-immunised Star Ruby trees planted at a trial site in Letsitele are presented in Table 5. A total of seven different sequence types were represented across the three samples. Sample 14-6003 was dominated by VT-like components at 81% of total mapped reads. AT-1 was represented at 16.3%, and CT14A, RB and A18 were minor components represented at levels below 2%. Sample 14-6004 was dominated by RB at 99.6% of mapped reads, with AT-1 a minor component at 0.4%. All seven components were represented in 14-6005. RB-like components were also dominant in this sample, at 81.5%, and AT-1 was represented at 11.1% of total mapped reads. The remainder were minor components.
The number of reads mapping to a reference sequence was compared with the consensus length of the assembly for each Illumina MiSeq dataset. These comparisons are illustrated in Figures 1.1-9.3 in “Supplementary data” Section B (supplied electronically).
DISCUSSION
The identities of 192 CTV populations, collected from the major Star Ruby grapefruit production areas throughout Southern Africa, were characterised using the established techniques of direct sequencing and the sequencing of multiple clones, as well as the more recently developed NGS technology. This study has produced one of the largest datasets on the diversity of CTV strains in Southern Africa, which should prove very useful for improving CTV cross-protection; as more knowledge of diversity becomes available, cross-protection should begin to target specific strains. The main technical advance provided by this study was the establishment of a protocol for characterising CTV populations based on NGS technologies, as a direct replacement for the established approaches of direct Sanger sequencing and the sequencing of multiple clones.
Amplicons were generated using the primer pair targeting the p33 gene. All 192 amplicons were characterised using direct Sanger sequencing, while a subset of six of these, each representing a particular production area, was characterised by sequencing multiple clone inserts. Direct Sanger sequencing of PCR amplicons generated from an RNA virus quasispecies, such as CTV, is expected to provide the identity of the dominant component of the population (Pawlotsky et al., 1998). CTV populations are often composed of various disparate strains, and direct sequencing data containing consistently ambiguous base calls are usually evidence that a population is made up of variable CTV components (Fontana et al., 2014). In this study, direct sequences that did not meet stringent quality criteria, such as PHRED scores, were assumed to result from populations consisting of more than one detectable sequence type, and only the identities of those appearing to contain a single, clearly dominant sequence are reported. A total of 35 of the 192 direct sequences (~18%) did not conform to the quality parameters.
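For reference, a PHRED quality score Q is related to the estimated base-calling error probability P by Q = -10 log10(P), so that Q = 20 corresponds to an expected error rate of 1 in 100 bases. The sketch below illustrates that relationship and reproduces the proportion of excluded sequences reported above. The quality criteria applied in the study are not specified here, so the mean-quality cut-off of 20 in `passes_quality` is a hypothetical placeholder, as are the function names and the example scores.

```python
def phred_to_error_probability(q):
    """Estimated probability that a base call with PHRED score q is wrong."""
    return 10 ** (-q / 10)


def passes_quality(phred_scores, min_mean_q=20):
    """Hypothetical filter: keep a read whose mean PHRED score meets the cut-off."""
    return sum(phred_scores) / len(phred_scores) >= min_mean_q


print(phred_to_error_probability(20))
print(phred_to_error_probability(30))
print(passes_quality([35, 38, 12, 40]))  # illustrative per-base scores

# Proportion of direct sequences excluded, from the counts in the text.
excluded, total = 35, 192
print(f"{100 * excluded / total:.1f}% of direct sequences excluded")
```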
As shown in the comparison between the two datasets, a total of 52 samples had direct sequence data that conformed to the quality thresholds as well as corresponding Illumina MiSeq datasets (Table 6). Agreement between the datasets, in terms of the dominant sequence type, was observed for 46 (88%) of the populations; however, in 27 of the populations the dominant sequence type made up less than 90% of the population according to the MiSeq data. The two datasets did not agree on the dominant sequence type for the remaining six populations.
Chapter 1: Introduction
Chapter 2: Literature Review
(2.1) Biology, pathology and epidemiology of Citrus tristeza virus
(2.2) Control of CTV and the use of cross-protection in Southern Africa
(2.3) CTV detection, diagnostics and phylogenetics
(2.4) Next Generation Sequencing chemistries and platforms
(2.5) Characterising viral populations using NGS technologies
Chapter 3: PCR bias associated with primers targeting conserved sequences for genotype diversity studies within Citrus tristeza virus populations
-Introduction
-Materials and Methods
-Results
-Discussion
Chapter 4: Diversity of Citrus tristeza virus populations in commercial Star Ruby orchards in Southern Africa, using Illumina MiSeq technology
-Introduction
-Materials and Methods
-Results
-Discussion
Chapter 5: Concluding Remarks
-List of appendices