RARE VARIANT ASSOCIATION TESTING
Most human variants are rare and are more independent of one another than common variants, which leads to a greater multiple testing penalty for individual variant association tests. Further, many more samples are needed than for common variants to even observe the rare variants and identify their effects. Exome-sequencing studies are generally underpowered to uncover deleterious mutations with a large effect on complex traits. Assessing the contribution of a single variant to disease risk involves the relatively straightforward comparison of allele or genotype frequencies at the genomic site in cases and controls. However, classical association tests (e.g., chi-squared, Cochran-Armitage trend, Fisher’s exact, and Wald tests) are not well suited to rare variation: rare variants have low frequencies and are more numerous than common variants, and these tests can produce spurious findings when observed counts are too small. Single variant tests can still be useful for rare variants when only summary data are available, when the effects are very large, the sample size is sufficient, or the variants are not too rare, and they can also be used to assess sequencing data quality and identify population stratification. Fisher’s exact test compares allele frequencies between cases and controls and is considered analogous to the allelic chi-squared test, but it is appropriate for all sample sizes, especially small ones. However, this test can be overly conservative.
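As an illustration of the single variant comparison described above, the following is a minimal sketch of a two-sided Fisher’s exact test on a 2x2 table of allele counts, written with the Python standard library only. The function name and the allele counts are hypothetical and chosen purely for illustration; they are not taken from the study, and this is not necessarily how the analysis was implemented.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are minor and major allele counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x minor alleles in the first row,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical allele counts (assumption, for illustration only):
# 1000 cases -> 2000 chromosomes; 2570 controls -> 5140 chromosomes.
case_minor, case_major = 8, 1992
ctrl_minor, ctrl_major = 2, 5138

p_value = fisher_exact_two_sided(case_minor, case_major, ctrl_minor, ctrl_major)
print(f"two-sided p = {p_value:.3g}")
```

With only ten minor alleles in the whole sample, the set of possible tables is small and the attainable p-values are coarse, which is one reason the test tends to be conservative for rare variants.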
Multiple methods exist to aggregate rare variants and test their cumulative effect on disease association by gene or genomic region, but performance depends largely on the disease architecture (Table 1). As this is often unknown, it can be useful to apply a combined test or multiple methods and adjust p-values across all tests/methods. In population-based genetic analyses, rare variants are most commonly grouped at the gene level, usually via a collapsing test, a combined multivariate and collapsing test, or a sequence kernel association test (SKAT). Each method makes different assumptions about the underlying genetic model and can be powerful in different scenarios, particularly with respect to the presence of benign variants or different directions of effect of causal variants.
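The collapsing idea can be sketched as follows: each individual is reduced to a carrier indicator for a gene (at least one rare allele), and carrier rates are then compared between cases and controls. The sketch below uses a label-permutation test as a simple stand-in for the comparison step; the function names, toy genotypes, and number of permutations are assumptions for illustration, not the procedure used in this work.

```python
import random

def collapse_carriers(genotypes):
    """Collapse per-variant minor-allele counts (0/1/2) into one indicator per
    individual: 1 if the individual carries any rare allele in the gene, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]

def permutation_burden_test(carriers, is_case, n_perm=2000, seed=0):
    """Permutation p-value for the absolute difference in carrier rate
    between cases (label 1) and controls (label 0)."""
    rng = random.Random(seed)

    def stat(labels):
        n_case = sum(labels)
        n_ctrl = len(labels) - n_case
        case_rate = sum(c for c, y in zip(carriers, labels) if y) / n_case
        ctrl_rate = sum(c for c, y in zip(carriers, labels) if not y) / n_ctrl
        return abs(case_rate - ctrl_rate)

    observed = stat(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if stat(labels) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): three rare variants in one gene, six individuals.
toy_genotypes = [
    [0, 1, 0], [1, 0, 0], [0, 0, 2],   # cases
    [0, 0, 0], [0, 0, 0], [0, 1, 0],   # controls
]
toy_is_case = [1, 1, 1, 0, 0, 0]
toy_carriers = collapse_carriers(toy_genotypes)
print(toy_carriers, permutation_burden_test(toy_carriers, toy_is_case))
```

Because collapsing treats every rare allele alike, a test of this kind loses power when many of the grouped variants are benign or when causal variants act in opposite directions, which is the scenario variance-component methods such as SKAT are designed for.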
COMPLEX ETIOLOGY OF NEURODEGENERATIVE DISEASE
Aging is a complex trait and the greatest risk factor for neurodegenerative disease (ND), with environmental and genetic risk factors contributing to physical deterioration and increasing the risk of disease and death. Human life expectancy has increased by a quarter of a year each year for nearly 200 years and continues to increase (Oeppen and Vaupel, 2002). There are few effective treatments for progressive, irreversible neurodegenerative disease, and the prevalence and associated burden of age-related disease will continue to increase as the population over the age of 65 grows. The most common neurodegenerative diseases, Alzheimer’s and Parkinson’s, appear predominantly in the elderly, and risk increases with age. Parkinson’s disease (PD) is the second most common neurodegenerative disorder and was first described more than 200 years ago (Goetz, 2011), yet no intervention exists to slow or stop the associated nigrostriatal degeneration. The number of individuals with PD has more than doubled globally in the past 20 years (GBD 2016 Parkinson’s Disease Collaborators, 2018) and is expected to double within the next 20 years to more than 12 million cases (Dorsey and Bloem, 2018). One in ten individuals aged 65 years or older is diagnosed with Alzheimer’s disease, and Parkinson’s disease has a global prevalence of 0.5% and 20% in individuals 65 or older (Hou et al., 2019). Heritable forms of PD represent 5-10% of cases, and the penetrance of familial mutations varies widely and is age dependent.
GENE EXPRESSION PROFILING IN NEURODEGENERATIVE DISEASE
Despite progress in uncovering genetic risk factors for Parkinson’s disease, the molecular mechanisms underlying cellular degeneration in the brains of individuals with PD remain poorly understood. Transcriptomics is increasingly utilized to interpret the functional impact of genetic associations (Cummings et al., 2017; Liu et al., 2014). This approach can be effective for splice variants, but additional methods are needed for disease-associated missense variants, which can often lead to perturbed protein-protein interactions rather than perturbed protein structure/function. Gene expression studies in neurodegenerative disease are limited by small sample sizes, as these diseases are more common in aged individuals and disease-free brains are relatively rare. Further, results reflect perturbed expression after disease onset rather than changes that lead to PD. Despite many inherent challenges, transcriptomic analyses in PD, and other neurodegenerative diseases, have revealed genes involved in various pathways, including lysosomal and mitochondrial dysfunction, oxidative stress, neuroinflammation, and immune function.
JOINT VARIANT CALLING WITH HEALTHY, AGED CONTROLS
A major challenge in genetic studies of diseases of aging is finding appropriate control subjects. Although the probability of a healthy control sample being affected by neurodegenerative disease is very small, rare variant studies are already underpowered. We obtained individual-level data from 2570 healthy controls of European ancestry over the age of 70 with no history of major disease from the recently established Medical Genome Reference Bank (MGRB; Pinese et al., 2020). To reduce the false positive rate due to different sequencing approaches, genotypes were jointly called across our cases, internal HapMap controls, and MGRB samples (Methods). Joint analysis incorporates information across all samples, which improves error detection and can increase sensitivity in low-coverage regions, as genotype likelihoods across all samples can be used to inform genotype inference.
All cases and controls were reported to be of European ancestry. Although it is challenging to adjust for population structure given our small (~1 Mb) target region, we applied principal component analysis (PCA) to the publicly available sequencing data for 1397 HapMap samples (1000 Genomes Project Consortium et al., 2015) together with our cohort (cases, HapMap CEU, and MGRB controls). PCA performance depends highly on population structure and the underlying risk distribution, and its utility in correcting for population stratification in rare-variant association tests is unclear. In such cases, PCA can be used to guide the matching of cases and controls. Fewer than 1000 SNPs remained after filtering out rare variants (MAF < 1%) and LD pruning; nevertheless, all samples in our cohort clustered closely with the CEU (Utah residents with Northern and Western European ancestry) and TSI (Toscani in Italia) HapMap populations and with the MGRB control samples. Principal component analysis based on WGS of the MGRB samples revealed 53 samples with potential non-European ancestry. Given the sensitivity of rare variant analysis to population structure, these samples were excluded from the analysis.
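The marker selection step that precedes PCA (dropping variants with MAF < 1%, then LD pruning) can be sketched as below. This is a simplified illustration under stated assumptions: the r² threshold of 0.2, the function names, and the toy genotypes are hypothetical, and the greedy comparison against every retained variant stands in for the windowed pruning that dedicated tools perform.

```python
def minor_allele_frequency(genos):
    """MAF from alternate-allele counts (0/1/2) for one variant across samples."""
    freq = sum(genos) / (2 * len(genos))
    return min(freq, 1 - freq)

def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (a simple LD proxy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: correlation undefined, treat as 0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def filter_and_prune(variants, maf_min=0.01, r2_max=0.2):
    """Keep variants with MAF >= maf_min that are not in LD (r^2 > r2_max)
    with any variant already kept. `variants` maps name -> genotype list,
    assumed to be in genomic order."""
    kept = []
    for name, genos in variants.items():
        if minor_allele_frequency(genos) < maf_min:
            continue
        if all(r_squared(genos, variants[k]) <= r2_max for k in kept):
            kept.append(name)
    return kept

# Toy genotypes for six samples (hypothetical, for illustration only).
toy_variants = {
    "v1": [0, 1, 2, 0, 1, 2],
    "v2": [0, 1, 2, 0, 1, 2],  # in perfect LD with v1
    "v3": [0, 0, 0, 0, 0, 0],  # monomorphic, fails the MAF filter
    "v4": [1, 0, 1, 1, 0, 1],  # uncorrelated with v1
}
print(filter_and_prune(toy_variants))
```

In a target region of about 1 Mb, few variants survive both filters, which is consistent with the small number of SNPs available for PCA here.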
Table of contents:
I. Professional background
1. Curriculum vitae
2. Reflective review
3. Research activity report
2011 – 2014 : Whitehead Institute for Biomedical Research/MIT
2015 – 2017 : New York Genome Center/Columbia University
2017 – 2019 : Institut Curie
2019 – Present : Paris Transplant Group
5. Appendices
II. Rare variant analysis in human neurodegenerative disorders guided by cellular models of proteotoxicity
Résumé (abstract in French)
Abstract
1. Introduction
1.1 Advances in genomics
1.1.1. Towards precision medicine
1.2. Genetic association methods
1.2.2. Rare versus common variants
1.2.3. Rare variant association testing
1.3. Neurodegenerative disease genetics
1.3.1. Complex etiology of neurodegenerative disease
1.3.2. Neuropathological spectrum of synucleinopathies
1.3.3. Genetic basis in familial and sporadic disease
1.3.4. From genetic association to molecular mechanism
1.3.5. Gene expression profiling in neurodegenerative disease
1.3.6. Bridging the gap between genetic modifiers and gene expression
1.3.7 Addressing missing heritability in Parkinson’s disease
1.3.8. Cellular models of proteotoxicity
1.3.8.1. Genome-wide screens for proteotoxicity in yeast
1.3.8.2. Transposing molecular networks across species
2. Rare variant analysis of synucleinopathies
2.1. Introduction
2.2. Results
2.2.1. Human genetic analysis
2.2.2. Gene targeting rationale
2.2.3. Discovery cohort
2.2.4. Joint variant calling with healthy, aged controls
2.2.5. Variant detection and filtering
2.2.6. Rare variant association analysis
2.2.7. Gene-based burden analysis
2.2.8. Analysis of additional cohorts
2.2.8.1. Multiple system atrophy cohort
2.2.8.2. Independent PD case-control cohorts
2.2.8.3. Expression profiling of post-mortem brain samples
2.2.9. Rare variant burden in known pathogenic mutation carriers
2.2.10. Concomitant a-beta and a-synuclein pathology
2.2.11. Functional Validation
2.2.11.1. Functional characterization of missense variants
2.2.11.2. Genome editing to test variants in iPSC models
2.3. Discussion and Perspectives
2.4. Methods
2.4.1. Targeted exome sequencing and joint calling
2.4.2. Sequencing quality control
2.4.3 Individual level filtering
2.4.4. Variant level filtering
2.4.5. Assessing the distribution of rare variants between cases and controls
2.4.6. Analysis of synonymous variants
2.4.7. Genomic control
2.4.8. Non-uniform p-value correction
2.4.9. Covariate adjustment
2.4.10. Linkage patterns
2.4.11. Edgetic Analysis
References