. Harnessing the paradoxical phenotypes of APOE ɛ2 and APOE ɛ4 to identify genetic modifiers in Alzheimer's disease. Alzheimers Dement. 2021 May;17(5):831-846. Epub 2020 Dec 7 PubMed.

Comments

  1. This is an elegant study by Kim and colleagues harnessing the gain in statistical power by virtue of focusing on the genetic variants that modify the risk of ApoE e4 genotype based on exome-sequencing data and using the evolutionary-action approach, i.e., a computational method to estimate the phenotypical impact of mutations.

    The study revealed a large number of genes harboring detrimental or protective mutations in ApoE e2 and ApoE e4 carriers, respectively. These exploratory results will encourage future investigations to follow up these findings in independent exome-sequencing replication cohorts as well as by genome-wide association analyses in larger cohorts.

    A major question is whether any variants in the depicted genes are modifiers, specifically, of the effect of ApoE genotype on AD, or are associated with AD risk in general. A comparison of the results with those of the recent whole-exome-sequencing study, which was also based on data from the Alzheimer’s Disease Sequencing Project, would have been great (Bis et al., 2018). 

    References:

    . Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry. 2018 Aug 14; PubMed.

    Comment by Michael Ewers
  2. This study capitalizes on subsets of samples deemed “low genetic risk cases” (carriers of the protective APOE ε2 haplotype) and “high genetic risk controls” (carriers of the risk-increasing APOE ε4 haplotype) that underwent whole-exome sequencing (WES) as part of the Alzheimer’s Disease Sequencing Project (ADSP). The purpose in looking at these subsets is to use “extreme” sample sets, i.e., cases without known genetic risk factors and unaffected subjects with known genetic risk factors, to make it easier to try to identify novel coding variants with strong effects on disease risk as new AD susceptibility loci, by biasing the study against “the usual suspects” such as APOE and other known loci. This strategy has the potential to be richly rewarding in identifying new disease susceptibility loci, candidate pathways, and relevant gene networks.

    The genes identified in this study are interesting, as they are found in a number of pathways and networks involving known AD susceptibility loci from prior GWAS; however, few prior AD GWAS loci themselves were identified. This may be because, as the authors noted, most GWAS loci fall in noncoding regions, whereas this WES study focuses on coding-region variation, and also because their APOE-based sampling should have controlled for associations in or near that locus. However, because of the selected nature of the sample, a major question remains about the generalizability of their findings: How much do these genes/variants contribute to AD risk in aggregate in representative samples of the population, and among groups not selected for APOE e4 enrichment and case-control status? Quantifying their contribution to disease heritability overall may speak to the importance of recruiting sample sets with this kind of enrichment.

    There are some concerns about the study design. First, there are sample size limitations inherent in the design and there was no validation in independent samples. Second, the ADSP WES dataset was sequenced at three different sequencing centers using two different sequence capture kits, which may lead to variation in genotype quality. Specifically, it is unclear if the variants examined fell within targeted capture regions in all samples (in both kits), which is a concern because of highly variable but overall reduced genotype calling quality and accuracy among off-target variants, even those in sequence immediately flanking capture regions. ADSP best practices recommend exclusion of these variants, which account for nearly 50 percent of all called variants, and it is unclear whether these lower-quality variants were excluded by the filtering criteria implemented. While this may present a problem, validation in other data is highly encouraged and may still yield support for their findings.

    Finally, acknowledging the wealth of genes identified in this study that are further supported by integrating functional genomics, we have a major question about genomic search strategies like this to consider: Does a larger search space for AD candidate genes/loci help or hurt the hunt for therapies to slow or stop progression of the disease? Among the best examples of successful translations of genetic studies to clinical treatment are the development of lipid-lowering drugs based on studies that identified risk variants in HMGCR and PCSK9. Does this enhanced AD search space filled with new candidate genes make it easier to identify therapeutic targets likely to have an effect on the disease? Also, with so many potential therapeutic targets emerging, would strategies to prioritize specific genes for further investigation help to make studies like these more appealing and informative? This is a broad and emerging philosophical debate with which the field will grapple in coming years.

    Comment by Adam Naj
  3. Kim et al. applied an elegant evolution-based variant functional impact weighting method and a novel discordant phenotype subsampling approach to identify genes that modify predicted AD outcomes in APOE-ε4 and ε2 carriers. Their analyses suggest etiological roles for synaptic maintenance in several cell types and the endolysosomal system in microglia. This is consistent with findings from previous analyses by our group and others (Aug 2019 news; Dourlen et al., 2019).

    However, this unorthodox association analysis methodology is not described in detail, rendering it difficult to assess the significance and robustness of the findings. It is unclear what “# of observed variants” refers to (cumulative minor allele count, number of variants), what variant and sample level QC was performed (e.g., minor allele frequency cutoff), and what technical (batch, capture, sequencing center, etc.) or biological (age, sex, population structure) covariates were included in the association analyses. Lack of replication is another issue limiting confidence in the findings. Yet it appears that only about half (5,686) of the 10,000 individuals available in the ADSP Discovery cohort were used in this study, such that the remaining half could be used as an independent replication dataset to strengthen its findings.

    There are other areas of concern. Diagnostic plots (e.g., residuals vs. fitted, normal Q-Q) for the gene discovery linear regressions would aid interpretation but are not shown. Nominal significance is used as a threshold in the permutation analysis that generated the final list of 216 iDEAL genes. Multiple testing correction and thus a much higher number of permutations would be required to constrain the number of false positive findings in this list.

    Additional evidence is provided to support the statistical association results. Differential gene expression of the iDEAL genes in AD vs. control brains, although widely used, is not very convincing for causal involvement of a gene because it could arise from cell-type proportion changes in the degenerating brain tissue or reactive (rather than causal) changes in gene expression programs within cells.

    Moreover, it is unclear how the authors generated the list of AD GWAS candidate genes used in their STRING network analysis that showed significant interconnectivity of iDEAL genes with known AD genes. GWAS are designed to identify loci, not genes, and (with a few exceptions like APOE, TREM2, PLCG2, ABI3, ABCA7, SORL1, SPI1, etc.) the underlying causal gene(s) remain(s) unknown for most AD-associated loci. Interestingly, the Drosophila Aβ/Tau model experiments identified a few promising examples, validating the dAD-ε2 and dAD-ε4 predictions, and it would be worthwhile exploring the effect of these iDEAL genes on disease-relevant phenotypes in vertebrate models.

    Despite these concerns, the authors' use of extreme phenotype sampling to improve statistical power and identify novel modifier genes holds great promise for elucidating AD pathogenic mechanisms.

    References:

    . The new genetic landscape of Alzheimer's disease: from amyloid cascade to genetically driven synaptic failure hypothesis?. Acta Neuropathol. 2019 Aug;138(2):221-236. Epub 2019 Apr 13 PubMed.

    Comment by Alan Renton
  4. This elegant genetic study may have identified several important genes that may protect APOEε4 carriers from, or reduce APOEε2 carriers’ protection against, Alzheimer's disease. Both findings challenge our current understanding of APOE genotype as a risk factor for AD.

    Notably, combining the “evolutionary action” (EA) approach, involving the functional impact of an amino acid change within a protein’s coding sequence, and a statistical albeit complex approach called differential “imputed deviation in EA load” (iDEAL), is exceptionally novel. However, one can argue that genetics studies can go only so far without context. When the “explanation” involves more than 200 genes, how much can it actually explain?

    Associated pathway clusters will likely turn out to be more useful, but not necessarily from a strictly genetic perspective. This study found major genes involved in pathways of inflammation, lipoprotein metabolism, and synaptic function, all well-known to be sensitive to environmental influence. Tracking perturbation of the pathways in prodromal AD may be more useful than a focus on 200 simultaneous gene products.

    Likewise, earlier studies found that, for specific populations, the APOEε4 allele does not associate with increased AD risk, and the difference was attributed to environmental, not genetic, factors (Gureje et al., 2006; Hall et al., 2006). The ideal study for sporadic AD cases may require an understanding of the role of other "Es", i.e., the environment and epigenetics (Maloney and Lahiri, 2016).

    References:

    . APOE epsilon4 is not associated with Alzheimer's disease in elderly Nigerians. Ann Neurol. 2006 Jan;59(1):182-5. PubMed.

    . Cholesterol, APOE genotype, and Alzheimer disease: an epidemiologic study of Nigerian Yoruba. Neurology. 2006 Jan 24;66(2):223-7. PubMed.

    . Epigenetics of dementia: understanding the disease as a transformation rather than a state. Lancet Neurol. 2016 Jun;15(7):760-74. Epub 2016 May 9 PubMed.

    Comment by Debomoy Lahiri
  5. We thank our colleagues for noting the statistical power of focusing on paradoxical APOE2/4 genotype-phenotypes to find potential AD genes, the deeper resolution provided by a variant impact weighting method, and the added confidence provided by in vivo validation in Drosophila AD models.

    Sequence quality is paramount, certainly. Communication with the Alzheimer’s Disease Sequencing Project (ADSP) indicated the data had undergone stringent quality control and was of high quality. We confirmed this based on the average TiTv ratio (3.52 ± 0.05) and on the lambda value (0.039 ± 0.001, see Koire, Katsonis, and Lichtarge, 2016) of the variants. As detailed in Methods, only genotype calls from the Atlas calling pipeline were analyzed, and those from the GATK pipeline were excluded. Also, only half of the ADSP Discovery data were available on dbGaP for this project, or 5,686 samples. The updated, complete ADSP Discovery data that was released simultaneously with the completion of our manuscript, in February 2020, as well as the ADSP Extension cohort will now help us refine our findings.

    With respect to methodology, our approach (iDEAL) is not an association analysis so it does not use any covariate. As iDEAL calculates the differential functional mutational load of a gene between two paradoxical groups, “# of all variants” refers to the number of protein-coding variants observed in each gene. The significance of each gene’s signal is assessed using a z-score measured against a background distribution of iDEAL scores built by randomizing the labels. The lower right plot of Figure 1A shows the z-scores versus iDEAL scores for each gene. We control for age by ensuring that healthy controls were older than AD patients (Figure S1). We could not yet control for sex, as splitting samples by male/female would have overly weakened power. But, as discussed, the new ADSP Discovery and Extension cohorts should now support the analyses of males and females, separately.

    While correcting z-scores for multiple hypothesis testing may reduce false positives, we purposefully chose to be permissive in selecting our candidate genes because our goal was to perform in vivo validation in Drosophila models, as well as assess each gene’s relevance to AD in numerous other ways, including risk predictability in non-paradoxical patient groups, differential gene expression in AD versus control brains, and connection with GWAS genes. The AMPAD whole-tissue dataset is the most extensive and current gold standard for AD transcriptomics. We agree that it is not perfect (no omics set is), but as of now the single-cell datasets available lack numbers and robustness to replace it and have their own technical biases as well. For the GWAS network interactions, as indicated in the publication, we did not make the gene calls but relied on the most likely candidates from careful studies (Harold et al., 2009; Hollingworth et al., 2011; Kunkle et al., 2019; Lambert et al., 2009; Lambert et al., 2013; Naj et al., 2011; Seshadri et al., 2010). 

    Here, our paper was laser-focused on APOE paradoxical genotypes, which is a distinct study design from the traditional case-control study over the complete ADSP collection. A future study that follows the latter design would provide a more relevant and fair comparison to Bis et al., 2018.

    The space for AD candidate genes/loci is far from saturated. We believe that identifying more genes will not only reveal potential therapeutic avenues, but also provide more robust patient-stratification capabilities as well as stronger diagnostic tools (e.g. biomarkers for disease progression). Overlaying functional information on sequence analysis will help our predictive accuracy and increase statistical power.

    —Ismael Al‐Ramahi, and Juan Botas, all of the Baylor College of Medicine, are co-authors of this comment.

    References:

    . Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry. 2018 Aug 14; PubMed.

    . Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease. Nat Genet. 2009 Oct;41(10):1088-93. PubMed.

    . Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease. Nat Genet. 2011 May;43(5):429-35. PubMed.

    . Repurposing germline exomes of the Cancer Genome Atlas demands a cautious approach and sample-specific variant filtering. Pac Symp Biocomput. 2016;21:207-18. PubMed.

    . Genetic meta-analysis of diagnosed Alzheimer's disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet. 2019 Mar;51(3):414-430. Epub 2019 Feb 28 PubMed. Correction.

    . Genome-wide association study identifies variants at CLU and CR1 associated with Alzheimer's disease. Nat Genet. 2009 Oct;41(10):1094-9. PubMed.

    . Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat Genet. 2013 Dec;45(12):1452-8. Epub 2013 Oct 27 PubMed.

    . Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer's disease. Nat Genet. 2011 May;43(5):436-41. Epub 2011 Apr 3 PubMed.

    . Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA. 2010 May 12;303(18):1832-40. PubMed.

    Comment by Young Won Kim
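
The authors' reply describes assessing each gene with a z-score measured against a background distribution built by randomizing the group labels. The thread does not spell out how the iDEAL score itself is computed, so the sketch below only illustrates the general label-permutation idea under stated assumptions: the statistic `load_difference` (difference in mean per-sample mutational burden between two groups) is a hypothetical stand-in and not the iDEAL score, the function names are invented for illustration, and the toy data are made up.

```python
import random
import statistics


def load_difference(scores, labels):
    """Hypothetical stand-in statistic: mean burden of group 1 minus group 0."""
    group1 = [s for s, lab in zip(scores, labels) if lab == 1]
    group0 = [s for s, lab in zip(scores, labels) if lab == 0]
    return statistics.fmean(group1) - statistics.fmean(group0)


def permutation_z(scores, labels, n_perm=1000, seed=0):
    """Z-score of the observed statistic against a label-shuffled background."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = load_difference(scores, labels)
    shuffled = list(labels)
    background = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between burden and group
        background.append(load_difference(scores, shuffled))
    mu = statistics.fmean(background)
    sd = statistics.stdev(background)
    return (observed - mu) / sd, observed


# Toy data (invented): per-sample burden for one gene in two groups of eight.
scores = [0.8, 0.9, 0.7, 0.85, 0.95, 0.75, 0.8, 0.9,
          0.1, 0.2, 0.15, 0.05, 0.3, 0.25, 0.1, 0.2]
labels = [1] * 8 + [0] * 8

z, observed = permutation_z(scores, labels)
print(f"observed difference = {observed:.4f}, z = {z:.2f}")
```

A z-score from a background like this is only as fine-grained as the number of shuffles allows, which is why the commenters' point about multiple-testing correction and permutation count matters when many genes are scored at once.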
