Candidate gene methylation studies are at high risk of erroneous conclusions

Citation metadata

From: Epigenomics (Vol. 7, Issue 1)
Publisher: Future Medicine Ltd.
Document Type: Report
Length: 2,187 words
Lexile Measure: 1440L

Article Preview:

Author(s): Andrey A Shabalin, Karolina A Aberg, Edwin JCG van den Oord


Keywords: candidate gene study; DNA methylation; false discoveries; methylome-wide association study; principal component analysis

DNA methylation studies offer a promising avenue for improving our understanding of common diseases and for alleviating part of their public health burden. A commonly used approach tests methylation sites in genes of interest for association with disease status. These genes are typically selected on the basis of a priori ideas about their possible role in pathogenic processes. Compared with assaying many sites simultaneously, such candidate gene methylation studies are appealing because of their low cost. They are also relatively straightforward in terms of laboratory and statistical procedures. However, in this commentary we argue that specific properties of methylation studies present a serious challenge for interpreting findings that originate from the candidate gene approach.

Common variation among large subsets of methylation sites

Recently, a number of investigations have assayed large sets of methylation sites simultaneously. A striking finding emerging from these studies is that the methylation statuses of large subsets of sites covary with each other [1,2]. This common variation is not restricted to specific chromosomal locations but involves methylation sites across the entire genome. Principal component analysis (PCA) provides a good way to quantify this phenomenon. PCA summarizes the common variation in methylation status among sites as a set of uncorrelated components. The first principal component (PC) accounts for as much of the variation in the methylation data as possible; the second captures as much of the remaining variance as possible while being uncorrelated with the first, and so forth. Bell et al. performed PCA on methylation levels at 22,290 CpGs in lymphoblastoid cell lines from 77 individuals [1]. They found that the first PC explained 22% of the variation in methylation, and the first three PCs together explained 33%. This large amount of explained variation does not seem to be an artifact of the specific assay approach. Whereas Bell et al. [1] used an array, Aberg et al. [2] used a sequencing-based approach to assay all 28 million common CpGs in the human genome in a sample of 1497 subjects. Their first PC explained 27% of the variation in methylation, and the first three PCs together explained 36%. These findings seem consistent with observations that global methylation levels may vary among subjects as a function of, for example, demographic variables, lifestyle, nutrition or disease status [3,4]. Such global variation is only possible when individuals differ at many sites in a similar fashion.
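
To make "proportion of variation explained by the first PCs" concrete, the sketch below computes it for toy data in plain Python. It is an illustration under stated assumptions, not the analysis of Bell et al. or Aberg et al.: the data are simulated (the function simulate_methylation and its loading parameter are hypothetical, giving every site the same dependence on one shared subject-level factor plus independent noise), and the eigenvalues of the site covariance matrix are approximated by power iteration with deflation rather than by the full eigendecomposition or SVD a statistics library would use. The share of variance for each PC is its eigenvalue divided by the trace of the covariance matrix.

```python
import random


def simulate_methylation(n_subjects, n_sites, loading, seed=0):
    """Hypothetical toy data: each site = loading * shared subject factor + noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        shared = rng.gauss(0.0, 1.0)  # subject-level factor common to all sites
        data.append([loading * shared + rng.gauss(0.0, 1.0) for _ in range(n_sites)])
    return data


def covariance_matrix(data):
    """Sample covariance between sites (columns), with n - 1 in the denominator."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a, p):
            value = sum(row[a] * row[b] for row in centered) / (n - 1)
            cov[a][b] = cov[b][a] = value
    return cov


def top_eigenvalues(matrix, k, iters=300, seed=0):
    """Approximate the k largest eigenvalues of a symmetric PSD matrix
    using power iteration followed by deflation."""
    rng = random.Random(seed)
    p = len(matrix)
    m = [row[:] for row in matrix]
    eigenvalues = []
    for _ in range(k):
        v = [rng.random() + 0.5 for _ in range(p)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T M v estimates the current largest eigenvalue.
        mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(v[i] * mv[i] for i in range(p))
        eigenvalues.append(lam)
        # Deflate: subtract the component just found so the next pass finds the next PC.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return eigenvalues


def variance_explained(data, k):
    """Proportion of total variance (trace of the covariance) captured by each of the first k PCs."""
    cov = covariance_matrix(data)
    total = sum(cov[i][i] for i in range(len(cov)))
    return [lam / total for lam in top_eigenvalues(cov, k)]


if __name__ == "__main__":
    data = simulate_methylation(n_subjects=200, n_sites=40, loading=0.8, seed=1)
    shares = variance_explained(data, k=3)
    print("share per PC:", [round(s, 3) for s in shares])
    print("first three PCs together:", round(sum(shares), 3))
```

With loading 0.8 and 40 sites, the population covariance is I + 0.64·11ᵀ, so the first PC's population share is (1 + 0.64·40) / (1.64·40) ≈ 0.41; the sample estimate will scatter around that value. Setting loading to 0 removes the shared factor, and the first PC then explains only a few percent, which is the contrast the paragraph above draws.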

Impact on association testing

Association testing typically starts with calculating a test statistic for each of the investigated methylation sites. If the test statistic is greater than a critical value, the null hypothesis, which states that the site has no effect, is rejected. Rejecting the null hypothesis when it is true results in a false positive. Not rejecting the...
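
The per-site testing logic described above can be sketched as follows. The choice of Welch's two-sample t statistic and the critical value 1.96 (the two-sided 5% cutoff of the normal distribution, a reasonable approximation for large groups) are our assumptions; the text does not specify a test. The helper count_rejections is hypothetical: it simulates sites with no true effect, independently of one another (unlike real methylation sites, which covary), so every rejection it counts is a false positive by construction.

```python
import math
import random


def welch_t(x, y):
    """Welch's two-sample t statistic comparing mean methylation in x versus y."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((a - mx) ** 2 for a in x) / (nx - 1)
    vy = sum((b - my) ** 2 for b in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)


def count_rejections(n_cases, n_controls, n_sites, critical=1.96, seed=0):
    """Simulate n_sites sites with NO true case/control difference and count
    how many have |t| above the critical value (all are false positives)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sites):
        cases = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        if abs(welch_t(cases, controls)) > critical:
            rejections += 1
    return rejections


if __name__ == "__main__":
    n_sites = 2000
    false_positives = count_rejections(100, 100, n_sites, seed=3)
    print(f"{false_positives} of {n_sites} null sites rejected "
          f"({false_positives / n_sites:.1%}; nominal rate 5%)")
```

Under these assumptions roughly 5% of null sites exceed the cutoff, so the expected number of false positives grows with the number of sites tested.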

Source Citation

Source Citation   

Gale Document Number: GALE|A409374829