Author(s): Douglas S. Goodin 1,*, Pouya Khankhanian 2, Pierre-Antoine Gourraud 1,3,4, Nicolas Vince 3,4
The basis of genetic susceptibility to multiple sclerosis (MS) is complex [1-3]. Currently, more than 200 MS-associated common risk variants in different genomic regions have been identified by genome-wide association screens (GWAS) comparing MS patients to controls [4-12]. These GWAS typically evaluate the disease associations for ~500,000 single nucleotide polymorphisms (SNPs) scattered throughout the genome [4-12]. Although these increasingly available GWAS have defined a large number of genetic associations, the associations with several alleles of the human leukocyte antigens (HLA), located in the major histocompatibility complex (MHC) on the short arm of chromosome 6 (6p21.3), were identified more than four decades ago. The most prominent of these HLA associations (by far) is with the HLA-DRB1*15:01 allele, which typically has an odds ratio (OR) of more than three for heterozygotes and more than six for homozygotes [9,13-20]. Other alleles at the DRB1 locus (e.g., HLA-DRB1*03:01 and HLA-DRB1*13:03) are also known to be associated with an increased risk of getting MS [1,11,21]. However, even with the large number of defined genetic associations with MS, most of the genetic risk in MS remains unexplained. In addition, as shown in Figure A in S3 File, the large majority of the population does not even belong to the subgroup of individuals who are "genetically susceptible" to getting MS. Observations such as these have created a so-called "heritability gap". Such a gap is a common finding in many complex genetic disorders [1,2] and is likely due (at least in part) to the phenomenon of "synthetic association", in which a reported association is simply tagging a genomic region rather than identifying a causal variant. Indeed, both single SNPs and single alleles can be associated with several haplotypes, sometimes spread over a considerable genetic distance [23-34].
For example, despite the apparently well-established association of MS susceptibility with the HLA-DRB1*15:01 allele, this association might be due to a synthetic association [18,19]. Moreover, as demonstrated in Figure A in S3 File, even for the HLA-DRB1*15:01 allele, the large majority of its carriers do not even belong to the subset of individuals who are "genetically susceptible" to getting MS.
Some of the haplotypes in the MHC region are highly conserved extended haplotypes (CEHs), which span more than 2.7 megabases (Mb) [23-28,30,32-36]. These CEHs exist even though the MHC region encompasses several recombination hotspots and the region as a whole has an average recombination rate of ~0.4 centimorgans (cM) per Mb [27,34,37,38]. Mechanisms proposed to account for this kind of extended linkage include "frozen blocks" of DNA, preservation of ancestral lineages, haplotype-specific suppression of recombination/mutation in parts of the MHC region, or some form of balancing evolution, in which heterozygosity is favored [24,39-43]. Several of these CEHs include HLA-DRB1*15:01, HLA-DRB1*03:01, HLA-DRB1*13:03, or other alleles. For example, the haplotypes: [see PDF for formula]...