Research: Regulatory Genes and MS Susceptibility

Maurano MT et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337(6099):1190-5.

Genome-wide association studies (GWAS) have identified many non-coding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during foetal development and are enriched in variants associated with gestational exposure-related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.



Proteins are made by decoding the message carried in genes. Genes are made of DNA, which is either coding (it is read to make the protein) or non-coding. The bits of a gene that are read are called EXONS, and the bits in between the exons, which are spliced away and not read, are called INTRONS. Many people once thought that the introns, and the stretches of DNA between GENES (the genetic units of heredity), were just genetic junk that had accumulated over human history.





However, it is now known that upstream of the gene there is the gene PROMOTER, which determines when the gene will be switched on to promote the production of protein. There are also regulatory elements that determine how much of the gene product will be made. Thus the promoter contains sites that are activated to initiate transcription.

Genes are not the same in different individuals. Both the coding and the non-coding genetic elements can vary between individuals, and these variants are known as gene alleles.

Genome-wide association studies (GWAS) have looked in an unbiased way throughout the GENOME to see which alleles are linked with susceptibility to MS. So far about 100 allelic variants, among the 30,000 or so genes in the genome, have been linked to susceptibility to MS.


These gene variants can be in coding or non-coding regions of the DNA. This current study looks at non-coding variants in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). These are sites in the DNA where deoxyribonuclease I (usually called DNase I) preferentially acts. DNase I is a nuclease that cleaves DNA preferentially at phosphodiester linkages adjacent to a pyrimidine nucleotide, yielding 5'-phosphate-terminated polynucleotides with a free hydroxyl group on position 3' and producing, on average, tetranucleotides. It is a waste-management endonuclease.
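For readers who like to see the idea in code, here is a minimal sketch of what "variants concentrated in DHSs" means in practice: checking how many variant positions fall inside DHS regions. The coordinates and the names `DHS_INTERVALS`, `VARIANT_POSITIONS`, `in_dhs` and `fraction_in_dhs` are all invented for illustration; they are not data or methods from the paper.

```python
from bisect import bisect_right

# Hypothetical DHS regions on one chromosome, as (start, end) pairs.
# Assumed to be sorted, non-overlapping and half-open: start <= pos < end.
DHS_INTERVALS = [(100, 250), (400, 520), (900, 1000)]

# Hypothetical variant positions (made up for this example only).
VARIANT_POSITIONS = [120, 300, 410, 519, 520, 950]


def in_dhs(position, intervals):
    """Return True if position lies inside one of the sorted intervals."""
    starts = [start for start, _ in intervals]
    # Find the last interval that starts at or before the position.
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < intervals[i][1]


def fraction_in_dhs(positions, intervals):
    """Return the fraction of positions that fall inside any interval."""
    hits = sum(in_dhs(p, intervals) for p in positions)
    return hits / len(positions)


if __name__ == "__main__":
    print(fraction_in_dhs(VARIANT_POSITIONS, DHS_INTERVALS))
```

A real enrichment analysis would also compare this fraction with what is expected by chance, for example using matched control variants. That step is left out of this sketch.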

Disease-associated variants systematically perturb transcription factor recognition sequences. Many GWAS variants are associated with multiple distinct diseases within the same broader disease class (e.g., inflammation), and they repeatedly localize within the recognition sites of interacting transcription factors. This suggested that cohorts of such transcription factors might form shared regulatory architectures. The authors also looked at which cell types' regulatory DNA contained the genetic variants associated with MS in GWAS. These variants were most consistent with a signature in CD3+ T cells from cord blood and in CD19+/CD20+ B cells, and may be consistent with a B cell role in MS.
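To give a feel for how a single-letter change can perturb a transcription factor recognition sequence, here is a toy sketch that scores a short DNA sequence against a motif before and after a one-base change. The motif frequencies, the sequences and the names `MOTIF`, `BACKGROUND` and `motif_score` are invented for illustration and do not come from the paper; real analyses use experimentally derived motifs and more careful statistics.

```python
import math

# Hypothetical motif: for each of 4 positions, the assumed frequency of
# each base in sites bound by an imaginary transcription factor.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

# Assumed background frequency: all four bases equally likely.
BACKGROUND = 0.25


def motif_score(sequence, motif=MOTIF):
    """Sum of log2(motif frequency / background) over each position."""
    if len(sequence) != len(motif):
        raise ValueError("sequence and motif must have the same length")
    return sum(
        math.log2(column[base] / BACKGROUND)
        for column, base in zip(motif, sequence)
    )


if __name__ == "__main__":
    reference = "AGCT"  # matches the preferred base at every position
    variant = "AGTT"    # same sequence with C changed to T at position 3
    print(motif_score(reference), motif_score(variant))
```

In this toy case the variant sequence scores lower than the reference, which is the kind of change that could weaken transcription factor binding at that site.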


Despite a long appreciation of the involvement of regulatory variants in human disease, difficulty in delineating regulatory DNA regions, particularly in a cell-specific context, has until now prevented comprehensive assessment of the relationship between gene regulation and common phenotypes. The results, which indicate widespread and systematic localization of variants associated with a wide spectrum of common diseases and traits in regulatory DNA marked by DHSs, have many implications for interpreting diverse genotype-phenotype association studies. The connection of numerous DHSs harboring GWAS SNPs (single nucleotide polymorphisms, pronounced "snips") with promoters of distant genes expands the genomic horizon of disease and trait associations and provides a trove of plausible causal genes to explain those associations. The data also unify seemingly unconnected variants associated with related diseases by virtue of their convergent perturbation of common transcription factor networks. Tissue-selective enrichment of phenotype-associated variants raises the possibility of more focused genetic association studies that prioritize the regulatory DNA of a known or hypothesized target tissue type. Further, selective enrichment of many more weakly associated variants within regulatory DNA of pathogenic cell types points to the quantitative contribution of hundreds of variants of small effect size that modulate transcription factor binding characteristics, in contrast to Mendelian variants in transcription factor genes that may perturb entire networks. The results thus highlight a continuous quantitative spectrum of disordered gene regulation between common disease and Mendelian traits and lend a new perspective on the genetic architecture and heritability of common human diseases and phenotypic traits.




This is all very complex, so it is best to listen to this podcast.

Maybe you want your gene news in British English, so listen to a BBC post that tells us that all that extra DNA is not just junk. This is a different story along the same lines.

There is another post on DNase I: Neph et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (06 September 2012)




The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (06 September 2012)

The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research. 


The ENCODE project provides information on the human genome far beyond that contained within the DNA sequence — it describes the functional genomic elements that orchestrate the development and function of a human. The project contains data about the degree of DNA methylation and chemical modifications to histones that can influence the rate of transcription of DNA into RNA molecules (histones are the proteins around which DNA is wound to form chromatin). ENCODE also examines long-range chromatin interactions, such as looping, that alter the relative proximities of different chromosomal regions in three dimensions and also affect transcription. Furthermore, the project describes the binding activity of transcription-factor proteins and the architecture (location and sequence) of gene-regulatory DNA elements, which include the promoter region upstream of the point at which transcription of an RNA molecule begins, and more distant (long-range) regulatory elements. Another section of the project was devoted to testing the accessibility of the genome to the DNA-cleavage protein DNase I. These accessible regions, called DNase I hypersensitive sites, are thought to indicate specific sequences at which the binding of transcription factors and transcription-machinery proteins has caused nucleosome displacement. In addition, ENCODE catalogues the sequences and quantities of RNA transcripts, from both non-coding and protein-coding regions.

The papers are "open access" and there are a large number of them, so if you are interested, download them. Maybe start with:

Ecker JR et al. Genomics: ENCODE explained 
