Spatial Organization of Genomes
We study how a genome is organized in three dimensions inside the nucleus. The spatial organization of a genome plays important roles in the regulation of genes and the maintenance of genome stability. Many diseases, including cancer, are characterized by alterations in the spatial organization of the genome. How genomes are organized in three dimensions, and how this affects gene expression, is poorly understood. To address this issue we study the genomes of human and yeast, using a set of powerful molecular and genomic tools that we developed.
From linear sequence to three-dimensional organization
Although the DNA of chromosomes is a linear sequence, the living genome does not function in a linear fashion. This is most clearly illustrated by the fact that genes are often regulated by elements that can be located far away along the genome sequence. Recent evidence shows that regulatory elements can act over large genomic distances by engaging in direct physical interactions with target genes, resulting in the formation of chromatin loops. Based on these observations we have proposed that the spatial organization of the genome resembles a three-dimensional network that is driven by physical associations between genes and regulatory elements, both in cis (along the same chromosome) and in trans (between different chromosomes) (Dekker (2006), Nature Methods, 3(1): 17-21).
How does the spatial organization of a genome relate to its regulation and function?
In each cell type a distinct set of genes is expressed and therefore the spatial organization of the genome will likely be cell-type specific. Insights into the mechanisms that modulate the spatial organization of the genome will greatly contribute to a better understanding of tissue-specific gene regulation and may reveal causes of human diseases that are due to defects in these processes.
In order to understand the spatial organization of a genome we try to answer the following questions. Which regulatory elements interact with each of the genes in the human genome? What drives the specificity of these interactions? Can we identify proteins that mediate these interactions? How do interactions between regulatory elements and genes result in activation and repression of genes? How do defects in these interactions result in human disease? Can we use information about chromatin interactions to generate three-dimensional models of chromosomes?
Tools we developed for mapping the spatial organization of genomes: 3C, 5C and Hi-C
We developed Chromosome Conformation Capture (3C), which is used to detect physical interactions between genomic elements (Dekker et al. Science, 2002). Using 3C we, and others, discovered that gene regulation is mediated by the three-dimensional organization of chromosomes, which brings genes and their regulatory elements into close spatial proximity. 3C is now widely used and has already had a major impact on studies of genome regulation.
Large-scale detection of long-range chromatin interactions will be instrumental in mapping genome-wide networks of communication between genomic elements and in determining the three-dimensional folding of the genome. Our group was the first to combine 3C with ultra-high-throughput DNA sequencing, thereby dramatically increasing the scale at which interactions between genomic loci can be studied. Specifically, we have developed 5C, a high-throughput version of 3C for large-scale mapping of chromatin interaction networks (Dostie et al. Genome Res. 2006). To enable the community to adopt 5C and related technologies we have developed “my5C”, a publicly available set of computational tools for design of 5C studies and for visualization and analysis of any large chromatin interaction data sets (my5C.umassmed.edu; Lajoie et al. Nature Methods 2009).
Ultimately we aim to obtain detailed insights into the three-dimensional arrangements of complete genomes at Kb resolution. To this end we developed the Hi-C technology: a genome-wide and unbiased method that combines 3C with deep sequencing (Lieberman-Aiden, van Berkum et al. Science 2009). We have applied Hi-C to generate the first comprehensive and unbiased long-range interaction maps of the human genome. Hi-C data reveal both known hallmarks of nuclear organization (e.g. formation of chromosome territories, and preferred co-location of particular pairs of chromosomes) and novel folding principles of chromosomes. First, we found that the human genome is divided into two types of spatial compartments, one containing active chromatin, and one containing all inactive segments of the genome. Second, we discovered a novel higher-order chromatin folding motif: at the megabase scale, our data are consistent with a model in which chromatin is described by a polymer state known as the fractal globule, a knot-free conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. This conformation is an extremely efficient solution for packing long chromosomes inside the nucleus. Hi-C data for GM06990 lymphoblastoid cells and for K562 erythroleukemia cells are available in a user-friendly format at our website: http://dekkerlab.umassmed.edu/
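One way to compare Hi-C data with a polymer model is to ask how contact probability P(s) falls off with genomic distance s. A fractal globule predicts roughly P(s) ~ s^-1, so the slope of log P against log s should be close to -1. The sketch below is illustrative only: the function name and the synthetic data are assumptions made for this example, not part of our published analysis pipeline.

```python
import math

def contact_decay_exponent(distances, contact_probs):
    """Least-squares slope of log(P) against log(s).

    Hypothetical helper for illustration; both inputs must be positive.
    """
    xs = [math.log(s) for s in distances]
    ys = [math.log(p) for p in contact_probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic, noise-free data that follows P(s) = c / s (not real Hi-C counts).
distances = [0.5e6 * k for k in range(1, 15)]   # genomic separation in bp
contact_probs = [1e3 / s for s in distances]    # idealized contact probability

slope = contact_decay_exponent(distances, contact_probs)
print(f"fitted exponent: {slope:.3f}")
```

With real data the fit would be restricted to the distance range of interest (here, the megabase scale) and applied to normalized, distance-binned contact counts.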
Topologically Associating Domains
Using 5C technology, and in collaboration with the laboratory of Dr. Edith Heard, we discovered that mammalian chromosomes are composed of Topologically Associating Domains (TADs; Nora et al. Nature 2012). TADs are hundreds of Kb in size and are characterized by frequent interactions between loci located within the same TAD, but much lower interaction frequencies between loci located in different TADs. We found that TADs are determined by cis-acting boundary regions that spatially separate adjacent TADs. Furthermore, TADs represent functional units, as genes located within the same TAD show related gene expression patterns. We propose that TADs are fundamental structural and functional building blocks of chromosomes.
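The defining property of a TAD, frequent interactions within a domain and fewer between domains, can be illustrated with a small calculation. The sketch below assumes that domain assignments are already known and uses a toy matrix. The function name and data are hypothetical, and this is not the domain-calling method used in the cited study.

```python
def mean_contact_by_domain(matrix, domain_of_bin):
    """Return (mean intra-domain, mean inter-domain) interaction frequency.

    matrix is a symmetric list of lists; domain_of_bin[i] labels bin i.
    Only the upper triangle (i < j) is used, so each pair counts once.
    """
    intra, inter = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if domain_of_bin[i] == domain_of_bin[j]:
                intra.append(matrix[i][j])
            else:
                inter.append(matrix[i][j])
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Toy example: six bins split into two domains of three bins each.
domain_of_bin = [0, 0, 0, 1, 1, 1]
matrix = [
    [10.0 if domain_of_bin[i] == domain_of_bin[j] else 2.0 for j in range(6)]
    for i in range(6)
]

intra_mean, inter_mean = mean_contact_by_domain(matrix, domain_of_bin)
print(intra_mean, inter_mean)
```

In real data, interaction frequency also decays with genomic distance, so a meaningful comparison would match intra- and inter-domain pairs by distance before averaging.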
Gene regulation by long-range looping interactions
The most direct way in which chromosome folding affects gene regulation is through the formation of long-range looping interactions between gene promoters and distal gene regulatory elements such as enhancers. As part of the ENCODE project, we recently completed a comprehensive analysis of looping interactions between genes and distal elements throughout 1% of the human genome (Sanyal, Lajoie et al. Nature 2012). We discovered >1,000 long-range interactions between promoters and distal sites that include elements resembling enhancers, promoters and CTCF-bound sites. We observed significant correlations between gene expression, promoter-enhancer interactions and the presence of enhancer RNAs. Long-range interactions display striking asymmetry, with a bias for interactions with elements located ~120 Kb upstream of the TSS. Long-range interactions are often not blocked by sites bound by CTCF and cohesin, implying that many of these sites do not demarcate physically insulated gene domains. Further, only ~7% of looping interactions are with the nearest gene, suggesting that genomic proximity is not a simple predictor for long-range interactions. Finally, promoters and distal elements are engaged in multiple long-range interactions to form complex networks. Our results start to place genes and regulatory elements in three-dimensional context, revealing their functional relationships.
Building three-dimensional models of chromosomes
As a first step towards studying the spatial organization of entire chromosomes we have used 3C to determine the three-dimensional structure of yeast chromosome III (Dekker et al. (2002), Science, 295: 1306-1311). We generated a matrix of interaction frequencies and developed mathematical tools to determine a population-average three-dimensional model of this ~320 kb chromosome based on the pattern of chromatin interactions (Figure 3). Chromosome III emerged as a contorted ring, due to prominent interactions between the sub-telomeric regions.
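To give a feel for how a matrix of interaction frequencies can be turned into coordinates, the sketch below uses classical multidimensional scaling (MDS). This is a generic, swapped-in technique and not the mathematical procedure of the cited study. It also assumes, purely for illustration, that spatial distance is the inverse of interaction frequency. All function names and the synthetic ring are hypothetical.

```python
import numpy as np

def frequencies_to_distances(freqs):
    """Illustrative assumption: distance = 1 / frequency for off-diagonal pairs.

    Off-diagonal frequencies must be positive; the diagonal is set to zero.
    """
    f = np.asarray(freqs, dtype=float)
    d = np.zeros_like(f)
    off_diag = ~np.eye(f.shape[0], dtype=bool)
    d[off_diag] = 1.0 / f[off_diag]
    return d

def classical_mds(distances, n_dims=3):
    """Embed points in n_dims dimensions from a pairwise distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering   # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]       # largest eigenvalues first
    top_vals = np.clip(eigvals[order], 0.0, None)    # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Synthetic test case: eight loci placed on a ring (not real 3C data).
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
true_xyz = np.column_stack([np.cos(angles), np.sin(angles), np.zeros(8)])
true_dist = np.linalg.norm(true_xyz[:, None, :] - true_xyz[None, :, :], axis=-1)

off_diag = ~np.eye(8, dtype=bool)
freqs = np.zeros((8, 8))
freqs[off_diag] = 1.0 / true_dist[off_diag]          # frequency falls with distance

model_xyz = classical_mds(frequencies_to_distances(freqs))
model_dist = np.linalg.norm(model_xyz[:, None, :] - model_xyz[None, :, :], axis=-1)
print(np.allclose(model_dist, true_dist, atol=1e-6))
```

The recovered coordinates match the original ring only up to rotation and reflection, which is why the check compares pairwise distances rather than positions. Measured interaction frequencies are population averages and are noisy, so any model built this way is a population-average representation rather than the structure of a single chromosome.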
More recently, we worked with the laboratory of Dr. Marc A. Marti-Renom to use chromatin interaction data to build spatial models of chromosomal domains and even complete genomes. For instance, using 5C data for the human alpha-globin domain, we discovered that chromatin folds in globular domains of several hundreds of Kb (Baù, Sanyal et al. Nat. Struct. Mol. Biol. 2011). These are probably equivalent to TADs (see above). In collaboration with Mark Umbarger and George Church we were able to generate a three-dimensional model of the complete genome of the bacterium Caulobacter crescentus, which led to the identification of cis-elements that determine the folding of the entire genome (Umbarger, Toro et al. Mol. Cell 2011).