Genome Res. 2013 Oct;23(10):1590-600.
doi: 10.1101/gr.158436.113. Epub 2013 Jul 18.

GC skew at the 5' and 3' ends of human genes links R-loop formation to epigenetic regulation and transcription termination

Paul A Ginno et al. Genome Res. 2013 Oct.

Abstract

Strand asymmetry in the distribution of guanines and cytosines, measured by GC skew, predisposes DNA sequences toward R-loop formation upon transcription. Previous work revealed that GC skew and R-loop formation associate with a core set of unmethylated CpG island (CGI) promoters in the human genome. Here, we show that GC skew can distinguish four classes of promoters, including three types of CGI promoters, each associated with unique epigenetic and gene ontology signatures. In particular, we identify a strong and a weak class of CGI promoters and show that these loci are enriched in distinct chromosomal territories reflecting the intrinsic strength of their protection against DNA methylation. Interestingly, we show that strong CGI promoters are depleted from the X chromosome while weak CGIs are enriched, a property consistent with the acquisition of DNA methylation during dosage compensation. Furthermore, we identify a third class of CGI promoters based on its unique GC skew profile and show that this gene set is enriched for Polycomb group targets. Lastly, we show that nearly 2000 genes harbor GC skew at their 3' ends and that these genes are preferentially located in gene-dense regions and tend to be closely arranged. Genomic profiling of R-loops accordingly showed that a large proportion of genes with terminal GC skew form R-loops at their 3' ends, consistent with a role for these structures in permitting efficient transcription termination. Altogether, we show that GC skew and R-loop formation offer significant insights into the epigenetic regulation, genomic organization, and function of human genes.

Figures

Figure 1.
GC skew distinguishes four promoter classes in the human genome. (A–D) Metaplots of GC skew (red line), GC percentage (green line), and CpG observed over expected ratio (o/e; blue line) were determined for each class of promoters over a 5-kb window centered around the TSS. (E) Expression levels (RPKM) for each GC skew promoter class, as determined in H1 hESCs. (F) Top gene ontology hits for Class IV genes. The x-axis represents the P-value of enrichment after Bonferroni correction.
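The two sequence statistics plotted alongside GC skew in Figure 1 can be sketched as follows. This is a minimal illustration, assuming the commonly used definition of the CpG observed over expected ratio (CpG count times sequence length, divided by the product of the C and G counts); the caption does not state the exact formula the authors used, and the function names `gc_percent` and `cpg_obs_exp` are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio (assumed definition):
    (number of CG dinucleotides * length) / (number of C * number of G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        # Ratio is undefined without both bases; report 0.0 by convention here.
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)
```

Applied to successive windows across the 5-kb region around a TSS, these functions would yield the kind of per-position values that a metaplot averages over all promoters in a class.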
Figure 2.
Promoter classes present distinct epigenetic signatures in hESCs. (A) DNA methylation metaplots for each of the four promoter classes over a 10-kb window centered on the TSS. The numbers of promoters in each class were as follows: Class I, 8332; Class II, 5799; Class III, 7968; and Class IV, 2099. (B–D) Histone modification metaplots for each promoter class for H4K20me1 (B), H3K79me2 (C), and H3K27me3 (D). (E,F) Average binding profiles for each promoter class for EZH2 (E) and RNF2 (RING1B) (F). Class-specific color codes are all identical and indicated in panel B. The y-axes in panels B through F represent arbitrary units.
Figure 3.
Gene density strongly affects the distribution of Class I and Class II genes and the X chromosome represents an exception to the autosomal trends. (A,B) The distribution of Class I and Class II genes on individual chromosomes is represented as a percentage of total RefSeq genes on that chromosome (y-axis) plotted against a measure of gene density (x-axis; CGI/Mb, for which a set of 10,279 high-confidence promoter CGIs was used) (Bock et al. 2007). The X chromosome is shown in blue; autosomes are in red; a few relevant chromosomes are indicated. The data were fit by linear regression, shown here with the corresponding 95% confidence interval. (C) Schematic representation of the manner by which a gene-rich region may enable a shared epigenetic state (arrows) between neighboring genes, while a gene-poor region may not. CGI promoters are shown by green boxes; peaks of G-skew or C-skew are shown by red and blue boxes, respectively. (D) The distribution of Class I (left) and Class II (right) genes is represented as a percentage of total RefSeq genes calculated over each X-chromosome evolutionary stratum (PAR1, 0–2.8 Mb; XAR, 2.8–46.8 Mb; S2a, 46.8–60 Mb; XCR, 60–148.6 Mb; and S2b, 148.6–154.8 Mb). The expected percentage of Class I and Class II genes based on their autosomal distributions is shown by a straight line together with standard deviation (dotted lines). The X-inactivation efficiency across each stratum is color-coded and was determined from Carrel and Willard (2005).
Figure 4.
Terminal GC skew is a novel feature of a subset of human genes that correlates with high gene density. (A) GC skew metaplot for genes with co-oriented terminal GC skew. The window is centered on the 3′ end of each gene (as defined by RefSeq annotation) and calculated using a 100-bp sliding window. The box whisker plot represents the distribution of GC skew peak starts. (B) Chromosomal distribution of genes with 3′ GC skew. Symbols are as in Figure 3. (C) Schematic representation of the arrangement of genes with terminal GC skew relative to their closest neighbor (focusing on neighbors located <2 kb away).
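A sliding-window GC skew profile of the kind described in this caption can be sketched as below. This is an illustration only, assuming the standard definition GC skew = (G − C)/(G + C) computed on the annotated strand; the function names, the `step` parameter, and the handling of windows with no G or C are assumptions and are not taken from the paper.

```python
def gc_skew(seq: str) -> float:
    """GC skew of one window: (G - C) / (G + C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g_count = seq.count("G")
    c_count = seq.count("C")
    total = g_count + c_count
    return 0.0 if total == 0 else (g_count - c_count) / total


def sliding_gc_skew(seq: str, window: int = 100, step: int = 1) -> list[float]:
    """GC skew for every full window of `window` bp, advancing `step` bp each time.

    Returns an empty list if the sequence is shorter than one window.
    """
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Positive values mark G-rich stretches on the given strand and negative values mark C-rich stretches. Aligning such profiles at the 3′ end of each gene and averaging them position by position would give a metaplot comparable in form to panel A.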
Figure 5.
DRIP-seq illustrates R-loop formation at the 5′ and 3′ ends of human genes. (A–D) DRIP-seq profiles. The SkewR track shows regions of GC skew with red indicating G-rich blocks; blue, C-rich blocks. DRIP 1 and DRIP 2 correspond to DRIP-seq experiments for which the genome was fragmented with two distinct cocktails of restriction enzymes (cut sites are indicated below each DRIP data set). The DRIP peak track indicates the consensus DRIP signal. (A,B) An R-loop at the TSS and TTS of a gene, respectively. (C) R-loop forms at the TTSs of two convergent genes. (D) The PODXL2 gene shows both TSS and TTS R-loops. Note that the TTS is followed closely by the TSS of the neighboring ABTB1 gene. (E) Distribution of DRIP-seq peaks over TSS and TTS classes.
Figure 6.
Clustering of Class I promoters reveals new correlations between the genetic and epigenetic landscapes of CGI promoters. (A) Average GC skew profiles for the three main Class I promoter clusters. Each panel represents relevant genetic and epigenetic profiles for each cluster, including (B) CpG density, (C) GC content, (D) DNA methylation profiles, (E) H3K4me3 profiles, and (F) first exon length (represented in a boxplot format). Color codes are as indicated.
Figure 7.
Terminal GC skew also confers a measure of protection against DNA methylation. The graph represents the average DNA methylation profiles of genes with and without terminal GC skew. All genes were aligned at their TTS; DNA methylation data in hESCs were from Laurent et al. (2010). Genes whose TTS was located ≤2 kb from the nearest downstream promoter were also filtered out (tandem filter) to remove any confounding effects due to the presence of a nearby protected promoter.

References

    1. Aguilera A, Garcia-Muse T 2012. R loops: From transcription byproducts to threats to genome stability. Mol Cell 46: 115–124
    2. Allis CD, Jenuwein T, Reinberg D, ed. 2007. Epigenetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
    3. Aravin AA, Sachidanandam R, Bourc'his D, Schaefer C, Pezic D, Toth KF, Bestor T, Hannon GJ 2008. A piRNA pathway primed by individual transposons is linked to de novo DNA methylation in mice. Mol Cell 31: 785–799
    4. Auton A, Fledel-Alon A, Pfeifer S, Venn O, Segurel L, Street T, Leffler EM, Bowden R, Aneas I, Broxholme J, et al. 2012. A fine-scale chimpanzee genetic map from population sequencing. Science 336: 193–198
    5. Baker TA, Kornberg A 1988. Transcriptional activation of initiation of replication from the E. coli chromosomal origin: An RNA-DNA hybrid near oriC. Cell 55: 113–123
