Genome Res. 2007 Sep;17(9):1278-85.
doi: 10.1101/gr.6533407. Epub 2007 Aug 3.

Human gene organization driven by the coordination of replication and transcription

Maxime Huvet et al. Genome Res. 2007 Sep.

Abstract

In this work, we investigated the large-scale organization of human genes with respect to putative replication origins. We developed an appropriate multiscale method to analyze the nucleotide compositional skew along the genome and found that in more than one-quarter of the genome, the skew profile presents characteristic patterns consisting of successions of N-shaped structures, designated here N-domains, bordered by putative replication origins. Our analysis of recent experimental timing data confirmed that, in a number of cases, domain borders coincide with replication initiation zones active in the early S phase, whereas the central regions replicate in the late S phase. Around the putative origins, genes are abundant and broadly expressed, and their transcription is co-oriented with replication fork progression. These features weaken progressively with the distance from putative replication origins. At the center of domains, genes are rare and expressed in few tissues. We propose that this specific organization could result from the constraints of accommodating the replication and transcription initiation processes at the chromatin level, and of reducing head-on collisions between the two machineries. Our findings provide a new model of gene organization in the human genome, which integrates transcription, replication, and chromatin structure as coordinated determinants of genome architecture.


Figures

Figure 1.
Factory-roof pattern of the skew profile. (A) Skew (S) profile around an experimentally identified replication origin. The skew is computed along a DNA fragment containing the experimentally determined replication origin associated with the MYC gene (Vassilev and Johnson 1990) (red arrow). S is computed in 1-kbp adjacent windows of masked sequences; (red) + genes (coding strand identical to the Watson strand); (blue) − genes (opposite direction); (black) intergenic regions (the color of each point is defined by the majority rule). In abscissa, the position on the sequence; in ordinate, the skew, S, in percent. (Red vertical lines) Putative replication origins associated with upward transitions of the S profile. (B–E) Working model of the factory-roof pattern of the S profile. We propose that this pattern results from the superimposition, in germ-line cells, of strand asymmetries associated with replication and transcription. (B) Model of the replication-associated skew profiles corresponding to two fixed putative adjacent replication origins, Ori1 and Ori2, and to a replication termination site (Ter) occurring with equal probability between Ori1 and Ori2 (adapted from Touchon et al. 2005). Upward or downward jumps of the S profile correspond to the origin and termination positions, respectively. (Left) Three elementary skew profiles, Si, Sj, and Sk, are associated with three successive replication cycles and display three different Ter positions. (Middle) Superimposition of the Si, Sj, and Sk profiles. (Right) Superimposition of a large number of elementary skew profiles, ultimately leading to a pattern decreasing linearly in the 5′ to 3′ direction; note that reverse complementation of the sequence leaves the factory roof structure intact. (C) Final replication-associated skew profile. (D) Transcription-associated skew profile showing positive step-like blocks at + gene positions and negative step-like blocks at − gene positions. 
(E) Superimposition of the replication- and transcription-associated skew profiles producing the final factory-roof pattern that defines the N-domains.
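The averaging argument in panel B of Figure 1 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a unit-length domain with Ori1 at x = 0 and Ori2 at x = 1, an arbitrary elementary skew amplitude `s`, and evenly spaced Ter positions standing in for a uniform random draw. The function name and parameters are hypothetical.

```python
def mean_replication_skew(n_positions=101, n_cycles=1000, s=1.0):
    """Average elementary skew profiles over many termination sites.

    Positions run from Ori1 (x = 0) to Ori2 (x = 1). In one cycle with
    termination at `ter`, the region left of Ter is replicated by the
    fork coming from Ori1 (skew +s) and the region right of Ter by the
    fork coming from Ori2 (skew -s).
    """
    xs = [i / (n_positions - 1) for i in range(n_positions)]
    totals = [0.0] * n_positions
    for k in range(n_cycles):
        # Evenly spaced Ter positions (assumption: a deterministic
        # stand-in for "equal probability between Ori1 and Ori2").
        ter = (k + 0.5) / n_cycles
        for i, x in enumerate(xs):
            totals[i] += s if x < ter else -s
    return xs, [t / n_cycles for t in totals]


xs, profile = mean_replication_skew()
```

Under these assumptions the averaged profile follows s(1 − 2x): it starts at +s just after Ori1, crosses zero at the domain center, and reaches −s just before Ori2, which is the linear decrease described in the caption.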
Figure 2.
Properties of the N-domains detected in the human genome. (A) Examples of N-domains detected on chromosome 13. S values are computed in 1-kbp windows (without repeats); (red) + genes; (blue) − genes; (black) intergenic regions; the N-domain borders are indicated by red vertical lines. In abscissa, the window position is in megabase pairs; in ordinate, the skew, S, in percent. (B) Mean S profile of the N-domains. The mean S values are computed along the N-domains of length L ≤ 1.2 Mbp. In abscissa, the region used for analysis extends from the extremity to the center of each domain. In ordinate, the mean skew, S, in percent ± SEM. (C) Mean S profile of the half-domains for L < 0.75 Mbp (red), 0.75 < L < 1.2 Mbp (blue), 1.2 < L < 2 Mbp (purple), and L > 2 Mbp (green). The sequences of the 3′ halves of the domains are reverse-complemented and analyzed together with the 5′ halves. (D) Mean skew profile of + genes located in 5′ half-domains analyzed together with − genes (reverse-complemented) located in 3′ halves (red) and intergenic regions (black) (both larger than 400 kbp, and situated in domains with L > 1 Mbp). In abscissa, the distance ∂ to the 5′ end of genes or intergenic regions. (E) Mean slope of the domains versus their length L; domains are ranked by L values and grouped by bins of 20 domains; in ordinate, the mean (±SEM) of the slopes in percent/megabase pair (orange); the orange hyperbolic curve is obtained by a linear regression fit of −1/slope versus L (Supplemental Fig. S5f). In red, the genes with a length >400 kbp are ranked by length of their domain, and grouped by constant bins; the mean slope is computed for each bin. The same is true for the intergenic regions (>400 kbp) (black). In abscissa, the mean length of the corresponding domains.
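The captions state that S is computed in adjacent 1-kbp windows of repeat-masked sequence, but this page does not give the formula. The sketch below therefore assumes the common definition S = (T − A)/(T + A) + (G − C)/(G + C), expressed in percent, and assumes that masked bases appear as lowercase letters or N. The function name is hypothetical, and windows with no usable bases are reported as `None`.

```python
def window_skew(seq, window=1000):
    """Return the skew S (in percent) for adjacent, non-overlapping windows.

    Assumed formula (not stated in the captions):
        S = (T - A)/(T + A) + (G - C)/(G + C)
    Only uppercase A, C, G, T are counted, so lowercase (soft-masked)
    bases and N are ignored. A window lacking A/T or G/C yields None.
    """
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        a, c, g, t = (chunk.count(base) for base in "ACGT")
        if a + t == 0 or g + c == 0:
            values.append(None)
            continue
        s_ta = (t - a) / (t + a)
        s_gc = (g - c) / (g + c)
        values.append(100.0 * (s_ta + s_gc))
    return values


# Toy usage with an 8-bp window; real profiles would use window=1000.
example = window_skew("TTTAGGGC" + "nnnnnnnn", window=8)
```

Skipping masked windows instead of assigning them zero keeps repeat-rich stretches from flattening the profile, which is one plausible reading of "without repeats".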
Figure 3.
Replication timing profile of the N-domains. (A) Average replication timing values (±SEM) determined around the extremities of the domains located in chromosome 6; in abscissa, the distance to the indicated 5′ (left) or 3′ (right) closest domain extremity; in ordinate, the mean timing ratio value; data are retrieved from Woodfine et al. (2005). (B) Example of replication timing profile along a complete N-domain. Horizontal bars indicate the DNA probes (∼94 kb) used in the microarray experiments (Woodfine et al. 2005).
Figure 4.
Analysis of the genes located in the N-domains. (A) Arrows indicate the R+ orientation, i.e., the same orientation as the most frequent direction of putative replication fork progression; R− orientation (opposed direction); (red) + genes; (blue) − genes. (B) Gene density. The density is defined as the number of 5′ ends (for + genes) or of 3′ ends (for − genes) in 50-kbp adjacent windows, divided by the number of corresponding domains. In abscissa, the distance, d, in megabase pairs, to the closest domain extremity. (C) Mean gene length. Genes are ranked by their distance, d, from the closest domain extremity, grouped by sets of 150 genes, and the mean length (kilobase pairs) is computed for each set. (D) Relative number of base pairs transcribed in the + direction (red), − direction (blue), and nontranscribed (black) determined in 10-kbp adjacent sequence windows.
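The gene density defined in panel B can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: domains are given as hypothetical `(start, end)` tuples and genes as `(left, right, strand)` tuples in Watson-strand coordinates, so the 5′ end of a + gene and the 3′ end of a − gene are both the left coordinate. Dividing by the total number of domains is one reading of "the number of corresponding domains".

```python
from collections import Counter


def gene_density(domains, genes, window=50_000):
    """Count gene ends per distance bin, normalized by domain count.

    domains: list of (start, end) in base pairs.
    genes:   list of (left, right, strand) in Watson coordinates.
    Returns {bin_index: density}; bin k covers distances d to the
    closest domain extremity with k*window <= d < (k+1)*window.
    """
    counts = Counter()
    for left, _right, _strand in genes:
        # Left coordinate = 5' end of + genes and 3' end of - genes.
        pos = left
        for d_start, d_end in domains:
            if d_start <= pos < d_end:
                d = min(pos - d_start, d_end - pos)
                counts[d // window] += 1
                break  # a position belongs to at most one domain
    n_domains = len(domains)
    return {k: c / n_domains for k, c in sorted(counts.items())}


# Toy usage with two 1-Mbp domains and four genes (one outside any domain).
toy_domains = [(0, 1_000_000), (1_000_000, 2_000_000)]
toy_genes = [
    (10_000, 50_000, "+"),
    (990_000, 995_000, "-"),
    (1_060_000, 1_100_000, "+"),
    (5_000_000, 5_010_000, "+"),
]
density = gene_density(toy_domains, toy_genes)
```

Folding both half-domains onto a single distance axis mirrors the caption's use of the distance to the closest extremity; a strand-aware version would be needed to separate R+ from R− genes.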
Figure 5.
Expression breadth, Nt, of the genes located in the N-domains. (A) Mean expression breadth calculated using EST data (red). In abscissa, the distance, d, in megabase pairs, to the closest domain extremity. (B) Same as in A with SAGE data (red) and microarray data (black). (C) Histogram of the expression breadth (determined with EST data) of the genes located in the domains. (D) Histogram of the expression breadth of the genes with an extremity (5′ for R+ genes, 3′ for R− genes) located at distance d from the putative replication origins where d < 5% of the length of the half-domain. (E) Same as in D, but with 70% < d < 100%.
Figure 6.
Model of gene organization coordinated by replication and transcription. Two successive putative replication origins (ORI) delineate a replication N-domain. (Open chromatin) Arrows illustrate an open chromatin state at replication origin position; (replication timing) the triangles represent the replication timing values along the N-domain. Replication fork orientation: the triangles indicate the proportion of replication forks progressing from each extremity to the other extremity along the domain (during the successive cell cycles, replication terminates at random sites within the domain). Breadth of expression is maximum near the replication origins and decreases toward the domain center (gray triangles). Transcription orientation: it is preferentially co-oriented with the replication fork progression; the colored triangles indicate the proportion of base pairs along the domain transcribed in the + direction (red) and − direction (blue). Gene organization: red (resp. blue) arrows indicate + (resp. −) genes in the domains.

References

    1. Berezney R., Dubey D.D., Huberman J.A. Heterogeneity of eukaryotic replicons, replicon clusters, and replication foci. Chromosoma. 2000;108:471–484.
    2. Brodie of Brodie E.B., Nicolay S., Touchon M., Audit B., d'Aubenton-Carafa Y., Thermes C., Arneodo A. From DNA sequence analysis to modeling replication in the human genome. Phys. Rev. Lett. 2005;94:248103.
    3. Cajiao I., Zhang A., Yoo E.J., Cooke N.E., Liebhaber S.A. Bystander gene activation by a locus control region. EMBO J. 2004;23:3854–3863.
    4. Callan H.G. Replication of DNA in the chromosomes of eukaryotes. Proc. R. Soc. Lond. 1972;181:19–41.
    5. Caron H., van Schaik B., van der Mee M., Baas F., Riggins G., van Sluis P., Hermus M.C., van Asperen R., Boon K., Voute P.A., et al. The human transcriptome map: Clustering of highly expressed genes in chromosomal domains. Science. 2001;291:1289–1292.
