Abstract
Many animal and plant genomes are transcribed much more extensively than current annotations predict. However, the biological function of these unannotated transcribed regions is largely unknown. Approximately 7% and 23% of the detected transcribed nucleotides during D. melanogaster embryogenesis map to unannotated intergenic and intronic regions, respectively. Based on computational analysis of coordinated transcription, we conservatively estimate that 29% of all unannotated transcribed sequences function as missed or alternative exons of well-characterized protein-coding genes. We estimate that 15.6% of intergenic transcribed regions function as missed or alternative transcription start sites (TSS) used by 11.4% of the expressed protein-coding genes. Identification of P element mutations within or near newly identified 5′ exons provides a strategy for mapping previously uncharacterized mutations to their respective genes. Collectively, these data indicate that at least 85% of the fly genome is transcribed and processed into mature transcripts representing at least 30% of the fly genome.
Change history
20 September 2006
In the HTML version of this article initially published online, the largest pieces of two pie charts in Fig. 1a (labeled "2–4 h" and "20–22 h") were in the wrong position. The error has been corrected in the HTML version of the article.
Acknowledgements
This project has been funded in part with Federal Funds from the National Cancer Institute, National Institutes of Health, under Contract No. N01-CO-12400, the National Human Genome Research Institute, National Institutes of Health, under Grant No. U01 HG003147, and Affymetrix, Inc.
Author information
Contributions
J.R.M. and S.D. contributed equally to this work. J.R.M. initiated the project and headed the molecular genetics work. S.D. headed the bioinformatics work.
Ethics declarations
Competing interests
Other than F.B., all authors are employees of Affymetrix.
Supplementary information
Supplementary Fig. 1
Coverage of RefSeq genes by transfrags. (PDF 46 kb)
Supplementary Fig. 2
Examples of various transcript classes identified by sequencing of RT-PCR clones containing novel 5′ start sites. (PDF 159 kb)
Supplementary Fig. 3
Distribution of intensity ratios, SOM-based centroid profiles, and dispersion index. (PDF 969 kb)
Supplementary Table 1
Computationally predicted 5′ start sites. (PDF 109 kb)
Supplementary Table 2
Comprehensive spreadsheet of manually curated confirmed 5′ start sites. (PDF 91 kb)
Supplementary Table 3
Expression of RefSeq genes. (XLS 3075 kb)
About this article
Cite this article
Manak, J., Dike, S., Sementchenko, V. et al. Biological function of unannotated transcription during the early development of Drosophila melanogaster. Nat Genet 38, 1151–1158 (2006). https://doi.org/10.1038/ng1875
DOI: https://doi.org/10.1038/ng1875