Bioinformatics. 2007 Mar 15;23(6):673-9.
doi: 10.1093/bioinformatics/btm009. Epub 2007 Jan 19.

Identifying bacterial genes and endosymbiont DNA with Glimmer


Arthur L Delcher et al. Bioinformatics. 2007.

Abstract

Motivation: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host.

Results: The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella.
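The host/endosymbiont separation described above can be illustrated with a likelihood-ratio classifier. The sketch below is not Glimmer's implementation: it swaps in a plain fixed-order Markov chain for the interpolated Markov model, and the names `train_markov`, `log_odds`, and `classify`, the order of 2, and the pseudocount are all assumptions made for illustration. It trains one model per organism and assigns a sequence to whichever model gives it the higher log-likelihood.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Fit a fixed-order Markov chain (a stand-in for an IMM) and
    return a function giving log P(base | preceding context)."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def log_prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return log_prob


def log_odds(seq, host_model, symbiont_model, order=2):
    """Sum of log P(base | symbiont) - log P(base | host) over the sequence."""
    return sum(
        symbiont_model(seq[i - order:i], seq[i]) - host_model(seq[i - order:i], seq[i])
        for i in range(order, len(seq))
    )


def classify(seq, host_model, symbiont_model, order=2):
    """Label a sequence by the sign of its log-odds score."""
    score = log_odds(seq, host_model, symbiont_model, order)
    return "symbiont" if score > 0 else "host"


# Toy training data (hypothetical, chosen only to differ in composition).
host_model = train_markov(["ATATATATTTAAATATAT"])
symbiont_model = train_markov(["GCGCGCGGCCGCGCGC"])
print(classify("GCGCGCGC", host_model, symbiont_model))
print(classify("ATATATAT", host_model, symbiont_model))
```

A real discriminator would be trained on far longer sequences from each organism, and an interpolated model would blend several context lengths rather than fixing one order.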

Availability: Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer.


Figures

Fig. 1
Scoring an open reading frame from the stop codon backwards. The stop codon is at position 0 on the X-axis and the cumulative log-odds score is plotted as the solid line. Positions of possible start codons are indicated by vertical dashed lines. This ORF contains the fructose bis-P aldolase gene in Escherichia coli (EG14062) and the current Ecogene verified start site is at position 1050, near the peak score. This position is an update of the originally annotated start at position 1122.
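The backward scoring shown in Fig. 1 can be sketched in a few lines. This is a minimal illustration, not Glimmer's scoring code: the per-codon log-odds table is a hypothetical stand-in for the model's scores, and `score_orf_backwards` and `START_CODONS` are names introduced here. The function walks from the stop codon toward the upstream end, accumulates the score, and reports the candidate start codon with the highest cumulative score.

```python
START_CODONS = {"ATG", "GTG", "TTG"}


def score_orf_backwards(orf, codon_log_odds):
    """Score an in-frame ORF (ending in its stop codon) from the stop backwards.

    Returns (cumulative, best), where cumulative[k] is the running score after
    k + 1 codons upstream of the stop, and best is (offset, score) for the
    highest-scoring candidate start, with offset measured in bases upstream of
    the stop codon. best is None if no start codon is present.
    """
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")

    # Split into codons, leaving out the final stop codon.
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 3, 3)]

    cumulative = []
    total = 0.0
    best = None
    for dist, codon in enumerate(reversed(codons), start=1):
        total += codon_log_odds.get(codon, 0.0)  # unknown codons score 0
        cumulative.append(total)
        if codon in START_CODONS and (best is None or total > best[1]):
            best = (dist * 3, total)
    return cumulative, best


# Toy example with made-up log-odds values (assumptions, not Glimmer output).
toy_scores = {"AAA": 1.0, "ATG": 0.5, "GCG": -2.0}
curve, best_start = score_orf_backwards("ATGGCGATGAAATAA", toy_scores)
print(curve)       # [1.0, 1.5, -0.5, 0.0]
print(best_start)  # (6, 1.5)
```

In the toy run the downstream ATG wins because the low-scoring codon between the two candidates pulls the cumulative score down, which mirrors how the peak of the curve in the figure sits near the verified start.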

