Mol Microbiol. 2008 Dec;70(6):1487-501.
doi: 10.1111/j.1365-2958.2008.06495.x.

Small membrane proteins found by comparative genomics and ribosome binding site models

Matthew R Hemm et al. Mol Microbiol. 2008 Dec.

Abstract

The correct annotation of genes encoding the smallest proteins is one of the biggest challenges of genome annotation, and perhaps more importantly, few annotated short open reading frames have been confirmed to correspond to synthesized proteins. We used sequence conservation and ribosome binding site models to predict genes encoding small proteins, defined as having 16-50 amino acids, in the intergenic regions of the Escherichia coli genome. We tested expression of these predicted as well as previously annotated genes by integrating the sequential peptide affinity tag directly upstream of the stop codon on the chromosome and assaying for synthesis using immunoblot assays. This approach confirmed that 20 previously annotated and 18 newly discovered proteins of 16-50 amino acids are synthesized. We summarize the properties of these small proteins; remarkably, more than half of the proteins are predicted to be single-transmembrane proteins, nine of which we show co-fractionate with cell membranes.
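As a rough illustration of the kind of search the abstract describes, the sketch below scans a DNA sequence for open reading frames of 16-50 codons that have a Shine-Dalgarno-like motif a short distance upstream of the start codon. It is not the authors' pipeline: the function name `find_small_orfs`, the set of start codons, the 4-mer motif pattern, and the 5-15 nt upstream window are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
import re

# Assumptions for illustration only (not taken from the paper):
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Crude stand-in for a ribosome binding site model: any 4-mer from AGGAGG.
SD_CORE = re.compile(r"AGGA|GGAG|GAGG")


def find_small_orfs(seq, min_aa=16, max_aa=50, sd_window=(5, 15)):
    """Return (start, end, n_aa) for forward-strand ORFs of min_aa..max_aa
    codons with an SD-like motif in the window upstream of the start codon.

    start/end are 0-based, end-exclusive, and include the stop codon.
    n_aa counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] not in START_CODONS:
            continue
        # Walk in-frame codons until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_aa = (pos - start) // 3
                if min_aa <= n_aa <= max_aa:
                    # Upstream window, clipped at the sequence start.
                    lo = max(0, start - sd_window[1])
                    hi = max(0, start - sd_window[0])
                    if SD_CORE.search(seq[lo:hi]):
                        hits.append((start, pos + 3, n_aa))
                break
    return hits


if __name__ == "__main__":
    # Toy input: SD-like motif, 7 nt spacer, then a 20-codon ORF and a stop.
    toy = "AGGAGG" + "ACACACA" + "ATG" + "GCT" * 19 + "TAA"
    for start, end, n_aa in find_small_orfs(toy):
        print(start, end, n_aa)
```

A fixed motif match is a deliberately simple substitute for the ribosome binding site models mentioned in the abstract, and a real search would also need the reverse strand, restriction to intergenic coordinates, and the sequence conservation filter.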

Figures

Fig. 1
Immunoblot analysis of previously annotated short ORFs tagged with SPA. Immunoblot analysis using anti-FLAG, alkaline phosphatase-conjugated antibodies was carried out with whole cell extracts harvested from MG1655 cultures. A. Cells were grown to (E) exponential and (S) stationary phase in LB medium. Immunoblots shown on the top row are of highly expressed small proteins, whereas immunoblots on the bottom row are of proteins with lower expression. MG1655 control lanes were run for each blot and representative lanes are shown. For those proteins with lower expression (bottom row), an additional cross-reacting band of approximately 26 kDa can also be observed for MG1655 grown to stationary phase. B. Cells were grown to (E) exponential and (S) stationary phase in LB and M63 medium containing 0.2% glucose. C. Cells were grown to (T0) exponential phase in LB medium and then treated with (−) water or (+) 1% α-methylglucoside. In all cases, a fraction equivalent to the cells in OD600 = 0.057 was loaded in each lane. The star (*) indicates the band corresponding to the fusion protein. The positions of the markers for one blot are shown. This only provides the approximate sizes of the proteins because there was slight variation in the running of gels. Exposure times were optimized for each panel for visualization here; therefore, the signal intensity shown does not indicate relative abundance between proteins. Given the need for longer exposure times, some background bands were detected for the immunoblots in the second row of (A).
Fig. 2
Summary of approaches used to predict genes encoding small proteins. A. Homology-based searches using intergenic DNA sequences as input. B. Searches for ribosome binding sites using intergenic DNA sequences as input. C. Homology-based searching using protein sequences predicted from ribosome binding sites as input. aa = amino acids
Fig. 3
Alignments for short ORFs identified on the basis of DNA homology. Gene sequences identified in the DNA-as-input search were translated and the predicted protein was used to search for homologs using tblastn. Alignments were generated using ClustalW (http://align.genome.jp). “*” indicates that the residues are identical in all sequences, and “:” and “.” indicate conserved and semi-conserved substitutions, respectively, as defined by ClustalW. Swiss-Prot organism codes are from EcoGene (www.ecogene.org/modules.php?name=NEWT).
Fig. 4
Immunoblot analysis of small proteins predicted on the basis of DNA homology. Whole cell extracts of MG1655 cells grown to (E) exponential and (S) stationary phase in LB medium were analyzed as in Fig. 1. Again the star (*) indicates the band corresponding to the fusion protein. The caveats of the marker lane and exposure times are as for Fig. 1.
Fig. 5
Immunoblot analysis of small proteins predicted on the basis of potential RBSs with high information content. A. Small proteins predicted in an initial search for genes with canonical RBSs. B. Small proteins predicted using an information theory-based RBS search for open reading frames. For both (A) and (B), whole cell extracts of MG1655 cells grown to (E) exponential and (S) stationary phase in LB medium were analyzed as in Fig. 1. Again the star (*) indicates the band corresponding to the fusion protein. The caveats of the marker lane and exposure times are as for Fig. 1. Given the need for longer exposure times, some background bands were detected for the immunoblots in the second row of (B).
Fig. 6
Immunoblot analysis of small proteins predicted on the basis of the presence of an RBS and protein homology. Whole cell extracts of MG1655 cells grown to (E) exponential and (S) stationary phase in LB medium were analyzed as in Fig. 1. Again the star (*) indicates the band corresponding to the fusion protein. The caveats of the marker lane and exposure times are as for Fig. 1.
Fig. 7
Subcellular fractionation of small proteins predicted to have transmembrane segments. Whole cell lysates were generated from the SPA-tag strains and fractionated into cytoplasmic supernatant and membrane pellet fractions. Immunoblot analysis of the lysates and fractions was carried out as in Fig. 1. The star (*) indicates the band corresponding to the fusion protein.
