Comparative Study
Biophys J. 2004 Oct;87(4):2647-55.
doi: 10.1529/biophysj.104.045385.

Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins

Yang Zhang et al. Biophys J. 2004 Oct.

Abstract

We evaluate tertiary structure predictions on medium to large size proteins by TASSER, a new algorithm that assembles protein structures by rearranging the rigid fragments from threading templates, guided by a reduced Cα and side-chain based potential consistent with threading-based tertiary restraints. Predictions were generated for 745 proteins 201-300 residues in length that cover the Protein Data Bank (PDB) at the level of 35% sequence identity. With homologous proteins excluded, in 365 cases the templates identified by our threading program, PROSPECTOR_3, have a root-mean-square deviation (RMSD) to native < 6.5 Å, with >70% alignment coverage. After TASSER assembly, in 408 cases the best of the top five full-length models has an RMSD < 6.5 Å. Among the 745 targets are 18 membrane proteins, with one-third having a predicted RMSD < 5.5 Å. For all representative proteins of 300 residues or fewer that have corresponding multiple NMR structures in the Protein Data Bank, approximately 20% of the models generated by TASSER are closer to the NMR structure centroid than the farthest individual NMR model. These results suggest that reasonable structure predictions for large nonhomologous proteins can be generated automatically on a proteomic scale, and that structural and functional genomics are promising applications of TASSER.
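The counts quoted in the abstract can be turned into success rates with simple arithmetic. The following is a minimal sketch that uses only the numbers stated above (745 targets, 365 acceptable templates, 408 acceptable models); the helper name `success_rate` is illustrative and not part of TASSER or PROSPECTOR_3.

```python
# Counts taken from the abstract; the helper name is hypothetical.
TOTAL_TARGETS = 745


def success_rate(n_success: int, n_total: int = TOTAL_TARGETS) -> float:
    """Return the percentage of targets that meet a given criterion."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_success / n_total


# Templates with RMSD < 6.5 Å and >70% alignment coverage.
threading_rate = success_rate(365)
# Best of the top five full-length models with RMSD < 6.5 Å.
tasser_rate = success_rate(408)

print(f"PROSPECTOR_3 templates: {threading_rate:.1f}%")
print(f"TASSER models:          {tasser_rate:.1f}%")
```

Note that the two criteria are not identical (template RMSD is reported with an alignment-coverage condition, whereas the model RMSD is for full-length models), so the two percentages are only roughly comparable.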


Figures

FIGURE 1
Flowchart of the TASSER structure prediction methodology that consists of template identification by threading, fragment assembly, and fold selection.
FIGURE 2
Schematic representation of a segment of polypeptide chain in the combined on- and off-lattice CAS model. Each residue is described by its Cα and side-chain center of mass (SG). Whereas the Cα's of unaligned residues (white) are confined to the underlying cubic lattice system with a lattice spacing of 0.87 Å, the Cα's of aligned residues (yellow) are excised from threading templates and traced off lattice. SGs are always off lattice (red) and are determined using a two-rotamer approximation (Zhang et al., 2003).
FIGURE 3
Histogram of the percentage of targets foldable by TASSER for single-domain and multiple-domain proteins.
FIGURE 4
RMSD to native of the best model among the top five by TASSER versus the RMSD to native of the best initial template by PROSPECTOR_3; both RMSDs are calculated over the same aligned regions. (a) Easy set targets; (b) Medium/Hard set targets.
FIGURE 5
(a and b) Size distribution of the unaligned loops and tails, with the last points including all loops (tails) of length above 25 (50) residues. The solid lines connect the data points denoting all loops and tails. The dashed lines signify those loops with good stem backbones having an RMSD to native below 4 Å. (c and d) Average RMSD to native of the unaligned loops and tails by TASSER modeling as a function of the size of the modeled regions. RMSDlocal (□) denotes the root-mean-square deviation with direct superposition of the native and modeled regions; RMSDglobal (▵) is the root-mean-square deviation after the superposition of up to five neighboring stem residues on both sides of the loops or on a single side of the tails. The dashed-dotted line signifies an RMSD cutoff of 6.5 Å. The solid lines connect the data points denoting the results for all modeled loops/tails; the dashed lines denote the results for the loops with good stem backbones.
FIGURE 6
Three representative examples of transmembrane proteins foldable by TASSER. The thin lines denote the Cα-backbone of experimental structures, and the thick lines are the predicted models. Blue to red runs from the N- to C-terminus. Below the structures are the PDB code, the RMSD between the model and native structure, and the protein size.
FIGURE 7
Three representative examples of TASSER predicted models that are structurally closer to the NMR structure centroid than some of the individual NMR structures. The thick backbone shows the rank-one models predicted by TASSER; the wire frame presents the structures satisfying the NMR distance constraints equally well. Blue to red runs from the N- to C-terminus. The RMSDs of TASSER models to the NMR centroid for 1adr_ (α-protein), 2fnbA (β-protein), and 1dbyA (αβ-protein) are 1.6 Å, 1.9 Å, and 1.1 Å, respectively; the maximal RMSDs of NMR models to the centroid are 3.6 Å, 2.3 Å, and 1.3 Å, respectively.
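The comparison against the NMR centroid described in the abstract and in Figure 7 can be illustrated with a small sketch. It rests on loud assumptions that are not taken from the paper: the centroid is taken to be the per-atom coordinate average of the NMR models, all structures are assumed to be already superposed in a common frame (no optimal superposition is performed here), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equal-length coordinate sets, without superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = sum(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
        for p, q in zip(a, b)
    )
    return math.sqrt(total / len(a))


def centroid(models: Sequence[Sequence[Coord]]) -> List[Coord]:
    """Per-atom average over models (assumed definition of the centroid)."""
    n = len(models)
    return [
        (
            sum(m[i][0] for m in models) / n,
            sum(m[i][1] for m in models) / n,
            sum(m[i][2] for m in models) / n,
        )
        for i in range(len(models[0]))
    ]


def closer_than_farthest(prediction: Sequence[Coord],
                         nmr_models: Sequence[Sequence[Coord]]) -> bool:
    """True if the prediction is nearer the centroid than the farthest NMR model."""
    center = centroid(nmr_models)
    farthest = max(rmsd(m, center) for m in nmr_models)
    return rmsd(prediction, center) < farthest


# Toy two-atom "structures", purely for illustration.
nmr = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
       [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]]
near = [(0.0, 1.5, 0.0), (1.0, 1.5, 0.0)]
far = [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)]
print(closer_than_farthest(near, nmr), closer_than_farthest(far, nmr))
```

In practice each structure would first be optimally superposed onto a common reference before the RMSD is computed; that step is omitted here to keep the sketch dependency-free.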

