Proc Natl Acad Sci U S A. 2004 May 18;101(20):7594-9.
doi: 10.1073/pnas.0305695101. Epub 2004 May 4.

Automated structure prediction of weakly homologous proteins on a genomic scale

Yang Zhang et al. Proc Natl Acad Sci U S A. 2004.

Abstract

We have developed TASSER, a hierarchical approach to protein structure prediction that consists of template identification by threading, followed by tertiary structure assembly via the rearrangement of continuous template fragments guided by an optimized Cα and side-chain-based potential driven by threading-based, predicted tertiary restraints. TASSER was applied to a comprehensive benchmark set of 1,489 medium-sized proteins in the Protein Data Bank. With homologues excluded, in 927 cases, the templates identified by our threading algorithm PROSPECTOR_3 have a rms deviation from native <6.5 Å with approximately 80% alignment coverage. After template reassembly, this number increases to 1,172. This shows significant and systematic improvement of the final models with respect to the initial template alignments. Furthermore, significant improvements in loop modeling are demonstrated. We then apply TASSER to the 1,360 medium-sized ORFs in the Escherichia coli genome; approximately 920 can be predicted with high accuracy based on confidence criteria established in the Protein Data Bank benchmark. These results from our unprecedented comprehensive folding benchmark on all protein categories provide a reliable basis for the application of TASSER to structural genomics, especially to proteins of low sequence identity to solved protein structures.

Figures

Fig. 1.
Overview of the TASSER structure prediction methodology, which consists of template identification by the PROSPECTOR_3 threading algorithm (6), CAS fragment assembly, and fold selection by SPICKER clustering (18). The entire process for 1ayyD is shown as an example.
Fig. 2.
Schematic representation of a piece of polypeptide chain in the on- and off-lattice CAS model. Each residue is described by its Cα and side chain center of mass (SG). Whereas Cα values (white) of unaligned residues are confined to the underlying cubic lattice system with a lattice space of 0.87 Å, Cα values (yellow) of aligned residues are excised from templates and traced off-lattice. SG values (red) are always off-lattice and determined by using a two-rotamer approximation (9).
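The on-lattice part of this representation can be illustrated with a minimal sketch that snaps a Cα coordinate to the nearest node of a cubic lattice with the 0.87 Å spacing given in the caption. This is not the TASSER code: the function name `snap_to_lattice` is hypothetical, and the sketch assumes a lattice whose origin is at (0, 0, 0) and whose axes coincide with the coordinate axes.

```python
LATTICE_SPACING = 0.87  # Å, lattice spacing quoted in the Fig. 2 caption


def snap_to_lattice(xyz, spacing=LATTICE_SPACING):
    """Map a Cartesian point to the nearest cubic-lattice node.

    Hypothetical helper for illustration only. Assumes the lattice
    origin is (0, 0, 0) and the lattice axes follow x, y, z.
    Returns (integer lattice indices, snapped Cartesian coordinates).
    """
    # Nearest node along each axis; Python's round() resolves exact
    # .5 ties to the even integer, which is irrelevant for typical data.
    idx = tuple(round(c / spacing) for c in xyz)
    snapped = tuple(i * spacing for i in idx)
    return idx, snapped


if __name__ == "__main__":
    idx, snapped = snap_to_lattice((1.0, 2.0, -0.5))
    print(idx, snapped)
```

Under these assumptions, snapping moves a point by at most half a lattice spacing (0.435 Å) along each axis, which gives a rough sense of the resolution of the on-lattice residues.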
Fig. 3.
(A) Scatter plot of rmsd to native for final models by TASSER versus rmsd to native for the initial templates from PROSPECTOR_3 (6). The same aligned region is used in both rmsd calculations. (B) Similar data as in A, but the models are from MODELLER. (C) Fraction of targets with a rmsd improvement d by the TASSER approach greater than some threshold value. Here, d = “rmsd of template” - “rmsd of final model.” Each point in C is calculated with a bin width of 1 Å; however, the last point includes all templates with rmsd > 10 Å. (D) Similar data as in C, but the models are from MODELLER.
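The quantity plotted in panels C and D can be sketched as follows. This is an illustrative reading of the caption, not the authors' analysis script: the function name `improvement_fraction_by_bin` is hypothetical, bins are assumed to be half-open intervals [k, k+1) Å of template rmsd, and the last bin is assumed to collect every template at 10 Å or above.

```python
from collections import defaultdict


def improvement_fraction_by_bin(pairs, threshold=0.0, bin_width=1.0, last_edge=10.0):
    """Fraction of targets per template-rmsd bin with d > threshold.

    pairs: iterable of (rmsd_template, rmsd_model) in Å.
    d = rmsd_template - rmsd_model, as defined in the caption.
    Binning details are assumptions: bins are [k, k+1) Å and all
    templates at or beyond last_edge fall into one final bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [improved, total]
    last_bin = int(last_edge / bin_width)
    for rmsd_template, rmsd_model in pairs:
        d = rmsd_template - rmsd_model
        b = min(int(rmsd_template // bin_width), last_bin)
        counts[b][1] += 1
        if d > threshold:
            counts[b][0] += 1
    return {b: improved / total for b, (improved, total) in sorted(counts.items())}


if __name__ == "__main__":
    # First two pairs are the template/model rmsd values quoted in the
    # Fig. 4 caption; the last two are invented purely for illustration.
    pairs = [(17.2, 3.1), (6.12, 2.42), (6.5, 7.0), (12.0, 11.5)]
    print(improvement_fraction_by_bin(pairs, threshold=1.0))
```

Raising `threshold` asks for a larger improvement before a target counts, so each curve in panel C would correspond to one threshold value under this reading.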
Fig. 4.
Representative examples showing the improvement of final models with respect to the initial templates. The thin lines are native structures; the thick lines signify initial templates or final models. Blue to red runs from the N terminus to the C terminus. To guide the eye, the thinner lines connect contiguous template segments. (A and B) Medium/hard set example. (A) The template (from 1a5kC) superimposed on the native structure of 1fjfT with an initial rmsd of 17.2 Å. (B) The optimized model for 1fjfT superimposed on the native with rmsd of 3.1 Å (3.12 Å over aligned residues). (C and D) Easy set example. (C) The template (from 1b4aA) superimposed onto the native structure of 1aoy_, with an initial rmsd of 6.12 Å. (D) The optimized model for 1aoy_ superimposed on the native with rmsd of 2.42 Å (2.1 Å over aligned residues).
Fig. 5.
(A) Average rmsd to native of all unaligned/loop regions by TASSER and MODELLER (3) as a function of loop length. The rmsd is calculated based on the superposition of up to five neighboring stem residues on both sides of the loop. (B) Histogram of the rmsd for the unaligned/loop regions with ≥4 residues (1,968 in total) modeled by TASSER and MODELLER.
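The stem-based loop rmsd described in this caption can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard Kabsch least-squares superposition via NumPy, takes one coordinate per residue (for example Cα), and the names `kabsch` and `loop_rmsd` and their parameters are hypothetical.

```python
import numpy as np


def kabsch(mobile, target):
    """Least-squares rotation R and translation t with R @ p + t ~ q."""
    mob_c = mobile.mean(axis=0)
    tgt_c = target.mean(axis=0)
    H = (mobile - mob_c).T @ (target - tgt_c)  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ mob_c
    return R, t


def loop_rmsd(model, native, loop_start, loop_end, n_stem=5):
    """rmsd of loop residues [loop_start, loop_end) after fitting stems.

    model, native: (N, 3) arrays with one point per residue (assumed Cα).
    Up to n_stem residues on each side of the loop are superimposed;
    the loop itself is excluded from the fit.
    """
    n = len(native)
    stem_idx = list(range(max(0, loop_start - n_stem), loop_start))
    stem_idx += list(range(loop_end, min(n, loop_end + n_stem)))
    R, t = kabsch(model[stem_idx], native[stem_idx])
    fitted_loop = model[loop_start:loop_end] @ R.T + t
    diff = fitted_loop - native[loop_start:loop_end]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Fitting only on the stems means the resulting rmsd reflects both the internal loop conformation and its orientation relative to the rest of the chain, which is generally a stricter measure than superimposing the loop on itself.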
Fig. 6.
Histograms of foldable proteins using MODELLER (3) and TASSER based on the same templates and alignments from PROSPECTOR_3 (6).
Fig. 7.
Histogram distributions of the C-score (defined in Eq. 1) for the PDB benchmark proteins and the E. coli genome. PDB benchmark targets whose best rmsd among the top five clusters is below or above 6.5 Å are shown in different colors.

References

    1. Skolnick, J., Fetrow, J. S. & Kolinski, A. (2000) Nat. Biotechnol. 18, 283-287.
    2. Baker, D. & Sali, A. (2001) Science 294, 93-96.
    3. Sali, A. & Blundell, T. L. (1993) J. Mol. Biol. 234, 779-815.
    4. Fiser, A., Do, R. K. & Sali, A. (2000) Protein Sci. 9, 1753-1773.
    5. Bowie, J. U., Luthy, R. & Eisenberg, D. (1991) Science 253, 164-170.
