Automated structure prediction of weakly homologous proteins on a genomic scale

Yang Zhang¹, Jeffrey Skolnick


PMID: 15126668
PMCID: PMC419651
DOI: 10.1073/pnas.0305695101


Yang Zhang et al. Proc Natl Acad Sci U S A. 2004 May 18;101(20):7594-9. doi: 10.1073/pnas.0305695101. Epub 2004 May 4.


Affiliation

¹ Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington Street, Buffalo, NY 14203, USA.


Abstract

We have developed TASSER, a hierarchical approach to protein structure prediction that consists of template identification by threading, followed by tertiary structure assembly via the rearrangement of continuous template fragments. The assembly is guided by an optimized Cα- and side-chain-based potential driven by threading-based, predicted tertiary restraints. TASSER was applied to a comprehensive benchmark set of 1,489 medium-sized proteins in the Protein Data Bank (PDB). With homologues excluded, in 927 cases the templates identified by our threading algorithm PROSPECTOR_3 have an rms deviation (rmsd) from native of <6.5 Å with approximately 80% alignment coverage. After template reassembly, this number increases to 1,172, a significant and systematic improvement of the final models with respect to the initial template alignments. Significant improvements in loop modeling are also demonstrated. We then apply TASSER to the 1,360 medium-sized ORFs in the Escherichia coli genome; approximately 920 of these can be predicted with high accuracy based on confidence criteria established in the PDB benchmark. These results, from an unprecedentedly comprehensive folding benchmark covering all protein categories, provide a reliable basis for applying TASSER to structural genomics, especially to proteins with low sequence identity to solved protein structures.
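The headline counts in the abstract translate into success rates with simple arithmetic. The sketch below is only a worked calculation on the quoted numbers; the variable names are illustrative, and it assumes that both the 927 and the 1,172 counts are taken out of the full set of 1,489 benchmark proteins.

```python
# Counts quoted in the abstract; variable names are illustrative assumptions.
PDB_TARGETS = 1489      # medium-sized PDB benchmark proteins
TEMPLATE_HITS = 927     # PROSPECTOR_3 templates with rmsd < 6.5 Å
MODEL_HITS = 1172       # targets meeting the same criterion after reassembly
ECOLI_ORFS = 1360       # medium-sized E. coli ORFs
ECOLI_CONFIDENT = 920   # approximate number predicted with high confidence


def fraction(hits: int, total: int) -> float:
    """Return hits / total as a fraction between 0 and 1."""
    return hits / total


template_rate = fraction(TEMPLATE_HITS, PDB_TARGETS)
model_rate = fraction(MODEL_HITS, PDB_TARGETS)
ecoli_rate = fraction(ECOLI_CONFIDENT, ECOLI_ORFS)

print(f"templates below 6.5 Å: {template_rate:.1%}")
print(f"models below 6.5 Å:    {model_rate:.1%}")
print(f"targets gained:        {MODEL_HITS - TEMPLATE_HITS}")
print(f"E. coli ORFs (approx): {ecoli_rate:.1%}")
```

This prints 62.3%, 78.7%, 245, and 67.6%: under the stated assumption, reassembly moves about 16 percentage points of the benchmark below the 6.5 Å cutoff.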


Figures

**Fig. 1.**
Overview of the TASSER structure prediction methodology, which consists of template identification by the PROSPECTOR_3 threading algorithm (6), *CAS* fragment assembly, and fold selection by SPICKER clustering (18). The entire process is shown for 1ayyD as an example.

**Fig. 2.**
Schematic representation of a segment of polypeptide chain in the on- and off-lattice *CAS* model. Each residue is described by its Cα atom and its side-chain center of mass (SG). Cα atoms (white) of unaligned residues are confined to the underlying cubic lattice, which has a lattice spacing of 0.87 Å, whereas Cα atoms (yellow) of aligned residues are excised from templates and traced off-lattice. SG centers (red) are always off-lattice and are determined by using a two-rotamer approximation (9).
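To make the on-lattice part of the caption concrete, the following NumPy sketch snaps Cα coordinates to the nearest point of a cubic lattice with the 0.87 Å spacing quoted above. `snap_to_lattice` and `max_snap_error` are hypothetical helpers: they only illustrate what confinement to the lattice means geometrically and say nothing about how TASSER actually moves residues or which Cα–Cα vectors it allows.

```python
import numpy as np

LATTICE_SPACING = 0.87  # Å, cubic lattice spacing from the Fig. 2 caption


def snap_to_lattice(coords: np.ndarray, spacing: float = LATTICE_SPACING) -> np.ndarray:
    """Map each (x, y, z) row to the nearest point of a cubic lattice.

    Hypothetical helper for illustration; not TASSER's move set.
    """
    return np.round(coords / spacing) * spacing


def max_snap_error(spacing: float = LATTICE_SPACING) -> float:
    """Largest possible snapping displacement: half the cube's body diagonal."""
    return spacing * np.sqrt(3) / 2


# Two made-up off-lattice Cα positions (Å).
ca_off_lattice = np.array([[1.00, 2.10, -0.30],
                           [3.75, 2.60, 0.40]])
ca_on_lattice = snap_to_lattice(ca_off_lattice)
shift = np.linalg.norm(ca_on_lattice - ca_off_lattice, axis=1)

print(ca_on_lattice)
print(shift.max() <= max_snap_error())
```

Because each coordinate moves by at most half the spacing, snapping displaces an atom by at most √3/2 × 0.87 ≈ 0.75 Å.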

**Fig. 3.**
(A) Scatter plot of the rmsd to native of the final TASSER models versus the rmsd to native of the initial PROSPECTOR_3 templates (6). The same aligned region is used in both rmsd calculations. (B) Same as A, but with models built by MODELLER. (C) Fraction of targets whose rmsd improvement by TASSER, d, exceeds a given threshold, where d = rmsd(template) − rmsd(final model). Each point in C is calculated with a bin width of 1 Å, except the last point, which includes all templates with rmsd > 10 Å. (D) Same as C, but with models built by MODELLER.
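One reading of panels C and D is sketched below: compute d for each target, group targets by template rmsd into 1 Å bins plus a final overflow bin, and report the fraction of each bin whose improvement exceeds a threshold. The function name, the input format, the example values, and the treatment of the 10 Å boundary (≥ 10 Å goes to the overflow bin) are assumptions, not the authors' code.

```python
import numpy as np


def improvement_fraction(rmsd_template, rmsd_model, threshold, width=1.0, last_edge=10.0):
    """Fraction of targets with d > threshold in each template-rmsd bin.

    d = rmsd(template) - rmsd(final model), both over the same aligned region.
    Bins are [0, 1), [1, 2), ..., [9, 10) Å plus one overflow bin for >= 10 Å.
    Returns (lower bin edges, fractions); empty bins are NaN.
    """
    rmsd_template = np.asarray(rmsd_template, dtype=float)
    d = rmsd_template - np.asarray(rmsd_model, dtype=float)
    edges = np.arange(0.0, last_edge + width, width)   # lower edges 0, 1, ..., 10
    bins = np.digitize(rmsd_template, edges) - 1        # values >= last_edge land in the final bin
    bins = np.clip(bins, 0, len(edges) - 1)
    fractions = np.full(len(edges), np.nan)
    for b in range(len(edges)):
        in_bin = bins == b
        if in_bin.any():
            fractions[b] = np.mean(d[in_bin] > threshold)
    return edges, fractions


# Hypothetical rmsd values (Å) for five targets, for illustration only.
edges, frac = improvement_fraction(
    rmsd_template=[2.5, 2.8, 7.2, 12.0, 15.3],
    rmsd_model=[2.0, 3.0, 4.1, 5.0, 14.9],
    threshold=1.0,
)
for lo, f in zip(edges, frac):
    if not np.isnan(f):
        print(f"template rmsd >= {lo:4.1f} Å: {f:.2f}")
```

Sweeping `threshold` over several values would give one curve per threshold, which is one plausible way to produce a panel like C.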

**Fig. 4.**
Representative examples showing the improvement of final models with respect to the initial templates. Thin lines are native structures; thick lines are initial templates or final models. Color runs from blue at the N terminus to red at the C terminus. To guide the eye, the thinner lines connect contiguous template segments. (A and B) Medium/hard set example. (A) The template (from 1a5kC) superimposed on the native structure of 1fjfT, with an initial rmsd of 17.2 Å. (B) The optimized model for 1fjfT superimposed on the native structure, with an rmsd of 3.1 Å (3.12 Å over aligned residues). (C and D) Easy set example. (C) The template (from 1b4aA) superimposed on the native structure of 1aoy_, with an initial rmsd of 6.12 Å. (D) The optimized model for 1aoy_ superimposed on the native structure, with an rmsd of 2.42 Å (2.1 Å over aligned residues).

**Fig. 5.**
(A) Average rmsd to native of all unaligned/loop regions modeled by TASSER and MODELLER (3), as a function of loop length. The rmsd is calculated after superposition of up to five neighboring stem residues on each side of the loop. (B) Histogram of the rmsd of unaligned/loop regions with ≥4 residues (1,968 in total) modeled by TASSER and MODELLER.
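The loop rmsd in panel A depends on superposing the stem residues rather than the whole chain. A minimal sketch, assuming Cα coordinates for model and native in the same residue order, is to fit the stems with the Kabsch algorithm and then measure rmsd over the loop residues only. `kabsch_superpose` and `loop_rmsd` are hypothetical names, the half-open loop indexing is our convention, and the coordinates in the usage example are synthetic, not data from the paper.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Rotation r and translation t that best map mobile onto target (least squares).

    Both inputs are (n, 3) arrays; the fitted coordinates are mobile @ r.T + t.
    """
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)         # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    sign = np.sign(np.linalg.det(vt.T @ u.T))       # -1 would indicate a reflection
    r = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    t = mu_t - mu_m @ r.T
    return r, t


def loop_rmsd(model, native, loop_start, loop_end, n_stem=5):
    """Cα rmsd of residues [loop_start, loop_end) after superposing the stems.

    Up to n_stem residues on each side of the loop form the stem; fewer are
    used at chain ends. model and native are (N, 3) arrays in the same order.
    """
    n = len(native)
    stem = np.r_[max(0, loop_start - n_stem):loop_start,
                 loop_end:min(n, loop_end + n_stem)]
    r, t = kabsch_superpose(model[stem], native[stem])
    fitted = model[loop_start:loop_end] @ r.T + t
    diff = fitted - native[loop_start:loop_end]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Usage with synthetic coordinates.
rng = np.random.default_rng(0)
native = rng.normal(scale=5.0, size=(30, 3))
theta = np.deg2rad(40.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
model = native @ rot.T + np.array([3.0, -1.0, 2.0])   # rigid copy of native
perturbed = model.copy()
perturbed[10:16] += np.array([1.0, 0.0, 0.0])         # shift loop residues by 1 Å

print(round(loop_rmsd(model, native, 10, 16), 6))      # rigid copy: 0.0
print(round(loop_rmsd(perturbed, native, 10, 16), 6))  # shifted loop: 1.0
```

Superposing only on the stems means a loop that is well shaped but displaced relative to its anchors is still penalized, which is the point of measuring loop accuracy this way.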

**Fig. 6.**
Histograms of foldable proteins using MODELLER (3) and TASSER, based on the same templates and alignments from PROSPECTOR_3 (6).

**Fig. 7.**
Histogram distributions of the C-score (defined in Eq. 1) for the PDB benchmark proteins and the *E. coli* genome. For the PDB benchmark, targets whose best rmsd among the top five clusters is below or above 6.5 Å are shown in different colors.
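The coloring in this figure amounts to a per-target split on the best rmsd among the top five clusters. The sketch below assumes each target comes with a list of cluster rmsds ordered by cluster rank (a hypothetical input format with made-up target names); it does not compute the C-score itself, which is defined in Eq. 1 of the paper. Counting a target at exactly 6.5 Å as above the cutoff is also an assumption.

```python
import numpy as np

FOLD_CUTOFF = 6.5  # Å, rmsd cutoff used for the PDB benchmark


def best_of_top_five(cluster_rmsds):
    """Lowest rmsd to native among the first (up to) five clusters by rank."""
    return float(np.min(np.asarray(cluster_rmsds, dtype=float)[:5]))


def split_by_cutoff(per_target, cutoff=FOLD_CUTOFF):
    """Split target names by whether their best top-five rmsd is below cutoff."""
    below, above = [], []
    for name, rmsds in per_target.items():
        (below if best_of_top_five(rmsds) < cutoff else above).append(name)
    return below, above


# Hypothetical targets: cluster rmsds (Å) listed in cluster-rank order.
targets = {
    "t1": [7.1, 5.9, 8.0],
    "t2": [9.4, 8.8, 7.7, 10.2, 6.9, 3.0],  # the sixth cluster is ignored
    "t3": [4.2],
}
below, above = split_by_cutoff(targets)
print("below 6.5 Å:", below)
print("above 6.5 Å:", above)
```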


References

1. Skolnick, J., Fetrow, J. S. & Kolinski, A. (2000) Nat. Biotechnol. 18, 283-287.
2. Baker, D. & Sali, A. (2001) Science 294, 93-96.
3. Sali, A. & Blundell, T. L. (1993) J. Mol. Biol. 234, 779-815.
4. Fiser, A., Do, R. K. & Sali, A. (2000) Protein Sci. 9, 1753-1773.
5. Bowie, J. U., Luthy, R. & Eisenberg, D. (1991) Science 253, 164-170.
