TM-align: a protein structure alignment algorithm based on the TM-score

Yang Zhang¹, Jeffrey Skolnick

PMID: 15849316
PMCID: PMC1084323
DOI: 10.1093/nar/gki524

Comparative Study

Nucleic Acids Res. 2005 Apr 22;33(7):2302-9. doi: 10.1093/nar/gki524. Print 2005.

Affiliation

¹ Center of Excellence in Bioinformatics, University at Buffalo 901 Washington Street, Buffalo, NY 14203, USA.

Abstract

We have developed TM-align, a new algorithm that identifies the best structural alignment between protein pairs by combining the TM-score rotation matrix with Dynamic Programming (DP). The algorithm is approximately 4 times faster than CE and 20 times faster than DALI and SAL. On average, the resulting structure alignments have higher accuracy and coverage than those provided by these widely used methods. TM-align is applied to an all-against-all structure comparison of 10 515 representative protein chains from the Protein Data Bank (PDB) with a sequence identity cutoff <95%: 1996 distinct folds are found when a TM-score threshold of 0.5 is used. We also use TM-align to match the models predicted by TASSER for solved non-homologous proteins in the PDB. For both folded and misfolded models, TM-align can almost always find close structural analogs, with an average root mean square deviation (RMSD) of 3 Å and 87% alignment coverage. Nevertheless, there exists a significant correlation between the correctness of the predicted structure and the structural similarity of the model to the other proteins in the PDB. This correlation could be used to assist in model selection in blind protein structure predictions. The TM-align program is freely downloadable at http://bioinformatics.buffalo.edu/TM-align.
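As a rough illustration of the score that TM-align optimizes, the sketch below evaluates the standard TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 (L − 15)^(1/3) − 1.8, for one fixed superposition. The abstract itself does not give the formula; it comes from the published TM-score definition. The function names and the 0.5 Å lower bound on d0 are assumptions made for this example, and the full TM-score additionally takes the maximum over superpositions, which is the search that TM-align performs and which is not shown here.

```python
import math


def tm_score_d0(l_norm: int) -> float:
    """Distance scale d0 (in angstroms) for a normalization length l_norm."""
    # Assumption: clamp d0 at 0.5 A so short chains do not yield a
    # non-positive scale; this floor is not stated in the abstract.
    if l_norm <= 15:
        return 0.5
    return max(0.5, 1.24 * (l_norm - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, l_norm: int) -> float:
    """TM-score of one fixed superposition.

    distances: distances (in angstroms) between the aligned residue pairs.
    l_norm:    length of the protein used for normalization, e.g. the
               length of 1atzA (184) as in Figure 1.
    """
    d0 = tm_score_d0(l_norm)
    # Each aligned pair contributes between 0 and 1; unaligned residues
    # contribute nothing but still count in the normalization length.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_norm
```

Because the sum is divided by the normalization length rather than by the number of aligned pairs, a perfect superposition of only half the residues scores 0.5, which is why the score rewards coverage as well as accuracy.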

Figures

**Figure 1**
Illustrative example of structure alignments by different alignment methods for 1atzA and 1auoA. The first row shows ribbon diagrams of the native structures of 1atzA (184 residues) and 1auoA (218 residues), which have a sequence identity of 16% and adopt the common αβα-sandwich topology. The second and third rows show the structure superpositions of the aligned residues by CE (17) and SAL (18), and by DALI (38) and TM-align, respectively. The thick and thin backbones denote the aligned residues from 1atzA and 1auoA, respectively. The indicated numbers are the number of aligned residues, the RMSD between the aligned residues, and the TM-score normalized by the length of 1atzA. All pictures were generated with RASMOL, with blue to red running from the N- to C-terminus.

**Figure 2**
Number of folds included in the representative protein sets collected from the PDB library on January 28, 2005 using different sequence identity cutoffs. A fold is defined using a TM-score threshold of 0.5.
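The caption defines a fold by a TM-score threshold of 0.5 but does not say how chains are grouped. One simple possibility, shown here purely as an assumption, is a greedy pass in which a chain starts a new fold whenever its TM-score to every existing fold representative is below the threshold. The function `count_folds` and the `pair_score` callable are hypothetical names for this sketch, and the result of a greedy pass depends on the order in which chains are visited.

```python
def count_folds(chains, pair_score, threshold=0.5):
    """Greedily pick fold representatives from an ordered list of chains.

    chains:     iterable of chain identifiers.
    pair_score: hypothetical callable returning the TM-score of two chains
                (assumed symmetric here for simplicity).
    threshold:  TM-score below which two chains count as different folds.
    """
    representatives = []
    for chain in chains:
        # A chain founds a new fold only if it is dissimilar to every
        # representative chosen so far.
        if all(pair_score(chain, rep) < threshold for rep in representatives):
            representatives.append(chain)
    return representatives
```

With three chains where only the first two score above 0.5 against each other, the pass keeps the first and third as representatives, giving two folds.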

**Figure 3**
Two examples of protein pairs that have high sequence identities but adopt entirely different folds. In both examples, the upper parts show the sequence alignments of the proteins, where ‘:’ denotes positions with identical amino acids; the lower parts show cartoon structures of the proteins with blue to red running from the N- to C-terminus. The proteins in the first example are 1a64A (32) and the N-terminal domain of 1hngB (39). The deletion of two key residues (K44 and M45) induces domain swapping of the two proteins. The proteins in the second example are from the calmodulin binding domain (CaMBD), where 1g4yB is the crystal structure of Ca²⁺-loaded CaMBD in complex with calmodulin (40) and 1kkdA is the NMR structure of Ca²⁺-free CaMBD in complex with calmodulin (33). Ca²⁺ binding is responsible for the conformational differences between the two structures.

**Figure 4**
Structure alignments of the computer models generated by TASSER (8) to non-homologous proteins in the PDB library (6). (A) TM-score between the native structure and its closest template found by TM-align versus the TM-score between the TASSER model and the native. (B) TM-score between the TASSER model and its closest (highest TM-score) template versus the TM-score between the TASSER model and the native. (C) RMSD between the native structure and its closest template versus the RMSD between the model and the native. (D) RMSD between the model and its closest template versus the RMSD between the model and the native. The stars denote the alignment coverage of the closest templates found by TM-align. The yellow solid circles denote the average of the points falling within each interval of the horizontal axis in each panel. The black lines are to guide the eye.

**Figure 5**
A comparison of a computer model generated by TASSER (8) and the closest PDB structure (template) found by TM-align. This is a typical example where the model has a much larger RMSD than the template because of misoriented tails and loops. The thick backbones are the model or the template, and the thin ones are the native structure of 1c0fS. Residues shown in red are those whose distances are <5 Å after superposition with the TM-score rotation matrix.

References

1. Murzin A.G., Brenner S.E., Hubbard T., Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995;247:536–540.
2. Orengo C.A., Michie A.D., Jones S., Jones D.T., Swindells M.B., Thornton J.M. CATH: a hierarchic classification of protein domain structures. Structure. 1997;5:1093–1108.
3. Moult J., Fidelis K., Zemla A., Hubbard T. Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins. 2003;53:334–339.
4. Skolnick J., Fetrow J.S., Kolinski A. Structural genomics and its importance for gene function analysis. Nat. Biotechnol. 2000;18:283–287.
5. Baker D., Sali A. Protein structure prediction and structural genomics. Science. 2001;294:93–96.
