Proc Natl Acad Sci U S A. 2000 Sep 12;97(19):10383-8.

doi: 10.1073/pnas.97.19.10383.

Native protein sequences are close to optimal for their structures

B Kuhlman¹, D Baker

PMID: 10984534
PMCID: PMC27033
DOI: 10.1073/pnas.97.19.10383

B Kuhlman et al. Proc Natl Acad Sci U S A. 2000.

Affiliation

¹ Department of Biochemistry and Howard Hughes Medical Institute, University of Washington School of Medicine, Seattle, WA 98195, USA.

Erratum in

Proc Natl Acad Sci U S A. 2000 Nov 21;97(24):13460

Abstract

How large is the volume of sequence space that is compatible with a given protein structure? Starting from random sequences, low free energy sequences were generated for 108 protein backbone structures by using a Monte Carlo optimization procedure and a free energy function based primarily on Lennard-Jones packing interactions and the Lazaridis-Karplus implicit solvation model. Remarkably, in the designed sequences 51% of the core residues and 27% of all residues were identical to the amino acids in the corresponding positions in the native sequences. The lowest free energy sequences obtained for ensembles of native-like backbone structures were also similar to the native sequence. Furthermore, both the individual residue frequencies and the covariances between pairs of positions observed in the very large SH3 domain family were recapitulated in core sequences designed for SH3 domain structures. Taken together, these results suggest that the volume of sequence space optimal for a protein structure is surprisingly restricted to a region around the native sequence.
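The search described in the abstract can be illustrated with a minimal sketch of Monte Carlo sequence optimization with simulated annealing. Everything here is an assumption made for illustration: the `toy_energy` function is a hypothetical stand-in (a mismatch count against a hidden target), not the Lennard-Jones and Lazaridis-Karplus based free energy function used in the paper, and the cooling schedule, step count, and function names are likewise invented.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_energy(seq, target):
    """Hypothetical stand-in energy: number of positions differing from target.

    The real study scores a sequence on a fixed backbone with a physical
    free energy function; this toy score only makes the search runnable.
    """
    return sum(1.0 for a, b in zip(seq, target) if a != b)


def monte_carlo_design(energy, length, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Minimize energy(seq) over sequences, starting from a random sequence."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    e = energy(seq)
    best_seq, best_e = list(seq), e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a single-residue substitution at a random position.
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        e_new = energy(seq)
        delta = e_new - e
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = e_new
            if e < best_e:
                best_seq, best_e = list(seq), e
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_e
```

A call such as `monte_carlo_design(lambda s: toy_energy(s, "MKTAYIAKQR"), 10)` returns the lowest-energy sequence found together with its energy. With a physical energy function in place of the toy score, the same loop would be repeated from many random starting sequences to collect a set of low free energy sequences.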

Figures

**Figure 1**
Sequence identity between designed and native sequences for core residues as a function of crystal structure resolution. The average sequence identity to native for sequences generated for a set of 88 NMR structures is shown as a square.

**Figure 2**
Sequence alignments between designed (DES) and wild-type (WT) sequences for four proteins. A black background indicates identical amino acids and a gray background indicates similar amino acids. The following PDB files were used: Hpr (1poh), CI2 (1ypc), CspB (1csp), and Fyn (1avz).
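The identity figures quoted for designed versus native sequences come down to counting matching positions, optionally restricted to a subset such as the core. The following is a minimal sketch under stated assumptions: the two sequences are already aligned position by position (same backbone, same length), and the function name and the `positions` parameter are hypothetical, not taken from the paper.

```python
def percent_identity(designed, native, positions=None):
    """Percentage of positions where designed and native residues match.

    positions: optional iterable of 0-based indices (for example, core
    residues); when omitted, all positions are compared.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to compare")
    matches = sum(1 for i in idx if designed[i] == native[i])
    return 100.0 * matches / len(idx)
```

Passing a list of core indices as `positions` gives a core-only identity, while omitting it gives identity over all residues.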

**Figure 3**
Sequence conservation in designed sequences correlates with sequence identity to the native sequence and sequence conservation in protein families. When the design program shows a strong preference for a particular amino acid at a sequence position, it more often prefers the native amino acid, and the residue is likely to have low sequence variability in naturally occurring sequences. Each position in each redesigned protein was assigned to a bin (x axis) based on the sequence entropy (∑frequency(aa_i)⋅ln(frequency(aa_i)) summed over all 20 amino acids, aa_i) at the position in a large set of sequences generated by the Monte Carlo search procedure (the numbers of residues in bins 1–7 are, respectively, 86, 91, 91, 126, 107, 79, and 94; higher sequence entropy is to the right). The left y axis indicates the percentage of residue positions that had the native amino acid in the designed sequences. The right y axis indicates the average sequence variability observed in naturally occurring sequences as derived from multiple sequence alignments (MSAs). The MSAs were taken from HSSP files (21). Results are shown for core residues. Only residue positions that had at least 10 sequences in the MSA were used (60 proteins total).
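The per-position sequence entropy used for binning can be sketched as follows. This is an illustration under assumptions: it uses the conventional Shannon form with a leading minus sign, so that a fully conserved position scores 0 and more variable positions score higher, and it takes a plain list of equal-length designed sequences as input. The function names are hypothetical.

```python
import math
from collections import Counter


def position_entropy(column):
    """Shannon entropy (natural log) of the residues observed at one position."""
    n = len(column)
    if n == 0:
        raise ValueError("empty column")
    counts = Counter(column)
    # Residues with zero frequency contribute nothing and are skipped implicitly.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def column_entropies(sequences):
    """Entropy at every position across a set of equal-length sequences."""
    if not sequences:
        raise ValueError("no sequences given")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must have the same length")
    return [position_entropy([s[i] for s in sequences]) for i in range(length)]
```

Positions could then be sorted by entropy and grouped into bins, with the fraction of native residues computed per bin.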

**Figure 4**
Sequence design for alternative backbone conformations. Sequences were designed for 9 proteins for which both an NMR and a crystal structure are available. The free energy of a sequence (in kcal/mol) is plotted against sequence identity (over all residues) to the native sequence. The results for the crystal structures are shown as open squares (for CI2 and Fyn, two independently determined crystal structures were used). The following PDB files were used: CI2 (1ypc, 2ci2, 3ci2), ubiquitin (1d3z, 1ubq), CspB (1csp, 1nmf), Fyn (1a0n, 1avz, 1efn), Hpr (1hdn, 1poh), protein L (2ptl; J. O'Neill and K. Zhang, personal communication), tendamistat (1brn, 2ait), interleukin (1icw, 1il8), and barstar (1a19, 1abt).

**Figure 5**
Amino acid profiles for six core residues in SH3 domains. The empty bars are derived from an SH3 domain MSA (S. M. Larson and A. R. Davidson, personal communication) and the shaded bars, from 11,000 computed sequences generated by using the backbones from 11 separate SH3 domain structures.

**Figure 6**
Sequence covariances derived from an SH3 domain MSA compared with covariances derived from computer-generated sequences. Each point corresponds to one pair of covarying residues. φ values greater than 0 indicate a positive covariance (see *Methods*), whereas values less than 0 indicate a negative covariance. The covariances in the MSA were identified by Larson *et al.* (S. M. Larson, A. A. Di Nardo, and A. R. Davidson, personal communication).
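As a rough illustration of a φ-style covariance between two positions, the sketch below computes the standard phi coefficient of a 2×2 contingency table. This is an assumption: the paper's *Methods* section is not reproduced here, so the exact definition used by the authors may differ, and the function name and arguments are hypothetical.

```python
import math


def phi_coefficient(sequences, i, aa_i, j, aa_j):
    """Phi coefficient for 'residue aa_i at position i' vs 'aa_j at position j'.

    Returns a value in [-1, 1]: positive when the two residues co-occur more
    often than expected by chance, negative when they co-occur less often.
    """
    n11 = n10 = n01 = n00 = 0
    for s in sequences:
        x = s[i] == aa_i
        y = s[j] == aa_j
        if x and y:
            n11 += 1
        elif x:
            n10 += 1
        elif y:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one of the residues is never (or always) present
    return (n11 * n00 - n10 * n01) / denom
```

Computing this value for the same residue pairs in a natural alignment and in a set of designed sequences gives the two coordinates of a point in a comparison like the one in this figure.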

References

1. Sauer R T. Folding Des. 1996;1:R27–R30.
2. Plaxco K W, Riddle D S, Grantcharova V, Baker D. Curr Opin Struct Biol. 1998;8:80–85.
3. Saven J G, Wolynes P G. J Phys Chem B. 1997;101:8375–8389.
4. Desjarlais J R, Handel T M. Protein Sci. 1995;4:2006–2018.
5. Koehl P, Levitt M. J Mol Biol. 1999;293:1183–1193.
