Proc Natl Acad Sci U S A. 2000 Sep 12;97(19):10383-8.
doi: 10.1073/pnas.97.19.10383.

Native protein sequences are close to optimal for their structures

B Kuhlman et al. Proc Natl Acad Sci U S A.

Erratum in

  • Proc Natl Acad Sci U S A. 2000 Nov 21;97(24):13460

Abstract

How large is the volume of sequence space that is compatible with a given protein structure? Starting from random sequences, low free energy sequences were generated for 108 protein backbone structures by using a Monte Carlo optimization procedure and a free energy function based primarily on Lennard-Jones packing interactions and the Lazaridis-Karplus implicit solvation model. Remarkably, in the designed sequences 51% of the core residues and 27% of all residues were identical to the amino acids in the corresponding positions in the native sequences. The lowest free energy sequences obtained for ensembles of native-like backbone structures were also similar to the native sequence. Furthermore, both the individual residue frequencies and the covariances between pairs of positions observed in the very large SH3 domain family were recapitulated in core sequences designed for SH3 domain structures. Taken together, these results suggest that the volume of sequence space optimal for a protein structure is surprisingly restricted to a region around the native sequence.
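The search described in the abstract can be illustrated with a minimal sketch of a Metropolis Monte Carlo walk over sequences. It assumes single-residue substitution moves and a fixed temperature, and it uses a hypothetical `toy_energy` function that simply counts mismatches to an arbitrary target string. That function is a stand-in only: it is not the Lennard-Jones and Lazaridis-Karplus free energy function used in the paper, and the names `toy_energy` and `metropolis_design` are invented for this example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq, target):
    """Hypothetical stand-in energy: number of positions differing from target."""
    return sum(1.0 for a, b in zip(seq, target) if a != b)


def metropolis_design(energy, length, steps=20000, temperature=0.2, seed=0):
    """Start from a random sequence and search for a low-energy one.

    `energy` is any callable mapping a list of one-letter codes to a float.
    Returns the lowest-energy sequence visited and its energy.
    """
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = energy(seq)
    best_seq, best = list(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a single substitution
        proposed = energy(seq)
        delta = proposed - current
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed
            if current < best:
                best_seq, best = list(seq), current
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best


if __name__ == "__main__":
    target = "MKVLAAGIW"  # arbitrary illustrative string, not a real protein
    designed, score = metropolis_design(lambda s: toy_energy(s, target), len(target))
    print(designed, score)
```

Swapping in a physically motivated energy function would leave the search loop unchanged, since the loop only needs a callable that scores a candidate sequence.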

Figures

Figure 1
Sequence identity between designed and native sequences for core residues as a function of crystal structure resolution. The average sequence identity to native for sequences generated for a set of 88 NMR structures is shown as a square.
Figure 2
Sequence alignments between designed (DES) and wild-type (WT) sequences for four proteins. A black background indicates identical amino acids and a gray background indicates similar amino acids. The following PDB files were used: Hpr (1poh), CI2 (1ypc), CspB (1csp), and Fyn (1avz).
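The identity comparison behind these alignments can be sketched as follows, assuming the designed and native sequences have the same length and are already aligned position by position, as they are when a sequence is designed onto a fixed backbone. The function name `sequence_identity` and the optional `positions` argument (for example, a list of core residue indices) are illustrative choices, not taken from the paper.

```python
def sequence_identity(designed, native, positions=None):
    """Fraction of positions at which two equal-length sequences agree.

    `positions` optionally restricts the comparison to a subset of
    zero-based indices, such as the residues classified as core.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if positions is None:
        positions = range(len(native))
    positions = list(positions)
    if not positions:
        raise ValueError("no positions to compare")
    matches = sum(designed[i] == native[i] for i in positions)
    return matches / len(positions)


if __name__ == "__main__":
    # Arbitrary illustrative strings, not real protein sequences.
    print(sequence_identity("MKVLA", "MRVLG"))
    print(sequence_identity("MKVLA", "MRVLG", positions=[2, 3]))
```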
Figure 3
Sequence conservation in designed sequences correlates with sequence identity to the native sequence and sequence conservation in protein families. When the design program shows a strong preference for a particular amino acid at a sequence position, it more often prefers the native amino acid, and the residue is likely to have low sequence variability in naturally occurring sequences. Each position in each redesigned protein was assigned to a bin (x axis) based on the sequence entropy, $-\sum_i f(aa_i)\,\ln f(aa_i)$, where $f(aa_i)$ is the frequency of amino acid $aa_i$ and the sum runs over all 20 amino acids, at the position in a large set of sequences generated by the Monte Carlo search procedure (the numbers of residues in bins 1–7 are, respectively, 86, 91, 91, 126, 107, 79, and 94; higher sequence entropy is to the right). The left y axis indicates the percentage of residue positions that had the native amino acid in the designed sequences. The right y axis indicates the average sequence variability observed in naturally occurring sequences as derived from multiple sequence alignments (MSAs). The MSAs were taken from HSSP files (21). Results are shown for core residues. Only residue positions that had at least 10 sequences in the MSA were used (60 proteins total).
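The per-position sequence entropy used for binning can be computed along these lines. This is a minimal sketch that assumes the designed sequences are equal-length strings and uses the conventional sign (the negative of the sum of frequency times log frequency), so that a fully conserved position scores 0 and more variable positions score higher. The function names are illustrative only.

```python
import math
from collections import Counter


def position_entropy(column):
    """Entropy (natural log) of the amino acid frequencies in one column.

    Amino acids that never occur contribute nothing to the sum,
    so only the observed residues are counted.
    """
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def entropy_profile(sequences):
    """Entropy at every position of a set of equal-length sequences."""
    return [position_entropy(col) for col in zip(*sequences)]


if __name__ == "__main__":
    # Arbitrary illustrative sequences: position 0 is conserved,
    # position 1 is split evenly between two amino acids.
    designs = ["AL", "AL", "AI", "AI"]
    print(entropy_profile(designs))
```

With 20 amino acids the entropy at a position ranges from 0 (one residue always chosen) to ln 20 (all residues equally frequent).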
Figure 4
Sequence design for alternative backbone conformations. Sequences were designed for 9 proteins for which both an NMR structure and a crystal structure are available. The free energy of a sequence (in kcal/mol) is plotted against sequence identity (over all residues) to the native sequence. The results for the crystal structures are shown as open squares (for CI2 and fyn, two independently determined crystal structures were used). The following PDB files were used: CI2 (1ypc, 2ci2, 3ci2), ubiquitin (1d3z, 1ubq), CspB (1csp, 1nmf), fyn (1a0n, 1avz, 1efn), Hpr (1hdn, 1poh), protein L (2ptl; J. O'Neill and K. Zhang, personal communication), tendamistat (1brn, 2ait), interleukin (1icw, 1il8), and barstar (1a19, 1abt).
Figure 5
Amino acid profiles for six core residues in SH3 domains. The empty bars are derived from an SH3 domain MSA (S. M. Larson and A. R. Davidson, personal communication) and the shaded bars, from 11,000 computed sequences generated by using the backbones from 11 separate SH3 domain structures.
Figure 6
Sequence covariances derived from an SH3 domain MSA compared with covariances derived from computer-generated sequences. Each point corresponds to one pair of covarying residues. φ values greater than 0 indicate a positive covariance (see Methods), whereas values less than 0 indicate a negative covariance. The covariances in the MSA were identified by Larson et al. (S. M. Larson, A. A. Di Nardo, and A. R. Davidson, personal communication).
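The caption defers the definition of φ to the Methods, which are not reproduced here. As a rough illustration only, the sketch below assumes that φ is the standard phi coefficient of a 2×2 contingency table, counting how often a given amino acid at one position co-occurs with a given amino acid at another. Under that assumption φ is positive when the two residues appear together more often than expected by chance and negative when they appear together less often. The function name and signature are invented for this example.

```python
import math


def phi_coefficient(sequences, pos_a, aa_a, pos_b, aa_b):
    """Phi coefficient for (aa_a at pos_a) versus (aa_b at pos_b).

    Assumes equal-length sequences and zero-based positions.
    Returns 0.0 when either event never occurs or always occurs,
    since the coefficient is undefined in that case.
    """
    n11 = n10 = n01 = n00 = 0
    for seq in sequences:
        has_a = seq[pos_a] == aa_a
        has_b = seq[pos_b] == aa_b
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


if __name__ == "__main__":
    # Arbitrary illustrative sequences: A at position 0 always pairs with L at position 1.
    coupled = ["AL", "AL", "VI", "VI"]
    print(phi_coefficient(coupled, 0, "A", 1, "L"))
```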

References

    1. Sauer R T. Folding Des. 1996;1:R27–R30.
    2. Plaxco K W, Riddle D S, Grantcharova V, Baker D. Curr Opin Struct Biol. 1998;8:80–85.
    3. Saven J G, Wolynes P G. J Phys Chem B. 1997;101:8375–8389.
    4. Desjarlais J R, Handel T M. Protein Sci. 1995;4:2006–2018.
    5. Koehl P, Levitt M. J Mol Biol. 1999;293:1183–1193.