Structure. 2008 Apr;16(4):513-27.
doi: 10.1016/j.str.2008.01.012.

Structure prediction of domain insertion proteins from structures of individual domains

Monica Berrondo et al. Structure. 2008 Apr.

Abstract

Multidomain proteins continue to be a major challenge in protein structure prediction. Here we present a Monte Carlo (MC) algorithm, implemented within Rosetta, to predict the structure of proteins in which one domain is inserted into another. Three MC moves combine rigid-body and loop movements to search the constrained conformational space by structure disruption and subsequent repair of chain breaks. Local searches show that the algorithm consistently samples and recovers near-native structures. Further global searches produced top-ranked structures within 5 Å in 31 of 50 cases in low-resolution mode, and refinement of top-ranked low-resolution structures produced models within 2 Å in 21 of 50 cases. Rigid-body orientations were often correctly recovered despite errors in linker conformation. The algorithm is broadly applicable to de novo structure prediction of both naturally occurring and engineered domain insertion proteins.
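
The disrupt-and-repair search described in the abstract follows the general pattern of a Metropolis Monte Carlo loop. The sketch below illustrates only that generic pattern and is not the Rosetta implementation: the function names, the one-dimensional toy state, the quadratic score, and the temperature are all assumptions made for illustration, and the two toy moves merely stand in for the paper's rigid-body, loop-building, and insertion-flop moves.

```python
import math
import random


def metropolis_accept(delta_score, temperature, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-delta/T)."""
    if delta_score <= 0.0:
        return True
    return rng.random() < math.exp(-delta_score / temperature)


def monte_carlo_search(initial_state, score, moves, n_steps, temperature, seed=0):
    """Generic Metropolis loop: pick a move, score the trial state, accept or reject.

    `moves` is a list of callables (state, rng) -> new_state. In the paper each
    move disrupts the structure and then repairs chain breaks; here the moves
    are hypothetical stand-ins supplied by the caller.
    """
    rng = random.Random(seed)
    current, current_score = initial_state, score(initial_state)
    best, best_score = current, current_score
    for _ in range(n_steps):
        move = rng.choice(moves)
        trial = move(current, rng)
        trial_score = score(trial)
        if metropolis_accept(trial_score - current_score, temperature, rng):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    # Toy problem (assumption): the state is a single number, the score is a parabola.
    toy_score = lambda x: (x - 2.0) ** 2
    small_move = lambda x, rng: x + rng.uniform(-0.5, 0.5)  # stand-in for a local move
    large_move = lambda x, rng: x + rng.uniform(-3.0, 3.0)  # stand-in for a rigid-body move
    print(monte_carlo_search(10.0, toy_score, [small_move, large_move], 2000, 0.5))
```

Tracking the best-scoring state separately from the current state means an accepted uphill step never discards the lowest-scoring structure found so far.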

Figures

Figure 1. Domain Insertion Protein Structure
A domain insertion protein consists of two domains, A (blue) and B (red). (A) Primary structure; (B) Tertiary structure. The two 11-residue linkers connecting A to B are orange.
Figure 2. Cartoon representation of combination MC moves and their corresponding fold trees
Each horizontal panel shows an initial position and selected perturbation locations (left), the disrupted structure after a perturbation (center), and the subsequent structural repair (right). In all panels, green represents a flexible region of the protein or the fold tree and a yellow point indicates where a specific φ/ψ angle change occurs. A: For a rigid-body move, domain A (blue) is kept fixed while domain B (red) samples the conformational space around A, causing the linkers to break. The linkers are repaired using a combination of three-residue fragment insertions and CCD loop closure. The fold tree shows a fixed jump connecting the two parts of domain A in black and the flexible jump connecting domains A and B in green. Both the linkers are flexible so that they can be repaired. B: For a loop-building move, one linker (green) is built by inserting a three-residue fragment at the point shown in yellow. The insertion of a fragment breaks the linker, and CCD is used to reclose the linker. In the fold tree, only the linker that is being repaired is flexible. C: For an insertion-flop move, small φ/ψ angle movements are made in one linker (yellow) while allowing the other linker to break. The broken linker is then rebuilt. This “flops” around the insertion domain. In the fold tree, only one jump is used to hold the host domain together.
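
The caption above relies on CCD (cyclic coordinate descent) to reclose broken linkers. As a rough illustration of the idea only, the sketch below closes a planar chain of rigid links onto a target point by adjusting one joint angle at a time. It is a simplified two-dimensional analog and an assumption of this example: the protein version works on backbone φ/ψ torsions in three dimensions, and the function names, link lengths, and tolerance here are hypothetical.

```python
import math


def forward_kinematics(angles, lengths):
    """Return the joint positions of a planar chain anchored at the origin.

    `angles` are relative joint angles in radians; `lengths` are link lengths.
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for angle, length in zip(angles, lengths):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def ccd_close(angles, lengths, target, tolerance=1e-3, max_cycles=200):
    """Cyclic coordinate descent: rotate each joint in turn so that the chain end
    points from that joint toward the target, until the gap is below tolerance."""
    angles = list(angles)
    gap = float("inf")
    for _ in range(max_cycles):
        for i in reversed(range(len(angles))):
            points = forward_kinematics(angles, lengths)
            pivot, end = points[i], points[-1]
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            # Rotating joint i turns every downstream point about the pivot.
            angles[i] += to_target - to_end
        end = forward_kinematics(angles, lengths)[-1]
        gap = math.hypot(end[0] - target[0], end[1] - target[1])
        if gap < tolerance:
            break
    return angles, gap


if __name__ == "__main__":
    closed_angles, remaining_gap = ccd_close([0.3, 0.3, 0.3], [1.0, 1.0, 1.0], (1.5, 1.0))
    print(closed_angles, remaining_gap)
```

Each single-joint update cannot increase the distance between the chain end and the target, which is why the method suits repairing a break after a perturbation has pulled the chain ends apart.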
Figure 3. High-resolution energy landscape (score versus rmsd) for the local search on the development set
Rmsd is calculated over all Cα atoms of the protein. Points (•) are decoy structures; a separate marker shows the refined native structure (obtained with the high-resolution algorithm).
Figure 4. Low-resolution energy landscapes (score versus rmsd)
Rmsd is calculated over all Cα atoms of the protein. (•) Decoy structures.
Figure 5. High-resolution energy landscapes (score versus rmsd)
Rmsd is calculated over all Cα atoms of the protein. Points (•) are decoy structures. Separate markers show the refined native structure (obtained with the high-resolution algorithm) and the ten refined structures for the lowest-rmsd structure from the low-resolution search. The latter are shown only for cases where the lowest-rmsd structure from the low-resolution search provides a high-resolution final prediction that is closer to the native structure and lower in energy than any of the refined structures from the ten top-scoring low-resolution decoys.
Figure 6. Examples of accurate predictions with native-like insert domain orientation and side-chain packing
(A) Signal processing protein (1owq, Cα rmsd = 0.70 Å). (B) Hypothetical protein TM0936 (1p1m, Cα rmsd = 0.64 Å). (C) Flavocytochrome C3 (1qjd, Cα rmsd = 0.70 Å). (D) NADPH-dependent oxidoreductase (1vj1, Cα rmsd = 0.60 Å). The native structures are in dark shades with the host domain in blue, insert domain in red, and linkers in orange. Structures were superimposed using only the host domain coordinates.
Figure 7. Low-resolution energy landscape using different rmsd measurements for biliverdin reductase A (1gcu)
Left: Score vs. rmsd over all Cα atoms of the insert domain after superimposing the host domain; Right: Score vs. rmsd over Cα atoms of only the linker residues after superimposing the linkers.
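
The two rmsd measures in this caption differ only in which Cα atoms enter the sum and in which atoms were used for the prior superposition. A minimal sketch of the restricted rmsd is given below. It assumes the decoy coordinates have already been superimposed on the native structure (for example on the host domain); the superposition step itself is not shown, and the function name and the toy coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b, indices=None):
    """Root-mean-square deviation between matched Calpha coordinates.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, assumed to be
    in a common frame (already superimposed).
    indices: optional residue indices to restrict the calculation, for example
    only the insert domain or only the linkers.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    selected = list(range(len(coords_a))) if indices is None else list(indices)
    if not selected:
        raise ValueError("no atoms selected")
    total = 0.0
    for i in selected:
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[i]))
    return math.sqrt(total / len(selected))


if __name__ == "__main__":
    # Toy data (assumption): residues 0-1 play the host domain, 2-3 the insert domain.
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
    decoy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (8.0, 4.0, 0.0), (9.0, 4.0, 0.0)]
    print(ca_rmsd(native, decoy))                  # all four atoms
    print(ca_rmsd(native, decoy, indices=[2, 3]))  # insert domain only
```

In the toy data the host atoms coincide exactly, so the all-atom value is diluted by them while the insert-only value reports the full displacement of the insert domain.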
Figure 8. Examples of challenging complexes where prediction failed
Native structure and best scoring decoy structures for leucyl-tRNA synthetase (1h3n) and C-terminal binding protein 3 (1hku) with the host domain in blue, insert domain in red, and linkers in orange, with the native structure in darker shades. (A) The best scoring decoy structure for leucyl-tRNA synthetase creates a more compact structure than the native structure (B). (C) In the best scoring decoy structure for C-terminal binding protein 3, more contacts occur when the insert domain is rotated 180° from the native structure (D).
Figure 10. Algorithm flow charts
(A) Low-resolution mode; (B) Details of the loop-building algorithm for low-resolution mode; (C) High-resolution mode.
