Abstract
Developing macrocyclic binders to therapeutic proteins typically relies on large-scale screening methods that are resource intensive and provide little control over binding mode. Despite progress in protein design, there are currently no robust approaches for de novo design of protein-binding macrocycles. Here we introduce RFpeptides, a denoising diffusion-based pipeline for designing macrocyclic binders against protein targets of interest. We tested 20 or fewer designed macrocycles against each of four diverse proteins and obtained binders with medium to high affinity against all targets. For one of the targets, Rhombotarget A (RbtA), we designed a high-affinity binder (Kd < 10 nM) despite starting from the predicted target structure. X-ray structures for macrocycle-bound myeloid cell leukemia 1, γ-aminobutyric acid type A receptor-associated protein and RbtA complexes match closely with the computational models, with a Cα root-mean-square deviation < 1.5 Å to the design models. RFpeptides provides a framework for rapid and custom design of macrocyclic peptides for diagnostic and therapeutic applications.

Main
Macrocyclic peptides present a promising avenue for developing new therapeutics that bridge the gap between small-molecule drugs and large biologics1,2. Biologics, while capable of binding diverse therapeutic targets with high affinity and selectivity, are usually unable to cross cell membranes because of their large size and high polarity, limiting them to extracellular targets. Conversely, small molecules can access intracellular targets but are not ideal for targeting proteins lacking deep hydrophobic pockets. In principle, macrocyclic peptides with sizes between small molecules and proteins can be developed to modulate molecular targets inaccessible to traditional therapeutic modalities3. The ability to develop custom protein-binding macrocycles for diverse protein targets would have many diagnostic and therapeutic applications. Traditionally, the development of peptide therapeutics has relied on natural product discovery or high-throughput screening of trillions of random peptides for target binding using display-based techniques1,2. However, natural product discovery has several challenges, particularly synthetic difficulties, marginal stability and low mutational tolerance of identified hits4. While powerful, the high-throughput screening methods are time-intensive, cost-intensive and labor-intensive and only span a small fraction of the rich chemical and structural diversity accessible to macrocycles. Moreover, such approaches frequently fail to simultaneously optimize for multiple biophysical properties, such as target binding, selectivity and membrane permeability, because of the precise structural control required to achieve such functional properties5.
Structure-guided design methods offer a complementary approach to the library screening approaches, enabling rapid in silico exploration of a large chemical and structural diversity to design macrocycle binders for therapeutic targets. We previously developed physics-based methods for designing hyperstable constrained peptides, structured macrocycles and binders to protein targets by borrowing the motifs or interactions from previously described binding partners as anchors6,7,8,9. However, despite the high accuracy observed in the design of monomeric macrocycles with these methods7, the design of protein-binding macrocycles has had limited success, achieving only modest binding affinities and, in many cases, with the experimentally determined structures not agreeing with the design models7,8,10. The reliance on previously described binding partners for starting motifs also restricts such approaches to well-studied protein targets. In recent work, we described a pipeline for hallucinating and predicting the structures of macrocyclic peptide monomers by modifying AlphaFold2 (AF2) to include cyclic relative positional encoding (named “AfCycDesign”)11. Other promising deep learning (DL) methods were described recently to predict the structures of macrocycles and macrocycle–target complexes12,13 and to design peptide binders to protein targets14,15,16. However, these methods have not been extensively structurally validated to date or shown to robustly perform atomically accurate de novo design of macrocyclic peptide structures in complexes with diverse protein targets. Computational methods that can accurately design high-affinity macrocycle binders de novo, using just the information of target structure or sequence, are required for wider therapeutic applications.
We reasoned that recent breakthroughs in generative DL methods could be leveraged to develop a robust pipeline for the accurate and efficient design of macrocycle binders. Diffusion models for protein design, such as RFdiffusion17, are trained to generate diverse protein structures from randomly initialized residues as starting points and have demonstrated remarkable success in designing protein monomers, binders and symmetric oligomers of medium-sized to large-sized proteins. However, despite considerable recent progress in DL-based protein design methods, these methods are not readily applicable to designing macrocyclic peptides. Developing analogous methods for peptide design from scratch has been challenging because of the limited availability of experimental data for training such models. To address these challenges, we set out to extend the RoseTTAFold2 (RF2)18 structure prediction network and the RFdiffusion17 protein backbone generation framework to incorporate cyclic relative positional encoding and enable the generation of the macrocyclic peptide backbones.
Extending RF2 and RFdiffusion for macrocycles
We began by examining the ability of the RF2 (ref. 18) structure prediction network to model known macrocyclic peptide structures. We implemented a modified (Methods) cyclic relative position encoding for RF2 (Fig. 1a) and observed robust prediction of natural cyclic peptide structures (Supplementary Fig. 1). Given this success, we reasoned that the same relative positional encoding should enable RFdiffusion17 to generate macrocyclic peptide structures because of its similar network architecture. We added the cyclic positional encoding scheme to RFdiffusion and observed robust generation of diverse macrocyclic peptides (Fig. 1b,c and Supplementary Fig. 2). Similar to the previously described work on designing monomeric cyclic peptides with physics-based methods7 and AfCycDesign11, we observed 9,045 and 8,913 structurally unique 10-residue and 12-residue backbones, respectively, when 48,000 macrocycle backbones were generated for each size (Supplementary Fig. 2). The distribution of phi and psi values in these generated backbones is similar to the standard Ramachandran plot for protein structures (Supplementary Fig. 2), suggesting that generated backbones do not require extensive d-amino acids to stabilize the generated structures7. While we did not attempt to comprehensively enumerate the structural space of cyclic peptide monomers, RFpeptides can readily be scaled up to comprehensively cover the structural space accessible to macrocyclic peptides. Encouraged by the transferability of the cyclic positional encoding, we set out to use RFdiffusion for the de novo design of protein-binding macrocycles. We chose RFdiffusion for several reasons. Firstly, we expected the high experimental success rate of RFdiffusion17,19 for protein binder design to carry over to macrocycle binder design. Secondly, de novo binder design with AfCycDesign as is would be far more computationally expensive and has not been successfully implemented or experimentally validated. 
Thirdly, the method can still take advantage of the current built-in conditional generation functionalities of RFdiffusion, such as epitope-specific targeting and “motif” scaffolding. Lastly, the method should be directly transferable to other current and future RoseTTAFold-based design networks, such as RFdiffusion All-Atom20, for incorporating nonpeptidic molecules (nucleic acids, ions, etc.) during design calculations.
a, Cyclically symmetric relative position encoding enables the generation of macrocyclic peptide backbones with N and C termini linked by a peptide bond. The relative position encodings are cyclized by switching from positive relative position encodings (that is, to the right) to negative encodings (that is, to the left) when index j is more than halfway around the peptide relative to index i. b, RFpeptides produces diverse and designable cyclic peptides. Left: structural clusters calculated using t-distributed stochastic neighbor embedding (tSNE)31,32 to reduce the dimensionality of an all-by-all TMscore matrix computed with TMalign33 on an unfiltered set of 1,200 macrocycles generated using RFpeptides. Right: six RFpeptides outputs from differing structural clusters, all with <1 Å backbone r.m.s.d. between the design model (blue) and the structure predicted by AfCycDesign (gold). comp., component. c, Self-consistency of designed macrocycles of various lengths. For each peptide length, the fraction (with n = 200 per length) of backbones with at least one of eight LigandMPNN34 sequences predicted by AfCycDesign to refold with pLDDT > 0.8 and within 2.0 Å backbone r.m.s.d. of the designed structure. Success rates for all sampled backbones are in blue and success rates only counting unique structural clusters (as calculated using MaxCluster35,36 at a TMscore threshold of 0.5) are in orange. d, For multichain diffusion trajectories (for example, macrocycle binder design), the relative positional encoding for the macrocycle chain is cyclized, whereas interchain and target chain relative positional encoding is kept as standard. e, Pipeline for the design of protein-binding macrocycles using RFpeptides. Macrocycle backbones are generated from randomly initialized atoms by a stepwise RFdiffusion-based denoising process, followed by amino acid sequence design using ProteinMPNN. 
Design models are downselected on the basis of the computational metrics from structure prediction using AfCycDesign and physics-based interface quality metrics using Rosetta. f, RFpeptides generates diverse macrocycles against selected targets. Four diverse cyclic peptide binders against the same target were generated using RFpeptides, with AfCycDesign iPAE < 0.3 and Cα r.m.s.d. < 1.5 Å between the design model (blue) and AfCycDesign prediction (gold).
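The cyclization rule described in panels a and d can be illustrated with a minimal NumPy sketch. It is not the RF2 or RFdiffusion implementation: the function names, the treatment of the exact halfway position (kept positive here) and the assumption that the macrocycle residues form one contiguous, ordered block are all illustrative choices.

```python
import numpy as np


def cyclic_relative_offsets(n: int) -> np.ndarray:
    """Signed offsets from residue i to residue j on a ring of n residues.

    Offsets count steps to the right (positive) until j is more than
    halfway around the ring, at which point they switch to steps to the
    left (negative). The exact halfway point stays positive (an assumption).
    """
    idx = np.arange(n)
    # Steps to the right from i to j, always in the range 0..n-1.
    d = (idx[None, :] - idx[:, None]) % n
    # More than halfway around: express as a leftward (negative) offset.
    return np.where(d > n / 2, d - n, d)


def mixed_offsets(res_index, cyclic_mask) -> np.ndarray:
    """Offsets for a target plus macrocycle complex.

    Target-target and interchain pairs keep the standard linear offset
    j - i computed from the supplied residue indices; only pairs inside
    the macrocycle block are cyclized.
    """
    res_index = np.asarray(res_index)
    cyclic_mask = np.asarray(cyclic_mask, dtype=bool)
    d = res_index[None, :] - res_index[:, None]  # standard linear offsets
    n_cyc = int(cyclic_mask.sum())
    d[np.ix_(cyclic_mask, cyclic_mask)] = cyclic_relative_offsets(n_cyc)
    return d


if __name__ == "__main__":
    # In a 6-residue ring, the last residue is one step to the left of the first.
    print(cyclic_relative_offsets(6))
```

With this rule, the first and last residues of the macrocycle are encoded as sequence neighbours, which is what lets the network treat the chain as head-to-tail linked.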
We modified the RFdiffusion protein binder design pipeline to use cyclic relative position encodings for the generated chain and standard positional encodings for the target and interchain (binder–target) indices (Fig. 1d). We then completed our design pipeline by using ProteinMPNN21 to design amino acid sequences compatible with the backbones generated by RFdiffusion (Fig. 1e). We chose ProteinMPNN for its improved performance in sequence design and ability to generate sequences with better solubility profiles over the sequences generated by traditional physics-based methods22. This pipeline readily generated macrocycles with diverse secondary structure content against target proteins (Fig. 1f) and the inclusion of standard RFdiffusion hotspot features clearly shifted the distribution of generated binders toward desired residues (Supplementary Fig. 3). We refer to this integrated pipeline as “RFpeptides” throughout the remainder of the text.
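The self-consistency criterion reported in Fig. 1c (a backbone counts as a success if at least one of its designed sequences refolds with pLDDT > 0.8 and within 2.0 Å backbone r.m.s.d. of the design) can be written down in a few lines. The sketch below only illustrates the bookkeeping; the `Refold` record and function names are hypothetical and the prediction step itself is assumed to have been run elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Refold:
    """One designed sequence re-predicted by a structure prediction network."""
    plddt: float  # prediction confidence on a 0-1 scale
    rmsd: float   # backbone r.m.s.d. to the design model, in angstroms


def backbone_passes(refolds: List[Refold],
                    min_plddt: float = 0.8,
                    max_rmsd: float = 2.0) -> bool:
    """True if at least one sequence refolds confidently and accurately."""
    return any(r.plddt > min_plddt and r.rmsd <= max_rmsd for r in refolds)


def success_rate(per_backbone: Dict[str, List[Refold]]) -> float:
    """Fraction of backbones with at least one passing sequence."""
    if not per_backbone:
        return 0.0
    n_pass = sum(backbone_passes(r) for r in per_backbone.values())
    return n_pass / len(per_backbone)
```

Counting success per backbone rather than per sequence reflects the question being asked: whether a generated backbone is designable at all, not how many sequences encode it.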
De novo design of macrocyclic binders to myeloid cell leukemia 1 and MDM2
We selected myeloid cell leukemia 1 (MCL1) as our first target protein, given the availability of multiple high-resolution X-ray crystal structures to initiate the design calculations. MCL1 is also a promising target for anticancer therapeutics because of its roles in autophagy, cell survival, DNA repair and cellular proliferation23. For targeting MCL1, we used RFpeptides to generate 9,965 diverse cyclic peptide backbones, followed by four iterative rounds of ProteinMPNN and Rosetta Relax to design four amino acid sequences for each generated backbone. We expected the local changes to the generated backbone during the Rosetta Relax steps to allow for improved amino acid sequence diversity from the ProteinMPNN steps. While there are other ways to achieve increased sequence diversity, including generating multiple sequences per backbone from ProteinMPNN or adding noise during ProteinMPNN sequence generation, we did not explicitly try or compare them in this study. For downselecting the design candidates for experimental testing, we used AfCycDesign to repredict the designed macrocycle–target complexes from the macrocycle sequence and the target structure as a template. We selected the designs on the basis of the confidence metric (interface predicted aligned error (iPAE)) and the similarity between the original design model and the protein–macrocycle complex predicted by AfCycDesign (Supplementary Fig. 4). For further stringency in the design selection process, we also used RF2 to repredict the complex structures, reasoning that the design models predicted identically by two orthogonal structure prediction networks (AfCycDesign and RF2) should have a higher likelihood of binding to the target as designed. However, the 1,984 selected designs at this stage were still more than the number of designs we could reasonably synthesize and test experimentally. 
Therefore, we next used Rosetta24 to calculate the “physics-based” metrics of interface and macrocycle quality, such as calculated binding affinity (ddG), spatial aggregation propensity (SAP) of the designed macrocycle and the molecular surface area of the interface contacts (CMS) (Supplementary Fig. 4).
After strictly filtering the designed candidates on DL-based and physics-based metrics, we selected 27 designs for synthesis and for biochemical and biophysical characterization. Despite specifying no hotspots to guide the generation process to a specific patch on the MCL1 structure, all selected designs bound to the functionally relevant MCL1–BH3 interaction site (Supplementary Fig. 5). While all selected designs include an α-helical segment, they feature different sequences, macrocycle placement and target interactions (Supplementary Fig. 5 and Supplementary Table 1a). In addition to the common helical motifs, the loop regions of the selected macrocycles also contribute extensive side-chain-mediated and backbone-mediated interactions to the binding interface. During the chemical synthesis using Fmoc-based solid-phase synthesis (Methods), the yields for the correctly cyclized product for 13 designs were low and insufficient for further characterization. We tested the remaining 14 macrocycles for binding to biotinylated MCL1 using surface plasmon resonance (SPR) single-cycle kinetics experiments (Supplementary Fig. 6). Three macrocycles showed binding to MCL1, with the best binder, MCB_D2 (MCL1 binding design 2) (Fig. 2a), demonstrating a binding affinity of 2 µM (Fig. 2b). To confirm whether the designed macrocycle adopts the designed structure and engages MCL1 in the designed binding mode, we determined the X-ray crystal structure of MCB_D2 bound to MCL1 at 2.1 Å resolution. The crystal structure was nearly identical to the design model, with a root-mean-square deviation (r.m.s.d.) of 0.7 Å over all of the Cα atoms of the macrocycle with target chains aligned (Fig. 2c) and Cα r.m.s.d. of 0.4 Å within the macrocycles when aligned (Fig. 2d). The side-chain rotamers of the interacting residues in the crystal structure also closely matched the design model (Fig. 2d). 
The crystal structure also confirmed that the binding interactions are not restricted to the helix region of the designed macrocycle but are also contributed by the loop regions (Fig. 2e,f). While several hydrophobic interactions from the MCB_D2 helical segment are similar to those seen in natural MCL1 binders (for example, BH3 peptide) (Supplementary Fig. 7), the N-to-C orientation of the helix is flipped in the case of MCB_D2. The loop region of MCB_D2 makes additional hydrophobic contacts and a cation–π interaction with MCL1 (Fig. 2e and Supplementary Fig. 7d) that we did not observe in previously reported natural MCL1 binders and their analogs. All three hits with an observable binding signal at 100 µM featured this cation–π interaction.
a, AfCycDesign prediction of MCB_D2 (purple) bound to MCL1 (gray surface). MCB_D2 side chains are shown as sticks. b, Affinity determination of MCB_D2 using SPR. SPR sensorgram from a nine-point single-cycle kinetics experiment (twofold dilution, highest concentration: 20 µM). Experimental data are shown in purple and global fits are shown with black lines. The Kd is also shown on the plot. c, Experimentally determined complex structures closely match the design model. Overlap of the X-ray crystal structure (gold and gray) with the design model for MCB_D2 (purple). The Cα r.m.s.d. for the macrocycle is 0.7 Å when the experimental structure and design models are aligned by MCL1 residues. Close-up views demonstrate strong agreement between the side-chain rotamers of the design model and the X-ray structure. d, Overlay of the macrocycle model to the crystal structure shows a Cα r.m.s.d. of 0.4 Å with nearly identical backbones and side-chain rotamers. e, Close-up view of the macrocycle-bound MCL1 structure showing the cation–π interaction at the interface. f, Close-up view of the macrocycle-bound MCL1 structure showing the hydrophobic contacts at the interface. g, AfCycDesign prediction of the MDB_D8 design (blue) in complex with MDM2 (gray), shown as cartoons with interacting side chains as sticks and with MDM2 also shown as surface. h, Affinity determination of MDB_D8 using SPR. SPR sensorgram from a nine-point single-cycle kinetics experiment (fivefold dilution, highest concentration: 50 µM). Experimental data are shown in blue and global fits are shown with black lines. The Kd is also shown on the plot. i, Overall and close-up views of the AfCycDesign prediction of the MDB_D8 design model, highlighting key interactions with MDM2.
Encouraged by the experimental validation of the MCL1 binding macrocycles, we next sought to design binders to MDM2, an E3 ligase that interacts with tumor suppressor protein p53 and has multiple critical roles in tumor growth and survival25. We generated 10,000 macrocycle backbones spanning diverse lengths amenable to chemical synthesis (16–18 residues) and designed four amino acid sequences for each generated backbone using iterative rounds of ProteinMPNN and Rosetta Relax protocols (Methods). Design models were filtered on the basis of the confidence metrics and similarity of the AfCycDesign predictions to the designed complexes and the interface quality metrics calculated using Rosetta (Supplementary Fig. 4). AfCycDesign predicted 7,495 of the 40,000 design models to bind MDM2 with high confidence (normalized iPAE < 0.3) (Supplementary Fig. 4). In contrast to our approach for MCL1, we chose not to do any additional filtering with RF2 as the results between AfCycDesign and RF2 were fairly consistent. We also adjusted the filter thresholds for in silico filters as their overall distribution differed substantially from the distribution observed for MCL1 (Supplementary Fig. 4). After filtering on interface metrics (Methods), we identified 17 designs with iPAE < 0.3, ddG < −50 kcal mol⁻¹, CMS > 300 Å² and SAP < 35. We selected 11 top-ranked designs by ddG for biochemical and biophysical characterization. The 11 selected designs had diverse sizes, shapes and sequences (Supplementary Fig. 8 and Supplementary Table 2); however, they were all predicted to bind the same site as the p53 transactivation domain (Supplementary Fig. 8). Three of the selected designs had poor yields during the cyclization step of the chemical synthesis, preventing further experimental characterization with them. We tested the remaining eight peptides for binding to the biotinylated MDM2 by SPR and identified three binders with observable binding signals at 100 µM (Supplementary Fig. 9). 
The best design, MDB_D8 (Fig. 2g), demonstrated a binding affinity of 1.9 µM in the SPR single-cycle kinetics experiment (Fig. 2h). The computational model for this design makes several key contacts at the interface that are similar to interactions observed in native MDM2–p53 complex structures (Fig. 2i and Supplementary Fig. 10)25. Despite different overall structures, all three hits from the SPR screen had a similar binding motif composed of phenylalanine, tryptophan and either leucine or methionine from the helical segment of the macrocycle. Together, these data highlight the promising accuracy of the RFpeptides pipeline to design diverse macrocyclic binders for selected targets of interest.
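The MDM2 downselection described above (threshold filters on iPAE, ddG, CMS and SAP, followed by ranking on ddG) amounts to a filter-then-sort step, sketched below. The `Design` and `Thresholds` records are hypothetical containers; the default threshold values are the ones quoted in the text for MDM2, and the metric values themselves are assumed to come from AfCycDesign and Rosetta runs performed separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Design:
    name: str
    ipae: float  # normalized interface PAE (lower is better)
    ddg: float   # Rosetta ddG in kcal/mol (more negative is better)
    cms: float   # contact molecular surface in square angstroms
    sap: float   # spatial aggregation propensity (lower is better)


@dataclass(frozen=True)
class Thresholds:
    # Defaults follow the MDM2 campaign; other targets used other values.
    max_ipae: float = 0.3
    max_ddg: float = -50.0
    min_cms: float = 300.0
    max_sap: float = 35.0


def select_designs(designs: List[Design],
                   t: Thresholds = Thresholds(),
                   top_n: int = 11) -> List[Design]:
    """Keep designs passing every filter, then return the top_n by ddG."""
    passing = [
        d for d in designs
        if d.ipae < t.max_ipae and d.ddg < t.max_ddg
        and d.cms > t.min_cms and d.sap < t.max_sap
    ]
    # Most negative ddG first.
    return sorted(passing, key=lambda d: d.ddg)[:top_n]
```

Keeping the thresholds in their own record makes the per-target adjustments explicit, for example `Thresholds(max_ipae=0.13, max_ddg=-30.0)` for the GABARAP values quoted later in the text.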
De novo design of macrocyclic binders to γ-aminobutyric acid type A receptor-associated protein
We next set out to design binders against a target with a binding site that is structurally different from MCL1 and MDM2, formed by a mix of α-helices and β-strands (in contrast to all α-helical pockets of MCL1 and MDM2). We selected γ-aminobutyric acid type A receptor-associated protein (GABARAP) as the target, a protein responsible for mediating autophagy through its role in autophagosome biogenesis and recruitment of cargo, resulting in lysosomal degradation of damaged or surplus proteins and organelles26. Peptide modulators against GABARAP could have therapeutic applications in the treatment of late-stage cancers27 or as chimeric peptides for autophagy-mediated targeted protein degradation28. Our target binding site for GABARAP, which is also the binding site for the native LC3-interacting region or Atg8-interacting motif29, is formed by a mix of β-strand and α-helix secondary structures (Fig. 3 and Supplementary Fig. 13). For designing macrocyclic binders against the human GABARAP, we used a similar pipeline as described above for MCL1 and MDM2 (Methods) but we doubled the number of generated designs and defined six hotspot residues (Lys46, Lys48, Tyr49, Leu50, Phe60 and Leu63) to guide the macrocycle backbone generation to a specific site on the target (Fig. 3a,d). We generated 20,000 macrocycle backbones and designed the amino acid sequences using ProteinMPNN and Rosetta Relax protocols. Of the resulting 80,000 design models, we selected 335 macrocyclic designs on the basis of AfCycDesign (iPAE < 0.13) and Rosetta (ddG < −30 kcal mol⁻¹, SAP < 35 and CMS > 300 Å²) interface metrics (Supplementary Fig. 4). Instead of trying to synthesize and characterize all 335 cyclic peptides (which would have required substantial time and experimental resources), we clustered the 335 designs into 80 different clusters on the basis of their three-dimensional structures and selected representative designs from diverse clusters for further biochemical characterization. 
We selected 13 diverse macrocycles of 12–17 residues for synthesis and experimental validation (Supplementary Table 3 and Supplementary Fig. 11). Unlike the design candidates described above for MCL1 and MDM2, several of the selected macrocycles for GABARAP showed cyclic β-sheet structures with several edge–strand interactions with the target (Supplementary Fig. 11).
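Choosing representatives from structural clusters, as done for the GABARAP designs, can be sketched as picking one design per cluster and then taking a fixed number of clusters. The text does not state how representatives were chosen within a cluster or how clusters were prioritized, so the rule below (best score per cluster, then best-scoring clusters first) is purely an assumption for illustration, as are the function and argument names.

```python
from typing import Dict, Iterable, List, Tuple


def pick_representatives(designs: Iterable[Tuple[str, int, float]],
                         n_pick: int) -> List[str]:
    """Return up to n_pick design names, at most one per structural cluster.

    Each input item is (name, cluster_id, score), where a lower score is
    better (for example, iPAE). Cluster labels are assumed to have been
    computed beforehand by a structure-based clustering tool.
    """
    best: Dict[int, Tuple[str, float]] = {}
    for name, cluster, score in designs:
        # Keep only the best-scoring member seen so far for each cluster.
        if cluster not in best or score < best[cluster][1]:
            best[cluster] = (name, score)
    ranked = sorted(best.values(), key=lambda item: item[1])
    return [name for name, _ in ranked[:n_pick]]
```

Restricting the selection to one design per cluster trades a few top-scoring near-duplicates for broader coverage of backbone shapes, which matters when only about a dozen peptides can be synthesized.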
a, AfCycDesign predicted model for design GAB_D8 bound to GABARAP shown as surface, with hotspot residues highlighted in green. b, Affinity determination of GAB_D8 using SPR. SPR sensorgram from a nine-point single-cycle kinetics experiment (fivefold dilution, highest concentration: 20 µM). Experimental data are shown in orange and global fits are shown with black lines. The Kd is also shown on the plot. c, Superposition of chains E and F from the X-ray crystal structure of GAB_D8 bound to GABARAPL1 and the AfCycDesign model. d, AfCycDesign predicted model for design GAB_D23 bound to GABARAP shown as surface, with hotspot residues highlighted in green. e, Affinity determination of GAB_D23 using SPR. SPR sensorgram from a nine-point single-cycle kinetics experiment (fivefold dilution, highest concentration: 20 µM). Experimental data are shown in pink and global fits are shown with black lines. The Kd is also shown on the plot. f, Alignment of chains A and B from the X-ray crystal structure of GAB_D23 bound to GABARAP and the AfCycDesign model. g, Alignments of GAB_D8 and GAB_D23 macrocycle models to X-ray crystal structures show close matches. h, Comparison of GAB_D8 and GAB_D23 binding modes in the design models. i, Competitive AlphaScreen dose-response plot, IC50 from the average of three experiments. Donor and acceptor beads in the assay are bound to GABARAP and GABARAP-binding peptide K1, respectively.
We successfully synthesized six designs with high purity (>90%) and tested them for binding to GABARAP using SPR (Supplementary Fig. 12). Two designs, GAB_D8 and GAB_D23, showed binding affinities of 6 nM and 36 nM, respectively (Fig. 3b,e). To further characterize the binding of GAB_D8 and GAB_D23, we tested the ability of these designs to disrupt the interaction of GABARAP with linear peptide K1 (a previously described binder to this site30) in AlphaScreen assays. GAB_D8 and GAB_D23 demonstrated a half-maximal inhibitory concentration (IC50) of 0.7 nM and 2.5 nM in the AlphaScreen assay, respectively (Fig. 3i). To our knowledge, GAB_D8 is the most potent macrocyclic GABARAP binder to date.
In crystallization trials, we did not obtain crystals of sufficiently high quality for GAB_D8 bound to GABARAP. We instead crystallized GAB_D8 bound to GABARAPL1, a homolog of GABARAP with 86% overall sequence identity and 100% sequence identity for residues within 5 Å of GAB_D8 in the design model. The X-ray crystal structure for GAB_D8 bound to GABARAPL1 matched very closely with the design model, with a Cα r.m.s.d. of 1.2 Å over the macrocycle when aligned by the target protein to the closest of the four copies in the asymmetric unit (Fig. 3c and Supplementary Fig. 13) and a Cα r.m.s.d. of 0.47 Å when aligned by macrocycle alone (Fig. 3g). Notably, the X-ray structure of the GAB_D8–GABARAPL1 complex showed two different bound conformations of GAB_D8, one that closely matched the design model and a second one that partially deviated from the design model (Supplementary Fig. 14), with a register shift nucleated by Thr10 from the macrocycle forming main-chain-mediated and side-chain-mediated hydrogen bonds with Lys48 on the target. GAB_D23 crystallized readily with GABARAP and also closely matched the design model with a Cα r.m.s.d. of 1.7 Å when aligned by the target (Fig. 3f) and Cα r.m.s.d. of 0.74 Å across the macrocycle alone (Fig. 3g). The X-ray crystal structure confirmed the key designed interactions, such as Trp5 and Ile8, with the main difference between the design model and the X-ray structure being the switch from a type I β-turn from Leu1 to Gly4 in the design model to a less regular conformation in the crystal structure, with a tendency for a type I′ β-turn from Glu2 to Trp5. While our original design models were predicted with single sequences as inputs to AF2, we retrospectively predicted the GAB_D8–GABARAPL1 and GAB_D23–GABARAP complex structures with multiple-sequence alignment (MSA) inputs. These MSA-based predictions of the designs matched even more closely with the X-ray crystal structures, with a Cα r.m.s.d. 
of 0.5 Å and 0.9 Å for the GAB_D8–GABARAPL1 and GAB_D23–GABARAP complexes, respectively, when aligned by the target structure (Supplementary Fig. 15). Overall, these data demonstrate the ability of our de novo design pipeline to identify high-affinity binders against targets with diverse pocket shapes and surfaces without requiring library-scale screening.
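The two Cα r.m.s.d. values quoted for each complex answer different questions: aligning on the target measures how well the binding pose was predicted, whereas aligning on the macrocycle alone measures how well its internal conformation was predicted. A minimal NumPy sketch of both, using the standard Kabsch superposition, is given below. The function names are illustrative, coordinates are assumed to be matched N×3 arrays of Cα positions, and this is not the analysis code used in the study.

```python
import numpy as np


def kabsch(P: np.ndarray, Q: np.ndarray):
    """Rotation R and translation t that best superimpose P onto Q.

    P and Q are (N, 3) arrays of matched atoms; the fit minimizes the
    r.m.s.d. between R @ p + t and q over all atom pairs.
    """
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = Qc - R @ Pc
    return R, t


def rmsd(A: np.ndarray, B: np.ndarray) -> float:
    """Root-mean-square deviation between matched coordinate sets."""
    return float(np.sqrt(((A - B) ** 2).sum(axis=1).mean()))


def macrocycle_rmsd(model_target, model_cyc, xtal_target, xtal_cyc,
                    align_on: str = "target") -> float:
    """Macrocycle C-alpha r.m.s.d. after aligning on the chosen atoms.

    align_on="target": superimpose the target chains, then compare the
    macrocycles (sensitive to binding pose and conformation).
    Any other value: superimpose the macrocycles directly (conformation only).
    """
    if align_on == "target":
        R, t = kabsch(model_target, xtal_target)
    else:
        R, t = kabsch(model_cyc, xtal_cyc)
    moved = model_cyc @ R.T + t
    return rmsd(moved, xtal_cyc)
```

A macrocycle that is rigidly displaced on the target surface would therefore show a large target-aligned r.m.s.d. but a near-zero macrocycle-aligned one, which is why the target-aligned values in the text are consistently the larger of each pair.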
Design of macrocyclic binders to predicted structures
Given the high accuracy and binding affinity of macrocycles designed against selected targets, we next set out to design macrocyclic binders against targets without any experimentally determined structure. We reasoned that the high accuracy of RFpeptides could mitigate the inherent risk of designing against a predicted target structure. We designed macrocycles against Rhombotarget A (RbtA), a recently identified cell surface protein from the ESKAPE pathogen, Acinetobacter baumannii. There are no experimentally determined structures available for this protein and sequence-based searches against the Protein Data Bank (PDB) did not return notable matches to other protein structures. We predicted the structure of the 617-aa full-length protein using AF2 and RF2; both methods predicted similar overall structures (Cα r.m.s.d. of 0.4 Å over 509 residues excluding the signal peptide and transmembrane domain) with high confidence (predicted local distance difference test (pLDDT) > 90) (Supplementary Fig. 16). AF2 and RF2 both predicted two distinct extracellular domains: an N-terminal β-helix domain and a C-terminal Ig-like domain (Supplementary Fig. 16). While there were some differences in the predicted structures from AF2 and RF2, we decided to focus our binder design calculations on regions that were predicted nearly identically and with high confidence by AF2 and RF2. On the basis of our preliminary design runs without hotspots to guide the diffusion, we identified a patch in the N-terminal domain to pursue in our large-scale design calculations against this target and defined hotspots Leu144, Phe202, Phe204, Tyr206, Val208, Leu231 and Ala269 for peptide backbone generation (Fig. 4a). In contrast to the concave pockets targeted for MDM2 and MCL1, this selected patch for RbtA is considerably flatter and difficult to target with conventional computational and experimental approaches (Supplementary Fig. 17). 
We generated 20,000 backbones for macrocycle binders and designed four amino acid sequences for each backbone using iterative rounds of ProteinMPNN and Rosetta Relax. Designs were filtered using AfCycDesign confidence metrics and Rosetta interface metrics, as described in earlier sections (Supplementary Fig. 4). On the basis of these in silico metrics, we selected 26 designs for biochemical and structural characterization with AfCycDesign iPAE < 0.4, ddG < −30 kcal mol⁻¹, r.m.s.d. between the design model and AfCycDesign prediction < 1.5 Å and CMS > 300 Å² (Supplementary Fig. 18). The selected designs covered diverse sizes (13–18 aa), sequences, shapes and secondary structures (Supplementary Fig. 18 and Supplementary Table 4). We expressed the Avi-tagged version of the RbtA N-terminal domain (residues 20–458) and used it for binding screens using SPR. Four of 11 designs that were synthesized in sufficient quantity and purity showed a binding signal at 100 µM in our screens (Supplementary Fig. 19). On the basis of further binding experiments with SPR, we determined the dissociation constant (Kd) of the best binder, RBB_D10, to be 9.4 nM (Fig. 4b). The design model for RBB_D10 showed extensive contacts to the target with several side-chain-mediated polar contacts and hydrophobic interactions (Fig. 4f–h).
a, AfCycDesign prediction of design RBB_D10 (violet cartoon) bound to the AF2-predicted β-helix domain of RbtA shown as gray surface. Hotspot residues from RbtA used during the backbone design step are shown in green. b, SPR sensorgram from nine-point single-cycle kinetics experiment (fivefold dilution, highest concentration: 20 µM). The Kd determined from the SPR experiment is also denoted on the plot. c, Close agreement of the RF2-predicted structure of RbtA (gray) with the X-ray structure (gold) of the RbtA N-terminal domain determined here confirms the predicted structure of the target used for the macrocycle design calculations. d, Alignment of the design model of RbtA-bound RBB_D10 (violet and gray) to the X-ray structure (gold) shows a close match between the design model and the experimentally determined structure (Cα r.m.s.d. for macrocycle: 1.4 Å). Close-up view of the RbtA-bound RBB_D10 with side chains shown as sticks. e, Overlay of RBB_D10 design model (after the AfCycDesign prediction step) aligned to the X-ray structure without RbtA demonstrates a nearly identical match for backbone coordinates and side-chain rotamers (Cα r.m.s.d.: 0.4 Å). The design model and X-ray structure are shown in violet and gold, respectively. f, Close-up view of the macrocycle-bound RbtA structure and design model showing polar side chain-to-backbone interactions mediated by RBB_D10 residue Asn12 at the interface. g, Close-up view of the polar side chain-to-side chain interactions mediated by RBB_D10 residue Asp6 at the interface. h, Close-up view of the hydrophobic interactions between RbtA and RBB_D10 at the binding interface.
To confirm the structures of RbtA and RBB_D10 and the binding mode between them, we determined X-ray crystal structures of apo and macrocycle-bound RbtA at 2 Å and 2.6 Å resolution, respectively. The apo structure of the RbtA N-terminal domain, which is also the first experimentally determined structure from this class of bacterial proteins, matched our AF2 and RF2 predictions for this target very closely, with overall Cα r.m.s.d. values of 1.2 Å and 1.1 Å between the X-ray structure and the AF2-predicted and RF2-predicted structures, respectively (Fig. 4c). The complex structure also confirmed the structure and binding mode of our designed macrocycle, RBB_D10, with the X-ray structure matching the design model with an r.m.s.d. of 1.4 Å (Fig. 4d). Notably, the conformation adopted by the macrocycle in the X-ray structure, including the side-chain rotamers involved in interactions with the target, was almost identical to the design model with an r.m.s.d. of 0.4 Å (Fig. 4e–h). Together, these data highlight the high accuracy and success rates provided by RFpeptides even while designing macrocycles against targets without deep pockets or targets with no known structures.
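For readers unfamiliar with the metric, the Cα r.m.s.d. values quoted above are root-mean-square distances between matched Cα atoms. The sketch below assumes the two structures have already been superimposed and that atoms are paired in the same order; it does not perform the alignment itself, and the text does not state which software the authors used for this calculation. The coordinates are made up for illustration.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    C-alpha coordinates, assumed to be pre-superimposed and paired in order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Illustrative coordinates: every atom is displaced by 0.4 angstroms along y.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
xtal = [(0.0, 0.4, 0.0), (3.8, 0.4, 0.0), (7.6, 0.4, 0.0)]
print(round(ca_rmsd(model, xtal), 3))
```

The two values reported for RBB_D10 differ in what is superimposed: 1.4 Å after aligning the complex, and 0.4 Å after aligning the macrocycle alone.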
Overall, these data show that RFpeptides can sample extensive structural and chemical diversity of macrocycles during the backbone and sequence generation steps against selected targets and, finally, select the shapes and sequences ideally suited for binding the target surface or pockets. The highest-affinity binders against each target are also predicted to fold into the bound conformations even in the absence of the target (Supplementary Fig. 20), suggesting that macrocycles are designed to fold into binding-competent conformations. For all four design campaigns described here, selected designs demonstrated good solubility in aqueous buffers despite not imposing any particular sequence constraints related to solubility during the sequence design step using ProteinMPNN21. Notably, combining DL-based and physics-based in silico filters helps to select medium to high-affinity binders. However, we note that the distribution of such metrics varies substantially across the four selected targets and adjustments to filtering thresholds were required on the basis of the shape and chemical composition of the target pocket. While in silico metrics enrich well for binders, the relative ranking within the selected designs does not perfectly match the experimental binding affinities. The highest-affinity binders for MDM2 and RbtA had the best or second-best iPAE values among the designs chosen for those targets (Supplementary Tables 2b and 4b); however, the hit peptides against MCL1 and GABARAP were not among the top three ranked designs (Supplementary Tables 1b and 3b). Integration with high-throughput methods in the future should enable testing of more designs and inform absolute threshold values and filtering schemes for the single-shot design of peptide binders to any arbitrary target.
Discussion
Here, we describe RFpeptides, a generative DL pipeline for precise de novo design of macrocycle binders against a wide range of protein targets. The power of the approach is highlighted by the high affinities (Kd < 10 nM) of the designed macrocyclic binders to GABARAP and RbtA and the nearly identical X-ray crystal structures and design models of the macrocycle-bound MCL1, GABARAP and RbtA (Cα r.m.s.d. of 0.7 Å, 1.2 Å and 1.4 Å, respectively). The RFpeptides approach offers several advantages over traditional methods. Firstly, the design approach should enable faster and more efficient discovery of macrocyclic binders. Despite testing fewer than 20 designed candidates per target (in contrast to trillions of peptides tested in traditional library-based approaches), we achieved high-affinity binders for two targets without requiring any further experimental optimization; to our knowledge, this is a considerably higher success rate than achieved with any previous method. Secondly, in contrast to the untargeted nature of the random library-based approaches, RFpeptides can be used for designing custom binders to specific patches and sites, as demonstrated for GABARAP and RbtA. Lastly, the atomically accurate nature of the design models enables structure-guided optimization for properties beyond target binding (as well as further increases in affinity), bypassing the bottleneck of complex structure determination, which has hindered the optimization of leads from library screening. Combined with the design principles for membrane traversal, RFpeptides could enable the design of peptides simultaneously optimized for target binding and cell permeability or oral bioavailability.
RFpeptides also has considerable advantages over previous computational peptide design methods. Information on known ligands and/or binding partners is not required to initiate design. RFpeptides can design macrocycles completely de novo from just the structure or sequence (as in the case of RbtA) of the target, enabling design against molecular targets intractable with previous methods. RFpeptides is not limited to generating macrocycles with particular motifs or topologies; the diffusion process generates macrocycles with diverse shapes and sizes and selects the topologies appropriate for the protein being targeted. Among the four targets tested here, binders for MCL1 and MDM2 have helical motifs, binders for GABARAP have a β-sheet topology and binders for RbtA sample looplike conformations that make extensive contacts with the flat surface of this target.
We anticipate that RFpeptides will enable the rapid design of custom macrocyclic binders against a wide range of molecular targets, accelerating efforts to develop peptides for diverse functional applications. With the rapid advances in DL methods and frameworks, including the recent development of all-atom diffusion models, we aim to extend the approach to generative design of macrocycles with noncanonical amino acids, crosslinkers and cyclization chemistries.
Methods
Computational methods for cyclic peptide binder design
Macrocyclic peptide monomers and binders were designed with RFpeptides using a three-stage pipeline: backbone generation using RFdiffusion with the cyclic offset applied to the peptide chains, followed by sequence design using ProteinMPNN and, finally, structure prediction of the designed peptide–target complexes using AfCycDesign and/or RoseTTAFold with the cyclic offset applied to the peptide. Designs were further filtered and downselected using Rosetta metrics and, in some cases, clustered on the basis of Cα r.m.s.d. Detailed computational methods, including example scripts, can be found in Supplementary Section 2.2.
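The idea behind a cyclic offset can be illustrated with a short sketch. In a head-to-tail cyclic peptide the first and last residues are neighbors, so the relative position between two residues is taken as the signed shortest path around the ring rather than the linear sequence separation. The code below is an illustration of that concept under stated assumptions, not the RFpeptides or AfCycDesign implementation: the function names are hypothetical, and the sign convention and the handling of exactly opposite residues in even-length rings may differ in the actual code.

```python
def cyclic_offset(i: int, j: int, length: int) -> int:
    """Signed shortest separation from residue i to residue j (0-indexed)
    around a head-to-tail ring of the given length."""
    d = (j - i) % length          # steps forward around the ring, 0..length-1
    if d > length / 2:            # going backward around the ring is shorter
        d -= length
    return d


def offset_matrix(length: int):
    """All pairwise cyclic offsets for a ring of the given length."""
    return [[cyclic_offset(i, j, length) for j in range(length)]
            for i in range(length)]


# In a linear 8-mer, residues 0 and 7 are seven positions apart;
# in a cyclic 8-mer they are adjacent.
print(cyclic_offset(0, 7, 8))
print(offset_matrix(4)[0])
```

Replacing the linear relative-position index with offsets of this kind is what lets a network trained on linear chains treat the termini as bonded neighbors.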
Peptide synthesis
Macrocyclic peptides described here were either purchased from Wuxi AppTec at greater than 90% purity or synthesized in-house using Fmoc-based solid-phase peptide synthesis. Peptides were typically synthesized on preloaded CTC resin. The resin was swollen in DCM followed by iterative deprotection with 20% piperidine in DMF and coupling with either HBTU (Sigma) or PyAOP (Novabiochem) and DIEA (Sigma). The linear peptides were cleaved from the resin using either 2% TFA in DCM or 20% HFIP (Oakwood Chemical) in DCM. The solvent was removed by rotary evaporation and linear protected peptides were cyclized in either DCM, DMF or a mixture of both depending on the solubility of the peptide, using two equivalents of PyAOP and five equivalents of DIEA overnight. The protecting groups were removed using a 95:2.5:2.5 cocktail of TFA, water and TIPS for 2.5 h. The crude peptides were precipitated using cold diethyl ether. The precipitate containing the crude cyclization reaction was dissolved in a mixture of water and acetonitrile for purification using reverse-phase high-performance liquid chromatography (LC). Peptide identities were confirmed by mass spectrometry (MS). Purities for all synthesized and tested macrocyclic peptides are also summarized in Supplementary Tables 7–10. The mass spectra and analytical LC chromatograms for all purified peptides are shown in Supplementary Section 4.
Protein expression and purification
MDM2 and MCL1
The amino acid sequences of MCL1 (PDB 2PQK)37 and MDM2 (PDB 4HFZ)38 were retrieved from the PDB. The optimized genes were then cloned into a Novogen pRSF-DUET plasmid (Sigma, 71341-3), incorporating a 6xHis-tag at the N terminus, followed by an Avi-tag and a tobacco etch virus (TEV) protease cleavage site. The resulting constructs were codon-optimized for Escherichia coli expression and synthesized by Genscript. For propagation, the plasmids were transformed into E. coli NEBα cells (New England Biolabs, C2987); for protein expression, the plasmids were transformed into E. coli BL21(DE3) cells (New England Biolabs, C2527). A single sequence-verified colony was cultured in 50 ml of kanamycin (50 µg ml⁻¹) selective Luria Broth (LB) medium. This culture was incubated overnight (16 h) at 37 °C with shaking at 200 rpm. Subsequently, 50 units of optical density at 600 nm (OD600) of the overnight culture were transferred to 1 L of fresh kanamycin (50 µg ml⁻¹) selective LB medium. The culture was grown at 37 °C with shaking at 200 rpm for 2 h (until it reached an OD600 of 0.4–0.5), at which point the temperature was decreased to 20 °C. The culture was grown until an OD600 of 0.7–0.8; protein expression was induced by adding 1 mM IPTG and the culture was left to grow overnight (14 h).
Cells were harvested by centrifugation at 5,000g for 10 min at 4 °C, resulting in a cell pellet with a density of 5 g L⁻¹. The pellet was immediately flash-frozen and stored at −20 °C for later use. For lysis, the pellet was thawed on ice and resuspended in 5 ml of lysis buffer per gram of pellet. This lysis buffer contained 50 mM Tris-HCl, 300 mM NaCl and 10 mM imidazole and was supplemented with 1× BugBuster protein extraction reagent (Sigma-Aldrich, 70921), 200 µg ml⁻¹ lysozyme (Sigma-Aldrich, L6876), 25 U per ml benzonase nuclease (Sigma-Aldrich, E8263) and 1× cOmplete EDTA-free protease inhibitor cocktail (Sigma-Aldrich, 11836170001). The buffer was filter-sterilized using a 0.2 µm filter before the addition of benzonase, mixed by inversion and kept on ice until use. Cells were completely resuspended in the lysis buffer using a homogenizer at low speed and incubated for 30 min at room temperature (22–25 °C). Following incubation, the suspension was sonicated using a Q500 Sonicator equipped with a four-tip probe. Sonication was conducted for 2–3 min using pulses of 10–15 s on followed by 10–15 s off at 70% amplitude. The lysate was clarified by centrifugation at 16,000g for 20 min.
Ni-NTA agarose resin (Qiagen, 30210) was equilibrated with 20 column volumes (CV) of ultrapure water, followed by 20 CV of equilibration buffer (50 mM Tris-HCl, 300 mM NaCl and 10 mM imidazole). Then, 4 ml of 50% resin suspended in equilibration buffer was used to bind His-tagged proteins from 25 ml of clarified lysate. All immobilized metal affinity chromatography (IMAC) steps were conducted at 4 °C. The lysate–resin mixture was incubated for 60 min on a rotary shaker set to a slow speed. After incubation, the resin was transferred to a 20-ml gravity column and allowed to completely settle. The resin was first washed with 20 CV of wash buffer 1 (20 mM Tris-HCl, 250 mM NaCl, 10 mM imidazole and 5 mM β-mercaptoethanol), followed by another 20 CV of wash buffer 2 (20 mM Tris-HCl, 500 mM NaCl and 35 mM imidazole). The bound proteins were then eluted with 8 ml of elution buffer (20 mM Tris-HCl, 250 mM NaCl, 350 mM imidazole and 2 mM DTT). Aliquots of the eluate were collected and analyzed using SDS–PAGE gels.
The eluate was loaded onto a pre-equilibrated Superdex 75 10/300 GL column (25 mM Tris-HCl, 250 mM NaCl and 2 mM DTT) and run at a flow rate of 0.6 ml min⁻¹ using an ÄKTA pure system for size-exclusion chromatography (SEC). Then, 1 ml fractions were collected from the elution volume of 8–16 ml and those corresponding to peaks in the absorbance at 280 nm between an elution volume of 10 and 13 ml were assessed with SDS–PAGE gels. Fractions confirming the expected molecular weight were pooled and concentrated by centrifugation at 4,000g for 30 min at 4 °C using Amicon Ultra-4 concentrators with a 3 kDa cutoff (Millipore Sigma, UFC800308) to a final volume of 500 µl. The identity of the eluted proteins was confirmed by MS using an Agilent 6230 LC–MS time-of-flight system.
Verified protein samples were processed for further applications: biotinylation for SPR analysis or tag removal by TEV protease cleavage for crystallography. Biotinylation was performed using the BirA biotin–protein ligase standard reaction kit (Avidity, BirA-500) according to the manufacturer's recommended conditions. The reaction was carried out at 4 °C overnight on a slowly shaking platform. For TEV protease cleavage, the proteins were treated with a 25:1 protein to TEVd enzyme ratio39. Similarly, the mixture was incubated at 4 °C overnight on a slowly shaking platform. Following these treatments, samples underwent a cleanup step using 1 ml of Ni-NTA resin per 20 mg of protein. The resin was pre-equilibrated with 10 CV of ultrapure water and 10 CV of a buffer containing 25 mM Tris-HCl, 250 mM NaCl and 10 mM imidazole. The pre-equilibrated resin was added to the protein mixture and incubated for 30 min on a rolling platform at 4 °C. Subsequently, the mixtures were filtered through a 0.45 µm PVDF centrifugal filtering unit to remove the Ni-NTA-bound substrates. The eluate was collected and dialyzed in 2 L of 25 mM Tris-HCl, 250 mM NaCl and 2 mM DTT using Slide-A-Lyzer G3 dialysis cassettes with a 3.5 kDa molecular weight cutoff (Thermo Scientific, A52966) overnight (18 h) at 4 °C with stirring. The dialyzed protein was concentrated to 0.2–0.5 ml (as required for downstream assays), using the Amicon Ultra concentrators (as above), aliquoted and flash-frozen. Fractions were analyzed by mass spectrometry for the efficacy of the biotinylation and TEV protease cleavage treatments, as previously described.
GABARAP for SPR
A synthetic complementary DNA was designed on the basis of the amino acid sequence of GABARAP (UniProt O95166) and optimized for expression in E. coli using Benchling software. The construct was devised to include an N-terminal Avi-tag and TEV protease cleavage site and was cloned into the Novogen pET-50b(+) plasmid. This plasmid configuration introduced a tandem arrangement of protein tags at the N terminus: a 6xHis-tag, followed by a NusA solubility tag, another 6xHis-tag and a human rhinovirus (HRV) 3C protease cleavage site. Therefore, the final construct sequence was as follows: 6xHis–NusA–6xHis–HRV 3C–Avi–TEV–GABARAP. NusA was specifically chosen as a solubility tag because of its known effectiveness in enhancing protein solubility in E. coli40,41. The construct was synthesized and cloned by Genscript.
As described above for MCL1 and MDM2 protein expression, the plasmids were introduced into E. coli NEBα cells and BL21(DE3) cells. A single sequence-verified colony was cultured in 50 ml of kanamycin (50 µg ml⁻¹) selective LB medium for 16 h at 37 °C, shaking at 200 rpm. Then, 50 OD600 units of this culture were transferred to 1 L of fresh kanamycin (100 µg ml⁻¹) selective autoinduction medium (TBM-5052: 1.2% (w/v) tryptone, 2.4% (w/v) yeast extract, 0.5% (v/v) glycerol, 0.05% (w/v) d-glucose, 0.2% (w/v) d-lactose, 25 mM Na2HPO4, 25 mM KH2PO4, 50 mM NH4Cl, 5 mM Na2SO4, 2 mM MgSO4, 10 μM FeCl3, 4 μM CaCl2, 2 μM MnCl2, 2 μM ZnSO4, 400 nM CoCl2, 400 nM NiCl2, 400 nM CuCl2, 400 nM Na2MoO4, 400 nM Na2SeO3 and 400 nM H3BO3). The culture was grown at 37 °C with shaking at 200 rpm for 2 h, at which point the temperature was decreased to 22 °C and the culture was left to grow for 16 h.
Cells were harvested, lysed and purified following the protocol outlined earlier for MCL1 and MDM2, with some modifications. The cultures yielded a cell pellet amounting to 15 g L⁻¹. Lysis was completed using an IKA T18 microfluidizer at 450 psi, followed by lysate clarification by centrifugation at 16,000g for 15 min. All IMAC steps were conducted at 22 °C, except for the incubation of the lysate–resin mixture, which was performed at 4 °C. Proteins bound to the resin were eluted with 5 ml of elution buffer (50 mM Tris-HCl pH 8, 250 mM NaCl and 300 mM imidazole). SEC was then performed using a Superdex 200 Increase 10/300 GL column (Cytiva) equilibrated with TBS (50 mM Tris-HCl pH 8 and 250 mM NaCl). Fractions confirmed by SDS–PAGE were pooled and concentrated using Amicon Ultra-15 concentrators with a 30 kDa cutoff (Millipore Sigma, UFC9030) to a final volume of 1 ml. Downstream processing for SPR analysis was performed as described previously, with one modification. For biotinylation, the protein was first cleaved using HRV 3C protease with the reagents and protocol provided by the Pierce HRV 3C protease solution kit (Thermo Scientific, 88946). The digested samples were subsequently purified and verified, as outlined in earlier sections.
GABARAP and GABARAPL1 for crystallography
GABARAP and GABARAPL1 were expressed as glutathione S-transferase (GST) fusion proteins after transforming E. coli BL21(DE3) T1 cells with pGEX4T2-GABARAP and pGEX4T2-GABARAPL1 plasmids, respectively. Bacteria were cultivated in LB medium containing 100 µg ml⁻¹ ampicillin; gene expression was induced with 1 mM IPTG at an OD600 of 0.6–0.8 and allowed to proceed for 20 h at 25 °C. Afterward, cells were harvested by centrifugation at 3,000g for 30 min at 4 °C. The bacterial pellet was washed with PBS (137 mM NaCl, 2.7 mM KCl, 1.8 mM KH2PO4 and 10 mM Na2HPO4) and resuspended in lysis buffer (PBS supplemented with 5% (v/v) glycerol, 0.01% (v/v) β-mercaptoethanol, 10 µg ml⁻¹ DNase (AppliChem, A3778) and cOmplete EDTA-free protease inhibitor cocktail (Roche, 11836170001)) before application to the cell disruptor (Constant Systems, model TS1.1) for three cycles with 1.9 kbar at 4 °C. Lysates were cleared by centrifugation at 4 °C with 45,000g for 45 min. The GST fusion proteins were purified from the supernatant by affinity chromatography using glutathione Sepharose 4B (Cytiva, 1705605). Cleavage with thrombin (Sigma-Aldrich, 1.12374) during dialysis against 10 mM Tris-HCl and 150 mM NaCl (pH 7.0) at 4 °C overnight yielded 119 amino acid proteins carrying an N-terminal Gly-Ser extension in addition to the native residues of GABARAP and GABARAPL1. Subsequently, samples were applied to a Hiload 26/60 Superdex 75 preparatory-grade size-exclusion column (GE Healthcare) equilibrated with 10 mM Tris-HCl and 150 mM NaCl (pH 7.0). Protein purity was assessed by SDS–PAGE and Coomassie staining. Fractions containing the eluted proteins were concentrated to 3–5 mg ml⁻¹ using Vivaspin 20 concentrators with a 3 kDa cutoff (Sartorius), flash-frozen in liquid N2 and kept at −80 °C for long-term storage.
RbtA β-helix domain
For heterologous expression of the β-helix domain of RbtA (residues A20–I459) in E. coli, the gene was amplified and fused with a SNAC tag (GSHHWGS) at the C terminus using the following primers: forward, GCTGCCCAGCCGGCGATGGCCATGGGCGCTGATATTGAAGTCACAACTAC; reverse, CAGTGGTGGTGGTGGTGGTGCTCGAGGCTGCCCCAATGATGGCTGCCGATATATTCAATTGCGCCTAAAT (ref. 42). The fragment was inserted into NcoI-digested and XhoI-digested pET-22b(+) by Gibson assembly to generate a construct with a C-terminal 6xHis fusion. The construct was confirmed by sequencing and transformed into E. coli Rosetta (DE3) cells.
To purify the β-helix domain of RbtA, an overnight culture of Rosetta (DE3) cells carrying the construct was back-diluted 1:300 in 2× YT broth and grown at 37 °C with shaking at 200 rpm until the OD600 reached 0.4. The incubation temperature was reduced to 18 °C, IPTG was added to a final concentration of 0.3 mM and the culture was incubated for a total of 18 h. Cells were then collected by centrifugation and resuspended in lysis buffer containing 200 mM NaCl, 50 mM Tris-HCl pH 7.5, 10% glycerol (v/v), 5 mM imidazole, 0.5 mg ml⁻¹ lysozyme and 1 mU of benzonase. Cells were then lysed by sonication and cellular debris was removed by centrifugation at 35,000g for 30 min at 4 °C. The protein was purified from lysates using a 1 ml HisTrap HP column on an ÄKTA fast protein LC (FPLC) system. Column-bound protein was eluted using a linear imidazole gradient from 5 to 500 mM. Protein purity was assessed by SDS–PAGE and Coomassie staining. The fractions with high purity were concentrated using a 30 kDa cutoff Amicon filter and then further purified by FPLC using a HiLoad 16/600 Superdex 200 preparatory-grade column (GE Healthcare) equilibrated with sizing buffer (500 mM NaCl, 50 mM Tris-HCl pH 7.5 and 10% glycerol (v/v)). The fractions with high purity were concentrated and used for evaluation of macrocyclic binders or determination of X-ray structure.
For determination of the X-ray crystal structure of RbtA, the C-terminal 6xHis-tag was removed by chemical cleavage at the SNAC tag. In brief, the buffer of the concentrated protein was exchanged to cleavage buffer (0.1 M CHES, 0.1 M NaCl, 0.1 M acetone oxime and 5 mM Fos-choline-12, pH 8.6). The protein solution was diluted to 1 mg ml⁻¹, followed by the addition of 1 mM TCEP and 1 mM NiCl2. The mixture was vortexed and incubated at room temperature for 16 h. The precipitate was removed by centrifugation at 35,000g for 30 min at 4 °C. The supernatant was concentrated and exchanged to Tris buffer (50 mM Tris-HCl pH 7.5 and 200 mM NaCl). The protein solution was incubated with a 1 ml bed volume of Ni-NTA beads to extract the cleaved 6xHis-tag. The resulting fraction was concentrated and then further purified by FPLC using a HiLoad 16/600 Superdex 200 preparatory-grade column.
Crystallization of protein–cyclic peptide complexes
MCL1 with cyclic peptide
MCL1 (18.5 mg ml⁻¹) and macrocycle MCB_D2 were mixed in a 1:2 molar ratio and incubated for 30 min at room temperature. Upon addition of MCB_D2 to the protein, we observed some precipitation. This precipitate was removed by centrifugation before crystallographic screening. Crystallization experiments for the MCL1–MCB_D2 complex were conducted using the sitting-drop vapor diffusion method. Initial crystallization trials were set up in 200 nl drops using 96-well crystallization plates. Crystal drops were imaged using the UVEX crystal plate hotel system by JANSi. Diffraction-quality crystals for the complex appeared in 0.2 M sodium chloride, 0.1 M Bis–Tris pH 6.5 and 25% (w/v) polyethylene glycol 3350 (Hampton Research) in 2 weeks.
GABARAP and GABARAPL1 with cyclic peptides
Cyclic peptides GAB_D8 and GAB_D23 were dissolved in 10 mM Tris-HCl and 150 mM NaCl (pH 7.0) and each mixed with both GABARAP and GABARAPL1, targeting a peptide-to-protein molar ratio of 3:2. After incubation for 10 min at room temperature, any insoluble components were removed by centrifugation (10 min at 20,000g and 4 °C). The protein–peptide complexes were concentrated using Amicon Ultra-0.5 centrifugal filter units with a 3 kDa cutoff (Merck) until a final protein concentration of 6–8 mg ml⁻¹ (GABARAPL1–GAB_D8) or 13–15 mg ml⁻¹ (GABARAP–GAB_D23) was reached. Samples were once again cleared of particles by centrifugation (30 min at 20,000g and 4 °C) before application in crystallization experiments. Search for crystallization conditions was performed by the sitting-drop vapor diffusion method using robotic systems Freedom Evo (Tecan) and Mosquito LCP (SPT Labtech) with commercially available screening sets. Experiments were set up by combining 200 nl of protein–peptide complex with 100 nl (for GABARAPL1–GAB_D8) or 200 nl (for GABARAP–GAB_D23) of reservoir solution and plates were incubated at 20 °C. Crystals appeared for a number of conditions, which were subjected to optimization as appropriate. Diffraction-quality samples used for X-ray structure determination developed with reservoir solutions containing 0.17 M ammonium sulfate, 25.5% (w/v) PEG 4000 and 15% (v/v) glycerol for GABARAPL1–GAB_D8 and 0.1 M MES pH 5.0 and 30% (w/v) PEG 6000 in the case of GABARAP–GAB_D23. Diffraction data (https://doi.esrf.fr/10.15151/ESRF-DC-1966164200 and https://doi.esrf.fr/10.15151/ESRF-DC-1979522808 for GABARAPL1–GAB_D8 and GABARAP–GAB_D23, respectively) were collected at 100 K on beamline BM07 of the European Synchrotron Radiation Facility (ESRF) tuned to an X-ray wavelength of 0.9795 Å, using a Pilatus 6M detector (DECTRIS).
Data processing was carried out with XDS and XSCALE43 and included reflections up to a diffraction limit of 1.5 Å for GABARAP–GAB_D23 and 2.5 Å for GABARAPL1–GAB_D8. The GABARAP–GAB_D23 structure featuring space group C2 was determined by molecular replacement (MR) using MOLREP44 with the structure of GABARAP from its K1 peptide complex (PDB 3D32)30 as a template. For the GABARAPL1–GAB_D8 complex, initial evaluation suggested tetragonal symmetry but with strong indications of twinning. Data integration in maximal translationengleiche subgroups followed by MR search using MoRDa45 revealed P2₁2₁2₁ as the true space group, with near-perfect pseudomerohedral twinning accounting for apparent Laue group 4/mmm. To avoid bias in cross-validation, this pseudosymmetry of the data was explicitly accounted for in flag assignment. The solution obtained for GABARAPL1–GAB_D8 was subjected to a round of automated rebuilding in phenix.autobuild46. In either case, model refinement was performed with phenix.refine47, alternating with interactive rebuilding in Coot48, which included stepwise introduction of cyclic peptides GAB_D8 and GAB_D23. According to validation using MolProbity49 and the wwPDB validation system (https://validate-rcsb-2.wwpdb.org/), both models featured good geometry. Detailed statistics of data collection and refinement can be found in Supplementary Table 6.
RbtA with cyclic peptide and apo RbtA
RbtA (10 mg ml⁻¹) and RBB_D10 were mixed in a 1:5 molar ratio and incubated for 30 min at room temperature. Initial crystallization trials were set up in 200 nl drops using 96-well crystallization plates and the experiments were conducted by the sitting-drop vapor diffusion method. Crystal drops were imaged using the UVEX crystal plate hotel system by JANSi. Diffraction-quality crystals for the RbtA–RBB_D10 complex appeared in 0.2 M lithium sulfate, 0.1 M Tris pH 8.5 and 40% (v/v) PEG 400 (JCSG Plus, Hampton Research). Additionally, we soaked the crystals in 22.32 mg ml⁻¹ RBB_D10 for 5 min before flash-freezing. Crystals for RbtA alone (18.7 mg ml⁻¹) were grown in 0.1 M Bis–Tris pH 6.5 and 20% (v/v) PEG 5,000 MME (SG1, Molecular Dimensions). All crystals were flash-cooled in liquid nitrogen before shipping to the synchrotron for data collection.
Diffraction data were collected at the NSLS2 beamline AMX/FMX (17-ID-1/17-ID-2). Data were integrated with XDS43 and merged and scaled with Pointless and Aimless in the CCP4i2 program suite50. The X-ray crystal structure was determined by MR with Phaser51, using the design model as the search model. The structure obtained from MR was then refined in Phenix47, with model building in Coot48 between refinement cycles. The final model was evaluated with MolProbity49. Data collection and refinement statistics are reported in Supplementary Table 5.
SPR
SPR experiments were performed using a Cytiva Biacore 8K in HBS-EP+ buffer from Cytiva. Measurements were obtained by immobilization of biotinylated target protein using the biotin capture kit from Cytiva. Binding screens were performed by single-cycle kinetics experiments using the standard protocol in the Biacore 8K control software at 30 µl min⁻¹ with serial injections of 10 nM, 100 nM, 1 µM, 10 µM and 100 µM, an association time of 60 s and a dissociation time of 120 s. For MCL1 designs, a dissociation time of 150 s was used. To evaluate the affinity of successful designs, a nine-point single-cycle kinetics experiment was performed with an association time of 90 s and dissociation time of 300 s. The dilution series for MCB_D2 was twofold starting at 20 µM, that for MDB_D8 was fivefold starting at 50 µM, and those for GAB_D8, GAB_D23 and RBB_D10 were fivefold starting at 20 µM. Reported measurements were analyzed using Biacore Insight evaluation software; sensorgrams were double-referenced and fit with a 1:1 binding kinetics fit model.
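The concentration series and the 1:1 binding relationships described above can be sketched in a few lines. This is an illustration only, not the Biacore Insight fitting routine: the function names are hypothetical and the rate constants in the usage lines are made-up values, not measurements from this study. The sketch uses the standard 1:1 (Langmuir) relations Kd = koff/kon and Req = Rmax·C/(Kd + C).

```python
def dilution_series(top_conc, fold, n_points):
    """Concentrations starting at top_conc, each divided by `fold` from the last."""
    return [top_conc / fold ** k for k in range(n_points)]


def kd_from_rates(k_on, k_off):
    """Equilibrium dissociation constant for a 1:1 interaction (M),
    from k_on in M^-1 s^-1 and k_off in s^-1."""
    return k_off / k_on


def equilibrium_response(conc, kd, r_max):
    """Steady-state response for 1:1 binding; conc and kd share the same unit."""
    return r_max * conc / (kd + conc)


# Nine-point, fivefold series from 20 uM (as used for GAB_D8, GAB_D23 and RBB_D10).
affinity_series_um = dilution_series(20.0, 5.0, 9)
# Five-point, tenfold screening series from 100 uM down to 10 nM.
screen_series_um = dilution_series(100.0, 10.0, 5)

print(affinity_series_um[-1] * 1e6, "pM at the lowest affinity point")
print(kd_from_rates(k_on=1e5, k_off=1e-3), "M (illustrative rates only)")
```

The lowest point of the nine-point series is 20 µM / 5⁸ = 51.2 pM, so the series spans concentrations well below and well above a nanomolar-range Kd.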
AlphaScreen assay
We used the AlphaScreen assay as described by Leveille et al.52 to measure inhibition of the GABARAP–K1 interaction by the computationally designed macrocycles. K1 is a previously described GABARAP binder with a Kd of 10 nM (ref. 27). Biotin-labeled peptide K1 was used at a final concentration of 10 nM and incubated with 10 nM (final concentration) of 6xHis–GABARAP in a final reaction volume of 50 μl. Computationally designed inhibitor peptides were serially diluted with 1:3 dilutions using the highest final concentration of 50 μM and added to the reaction mixture. The buffer used was 25 mM HEPES pH 7.3, 150 mM NaCl, 0.01% Tween, 1 mg ml⁻¹ BSA and 0.5% DMSO. The plate was covered in foil, centrifuged at 1,500 rpm for 2 min and incubated for 150 min at room temperature with shaking. Then, 20 μg ml⁻¹ (final concentration) of the streptavidin donor beads and nickel chelate acceptor beads were added in the dark before incubating for another 45 min. Data were collected on a Tecan plate reader using excitation at 680 nm and emission at 520–620 nm. Data were normalized to 0% (buffer only) and 100% (protein and tracer peptide, no inhibitor) controls. IC50 values were obtained from curve fits using GraphPad Prism 9 software, using the equation \(Y=\frac{100}{1+\left(\frac{X}{\mathrm{IC}_{50}}\right)^{h}}\), where Y is the normalized signal (%), X is the concentration of inhibitor and h is the Hill coefficient. At least three independent replicates were used to calculate the average IC50 and the s.e.m.
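The dose-response equation above is straightforward to evaluate directly. The sketch below implements only the model function and a dilution series; it is not the GraphPad Prism fitting procedure. Two assumptions are made for illustration: "1:3 dilutions" is read as a threefold series, and the number of points (eight here) is hypothetical because the text does not state it. The IC50 and Hill coefficient in the usage lines are also illustrative.

```python
def normalized_signal(x, ic50, h):
    """Y = 100 / (1 + (X / IC50)^h): percent signal remaining at inhibitor
    concentration x (same unit as ic50), with Hill coefficient h."""
    return 100.0 / (1.0 + (x / ic50) ** h)


def threefold_series(top_conc, n_points):
    """Serial dilution assuming each step is a threefold reduction."""
    return [top_conc / 3 ** k for k in range(n_points)]


# Highest final inhibitor concentration from the text: 50 uM.
concs_um = threefold_series(50.0, 8)            # point count is an assumption
curve = [normalized_signal(c, ic50=1.0, h=1.0)  # illustrative IC50 and slope
         for c in concs_um]

for c, y in zip(concs_um, curve):
    print(f"{c:8.4f} uM -> {y:5.1f} %")
```

By construction the signal is 50% when X equals the IC50, whatever the Hill coefficient, which is a quick sanity check on any fitted curve.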
Statistics and reproducibility
No statistical method was used to predetermine sample size. One AlphaScreen trial used to determine the IC50 of GAB_D8 was repeated and the repeated value was used. All data are included in the Source Data. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The design models and sequences are available in Supplementary Information. Crystal structures of MCB_D2 bound to MCL1, GAB_D8 bound to GABARAPL1, GAB_D23 bound to GABARAP, RBB_D10 bound to RbtA and apo RbtA were deposited to the PDB under accession codes 9CDT, 9HGC, 9HGD, 9CDU and 9CDV, respectively. Source data are provided with this paper.
Code availability
The code and scripts for running RFpeptides are available from Zenodo (https://doi.org/10.5281/zenodo.15264344)53 and from the RFdiffusion GitHub repository (https://github.com/RosettaCommons/RFdiffusion).
References
Vinogradov, A. A., Yin, Y. & Suga, H. Macrocyclic peptides as drug candidates: recent progress and remaining challenges. J. Am. Chem. Soc. 141, 4167–4181 (2019).
Muttenthaler, M., King, G. F., Adams, D. J. & Alewood, P. F. Trends in peptide drug discovery. Nat. Rev. Drug Discov. 20, 309–325 (2021).
Tsomaia, N. Peptide therapeutics: targeting the undruggable space. Eur. J. Med. Chem. 94, 459–470 (2015).
Atanasov, A. G., Zotchev, S. B. & Dirsch, V. M. International Natural Product Sciences Taskforce & Supuran, C. T. Natural products in drug discovery: advances and opportunities. Nat. Rev. Drug Discov. 20, 200–216 (2021).
Bhardwaj, G. et al. Accurate de novo design of membrane-traversing macrocycles. Cell 185, 3520–3532 (2022).
Bhardwaj, G. et al. Accurate de novo design of hyperstable constrained peptides. Nature 538, 329–335 (2016).
Hosseinzadeh, P. et al. Comprehensive computational design of ordered peptide macrocycles. Science 358, 1461–1466 (2017).
Mulligan, V. K. et al. Computationally designed peptide macrocycle inhibitors of New Delhi metallo-β-lactamase 1. Proc. Natl Acad. Sci. USA 118, e2012800118 (2021).
Hosseinzadeh, P. et al. Anchor extension: a structure-guided approach to design cyclic peptides targeting enzyme active sites. Nat. Commun. 12, 3384 (2021).
Muratspahić, E. et al. Design and structural validation of peptide–drug conjugate ligands of the κ-opioid receptor. Nat. Commun. 14, 8064 (2023).
Rettie, S. A. et al. Cyclic peptide structure prediction and design using AlphaFold2. Nat. Commun. 16, 4730 (2025).
Grambow, C. A., Weir, H., Cunningham, C. N., Biancalani, T. & Chuang, K. V. CREMP: conformer–rotamer ensembles of macrocyclic peptides for machine learning. Sci. Data 11, 859 (2024).
Zhang, C. et al. HighFold: accurately predicting structures of cyclic peptides and complexes with head-to-tail and disulfide bridge constraints. Brief. Bioinform. 25, bbae215 (2024).
Brixi, G. et al. SaLT&PepPr is an interface-predicting language model for designing peptide-guided protein degraders. Commun. Biol. 6, 1081 (2023).
Xie, X., Valiente, P. A., Kim, J. & Kim, P. M. HelixDiff, a score-based diffusion model for generating all-atom α-helical structures. ACS Cent. Sci. 10, 1001–1011 (2024).
Li, Q., Vlachos, E. N. & Bryant, P. Design of linear and cyclic peptide binders of different lengths from protein sequence information. Preprint at bioRxiv https://doi.org/10.1101/2024.06.20.599739 (2024).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Vázquez Torres, S. et al. De novo design of high-affinity binders of bioactive helical peptides. Nature 626, 435–442 (2024).
Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).
Widden, H. & Placzek, W. J. The multiple mechanisms of MCL1 in the regulation of cell fate. Commun. Biol. 4, 1029 (2021).
Leman, J. K. et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 17, 665–680 (2020).
Kussie, P. H. et al. Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 274, 948–953 (1996).
Szalai, P. et al. Autophagic bulk sequestration of cytosolic cargo is independent of LC3, but requires GABARAPs. Exp. Cell Res. 333, 21–38 (2015).
Brown, H. et al. Structure-based design of stapled peptides that bind GABARAP and inhibit autophagy. J. Am. Chem. Soc. 144, 14687–14697 (2022).
Ji, C. H. et al. The AUTOTAC chemical biology platform for targeted protein degradation via the autophagy–lysosome system. Nat. Commun. 13, 904 (2022).
Popelka, H. & Klionsky, D. J. Analysis of the native conformation of the LIR/AIM motif in the Atg8/LC3/GABARAP-binding proteins. Autophagy 11, 2153–2159 (2015).
Weiergräber, O. H. et al. Ligand binding mode of GABAA receptor-associated protein. J. Mol. Biol. 381, 1320–1331 (2008).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Dauparas, J. et al. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods 22, 717–723 (2025).
Sternberg, A. MaxCluster: a tool for protein structure comparison and clustering. http://www.sbg.bio.ic.ac.uk/maxcluster/index.html (accessed 1 November 2024).
Siew, N., Elofsson, A., Rychlewski, L. & Fischer, D. MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 16, 776–785 (2000).
Fire, E., Gullá, S. V., Grant, R. A. & Keating, A. E. Mcl-1–Bim complexes accommodate surprising point mutations via minor structural changes. Protein Sci. 19, 507–519 (2010).
Anil, B., Riedinger, C., Endicott, J. A. & Noble, M. E. M. The structure of an MDM2–Nutlin-3a complex solved by the use of a validated MDM2 surface-entropy reduction mutant. Acta Crystallogr. D 69, 1358–1366 (2013).
Sumida, K. H. et al. Improving protein expression, stability, and function with ProteinMPNN. J. Am. Chem. Soc. 146, 2054–2061 (2024).
Bhandari, B. K., Gardner, P. P. & Lim, C. S. Solubility-Weighted Index: fast and accurate prediction of protein solubility. Bioinformatics 36, 4691–4698 (2020).
Davis, G. D., Elisee, C., Newham, D. M. & Harrison, R. G. New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol. Bioeng. 65, 382–388 (1999).
Dang, B. et al. SNAC-tag for sequence-specific chemical protein cleavage. Nat. Methods 16, 319–322 (2019).
Kabsch, W. XDS. Acta Crystallogr. D 66, 125–132 (2010).
Vagin, A. & Teplyakov, A. Molecular replacement with MOLREP. Acta Crystallogr. D 66, 22–25 (2010).
Vagin, A. & Lebedev, A. MoRDa, an automatic molecular replacement pipeline. Acta Crystallogr. A Found. Adv. 71, s19 (2015).
Terwilliger, T. C. et al. Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. Acta Crystallogr. D 64, 61–69 (2008).
Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 75, 861–877 (2019).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010).
Williams, C. J. et al. MolProbity: more and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315 (2018).
Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D 67, 235–242 (2011).
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
Leveille, A. N. et al. Exploring arylidene–indolinone ligands of autophagy proteins LC3B and GABARAP. ACS Med. Chem. Lett. 16, 271–277 (2025).
Rettie, S. et al. Accurate de novo design of high-affinity protein-binding macrocycles using deep learning. Zenodo https://doi.org/10.5281/zenodo.15264344 (2025).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
Acknowledgements
We thank L. Stuart, L. Stewart, K. van Wormer, L. Goldschmidt, M. Kennedy, I. Haydon, J. Woodward, X. Li, Z. Taylor, H. Osterstock, G. Zhou, G. Gökçe, J. Palmer, K. Lindenauer, K. Campbell, M. Gloegl, R. Ragotte and M. Sadilek for their helpful feedback, guidance and support. We also thank the IPD Peptide, Crystallography and Biologics, Vaccines and Process Development core labs and the University of Washington Chemistry MS Facility for providing instrumentation support and expertise. This work was supported by funds from the DARPA Harnessing Enzymatic Activity for Lifesaving Remedies program HR001120S0052 contract HR0011-21-2-0012 (to G.B., D.B., J.D.M. and M.L.), the Defense Threat Reduction Agency HDTRA1-19-1-0003 (to D.B., G.B. and S.A.R.), the National Institutes of Health (NIH) 5R21AI178088-02 (to G.B. and S.A.R.), the Howard Hughes Medical Institute (HHMI) Emerging Pathogens Initiative (to J.D.M., G.B. and V.A.), startup funds from the University of Washington's Department of Medicinal Chemistry and Institute for Protein Design (to G.B.), the Audacious Project (to G.B., A.K.B. and A.K.), the C19 HHMI Initiative grant (to A.K.B. and A.K.), NIH R35-GM148407 (to J.A.K. and A.N.L.), the European Union's Horizon Europe research and innovation program under the Marie Skłodowska-Curie 101059124 (to Y.F.B.), NIH R01-R0AI160052 (to A.K.B. and A.K.), Deutsche Forschungsgemeinschaft project ID 267205415-SFB 1208 (to D.W.) and the Bill and Melinda Gates Foundation GR047983 (to D.J.). All plots in this paper were generated using matplotlib or seaborn54,55. Peptide structures were rendered using ChimeraX 1.9 (ref. 56) or PyMOL 2.5.4. Data were analyzed with pandas version 1.4.3 and plotted using matplotlib version 3.7.0 and seaborn version 0.12.2. All figures were created with BioRender.com.
This research used resources (FMX/AMX) of the National Synchrotron Light Source II, a US Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under contract No. DE-SC0012704. The Center for BioMolecular Structure is primarily supported by the NIH National Institute of General Medical Sciences through a Center Core P30 Grant (P30GM133893) and by the DOE Office of Biological and Environmental Research (KP1607011). This publication resulted from data collected using the beamtime obtained through Northeastern Collaborative Access Team BAG proposal no. 311950. Lastly, we would like to thank the staff of the ESRF and European Molecular Biology Laboratory for assistance and support in using beamline BM07 under proposal number MX-2587.
Author information
Contributions
S.A.R., D.J., V.A., F.D. and G.B. conceptualized the study. D.J. and F.D. implemented the cyclic relative positional offsets into RF2 and RFdiffusion. S.A.R., D.J., V.A. and G.B. developed the protocol for generating and filtering designs. A.L. and J.D.M. identified RbtA as a surface-exposed target in A. baumannii. S.A.R., V.A., M.L., P.M.L., M.S. and V.V. synthesized the designs. Y.F.B., Q.Z., M.L., S.R.G., A.L., J.A.W., A.U. and A.M. expressed and purified the target proteins. S.A.R., V.A. and A.N.L. biophysically characterized the designed macrocyclic peptides. A.K.B., J.A.W., A.U., A.K., E.B. and O.H.W. determined the X-ray crystal structures of the designed macrocyclic peptides bound to their targets. S.O., O.H.W., D.W., J.A.K., J.D.M., D.B., F.D. and G.B. offered supervision throughout the project. S.A.R., D.J., V.A., D.B. and G.B. wrote the paper. All authors read and contributed to the paper. S.A.R., D.J. and V.A. agree that the order of their respective names may be changed for personal pursuits to best suit their interests.
Ethics declarations
Competing interests
D.W. is a cofounder of Priavoid and Attyloid. D.B. and G.B. are cofounders, advisors and shareholders of Vilya. The other authors declare no competing interests.
Peer review
Peer review information
Nature Chemical Biology thanks Hyun Ho Park, Francesca Peccati and the other, anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Sections 1–5, Figs. 1–59 and Tables 1–10.
Source data
Source Data Fig. 1
Source data used in plots.
Source Data Fig. 2
Source data used in plots.
Source Data Fig. 3
Source data used in plots.
Source Data Fig. 4
Source data used in plots.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rettie, S.A., Juergens, D., Adebomi, V. et al. Accurate de novo design of high-affinity protein-binding macrocycles using deep learning. Nat Chem Biol (2025). https://doi.org/10.1038/s41589-025-01929-w
DOI: https://doi.org/10.1038/s41589-025-01929-w