Sci Adv. 2025 Oct 3;11(40):eadx0796.
doi: 10.1126/sciadv.adx0796. Epub 2025 Oct 1.

AI-directed gene fusing prolongs the evolutionary half-life of synthetic gene circuits


Itamar Menuhin-Gruman et al. Sci Adv.

Abstract

Evolutionary instability is a persistent challenge in synthetic biology, often leading to the loss of heterologous gene expression over time. Here, we present STABLES, a gene fusion strategy that links a gene of interest (GOI) to an essential endogenous gene (EG), with a "leaky" stop codon in between. This ensures both selective pressure against deleterious mutations and high expression of the GOI. By leveraging a machine learning framework, we predict optimal GOI-EG pairs on the basis of bioinformatic and biophysical features, identify linkers likely to minimize protein misfolding, and optimize DNA sequences for stability and expression. Experimental validation in Saccharomyces cerevisiae demonstrated substantial improvements in stability and productivity for fluorescent proteins and human proinsulin. The results highlight a scalable, adaptable, and organism-agnostic method to enhance the evolutionary stability of engineered strains, with broad implications for industrial biotechnology and synthetic biology.


Figures

Fig. 1. Components of the STABLES solution.
(A) Standard production of a heterologous gene. The heterologous GOI is inserted into the genome. Any mutation that reduces expression or induces misfolding would prove advantageous because of the lower metabolic burden. Such mutations proliferate, and the batch must be replaced. (B) Replacement of an EG with a fusion gene. An EG is removed and replaced by a fusion gene composed of the GOI and the EG. Mutations leading to loss of expression or misfolding would be deleterious to the EG, resulting in host death. This limits the spread of many mutations. The EG is selected by an ML model trained on experimental data. (C) Selection of linker. Different linkers may lead to interaction between the fused proteins and to misfolding. Using biophysical models and a database of fusion linkers, a linker is selected to minimize structural changes between the fused and unfused states. (D) Sequence optimization. Optimizing the sequence of the GOI and linker avoids hypermutable sites, maximizes codon usage bias, and enforces weak mRNA folding at the start of the gene. This further improves stability and expression. (E) A leaky stop codon enables translation of both the GOI and the fusion gene. A leaky stop codon is placed between the GOI and the linker. Because of partial read-through, both the protein of interest and the fusion protein are produced. With an informed choice of stop codon, cells produce large quantities of the protein of interest and just-viable quantities of the fusion protein. The GOI’s mutational stability is further enhanced, as more mutations would prove deleterious. (F) Production of the heterologous gene, aided by STABLES. The adapted process has much higher mutational stability, reducing the need to replace batches. The optimized cells exhibit much higher expression.
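Panel (D) mentions avoiding hypermutable sites. As a rough illustration only, the Python sketch below flags two sequence patterns widely regarded as mutation-prone because they favor polymerase slippage: long single-base runs and short tandem repeats. It also reports GC content. The thresholds (`MIN_HOMOPOLYMER`, `MIN_REPEAT_COPIES`, and so on) and the function names are hypothetical choices for this example. This is not the ESO tool or the STABLES pipeline.

```python
import re

# Hypothetical thresholds chosen for illustration; not taken from STABLES or ESO.
MIN_HOMOPOLYMER = 6     # flag single-base runs of at least this length (e.g. AAAAAA)
MIN_REPEAT_UNIT = 2     # shortest tandem-repeat unit to scan for (e.g. "AT")
MAX_REPEAT_UNIT = 4     # longest tandem-repeat unit to scan for
MIN_REPEAT_COPIES = 4   # e.g. ATATATAT = 4 copies of "AT"


def find_homopolymers(seq: str, min_len: int = MIN_HOMOPOLYMER) -> list[tuple[int, str]]:
    """Return (start, run) for every single-base run of length >= min_len."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(0)) for m in pattern.finditer(seq.upper())]


def find_tandem_repeats(seq: str) -> list[tuple[int, str]]:
    """Return (start, repeat) for short multi-base units repeated back to back."""
    seq = seq.upper()
    hits = []
    for unit in range(MIN_REPEAT_UNIT, MAX_REPEAT_UNIT + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, MIN_REPEAT_COPIES - 1))
        for m in pattern.finditer(seq):
            # Skip units such as "AA"; homopolymer runs are reported by find_homopolymers.
            if len(set(m.group(1))) > 1:
                hits.append((m.start(), m.group(0)))
    return sorted(hits)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up test sequence containing one homopolymer run and one dinucleotide repeat.
example = "ATGGCT" + "AAAAAAA" + "GC" + "ATATATAT" + "GGTAA"
print(find_homopolymers(example))     # [(6, 'AAAAAAA')]
print(find_tandem_repeats(example))   # [(15, 'ATATATAT')]
print(round(gc_content(example), 2))  # 0.25
```

In a real design loop, flagged positions would be candidates for synonymous recoding. That step is left out here because it depends on codon tables and objectives the caption does not specify.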
Fig. 2. Computational model for EG selection.
(A) Motivation for EG selection model development. Final-to-initial fluorescence ratios after 15 days of evolution for GFP fused to each of 10 EGs and for unfused GFP (second from the left). Values represent the means of three replicates per condition. Fused constructs retained a higher fraction of their initial fluorescence than unfused GFP (Student’s t test, P ≈ 0.048), indicating greater stability. The variability among constructs (Kruskal-Wallis H test, P ≈ 0.035) emphasizes the importance of rational EG selection. SEC2 showed significantly improved stability over unfused GFP (Student’s t test, P ≈ 0.047). Our model correctly predicted the top-performing gene. (B) Gene selection model pipeline. Experimental data (65, 66) covering 6685 EGs and 2 GOIs were used to extract biologically meaningful features, encompassing properties of the EG, the GOI, and their interactions. Training and validation splits were used for feature engineering and model tuning. An ensemble of KNN and XGB models was selected on the basis of consistent performance across splits. (C) Performance among the top three recommendations. In each bootstrap resample, model predictions were converted into quantiles. Model architectures were compared using the highest-performing EG among their top three predictions, reflecting the expected performance when testing three fusion constructs. For the selected architecture, the x axis displays the top score among the recommendations, and the y axis indicates the fraction of bootstrap samples within each performance bin. Predictions were consistently near-optimal (median quantile: 0.995), with a low likelihood (P ≈ 0.048) of scoring below 0.98. (D) Performance for the top recommendation. A similar analysis for the single top recommendation yielded a median quantile of 0.939, with rare scores below 0.92 (P ≈ 0.007). (E) Distribution of top Shapley values. Averaged across 20 XGB models, tAI emerged as the most predictive feature, followed by GC content, codon usage bias, alternative ORF lengths, mRNA folding energy, and amino acid composition similarity between the EG and the GOI.
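The bootstrap evaluation in panels (C) and (D) can be sketched as follows, under stated assumptions. In each resample, the measured performance of every EG is converted to an empirical quantile. The k EGs with the highest predicted scores are then selected, and the best quantile among them is recorded. The caption does not spell out every detail, so the quantile definition, the resample size, and the function names (`quantiles`, `best_of_top_k`, `bootstrap_top_k`) are hypothetical. The data in the usage example are random numbers, not results from the paper.

```python
import random
import statistics
from bisect import bisect_right


def quantiles(values: list[float]) -> list[float]:
    """Empirical quantile of each value in [0, 1]; ties share the upper rank (needs len >= 2)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(bisect_right(ordered, v) - 1) / (n - 1) for v in values]


def best_of_top_k(predicted: list[float], measured: list[float], k: int = 3) -> float:
    """Best measured quantile among the k EGs with the highest predicted scores."""
    q = quantiles(measured)
    top = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)[:k]
    return max(q[i] for i in top)


def bootstrap_top_k(predicted, measured, k=3, n_boot=1000, seed=0) -> list[float]:
    """Repeat best_of_top_k on bootstrap resamples (drawn with replacement) of the EGs."""
    rng = random.Random(seed)
    n = len(predicted)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(best_of_top_k([predicted[i] for i in idx], [measured[i] for i in idx], k))
    return scores


# Usage with random, made-up data: a noisy predictor of measured stability.
rng = random.Random(1)
measured = [rng.random() for _ in range(300)]
predicted = [m + rng.gauss(0, 0.1) for m in measured]
scores = bootstrap_top_k(predicted, measured, k=3, n_boot=200)
print("median quantile:", statistics.median(scores))
print("P(score < 0.98):", sum(s < 0.98 for s in scores) / len(scores))
```

Setting `k=1` gives the single-recommendation analysis of panel (D). Taking the maximum over the top k reflects the caption's rationale that several fusion constructs would be tested in practice.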
Fig. 3. Demonstration of STABLES: Improving proinsulin expression in S. cerevisiae.
(A) Design of the evolution experiment. Schematics of the variants in the first proinsulin evolution experiment. (B) Gene fusion affects expression at time zero. ELISA measurements of proinsulin expression at day 0 in six configurations: unfused, or fused to CAF20 or ARC15, each with or without a leader sequence. As previously shown, the leader sequence is essential for expressing unfused proinsulin. For fusion constructs, it substantially improves expression, potentially owing to increased mRNA stability or improved protein localization in the cell. (C) Proinsulin production over time. ELISA measurements were taken every 5 days (n = 3 per time point) for four variants: (i) baseline, Novo Nordisk’s patented sequence integrated at the CAN1 locus; (ii) optimized, the same structure with proinsulin and linker optimized by ESO (21); (iii and iv) fusions, proinsulin fused to CAF20 or ARC15. Fusion genes were synthesized using the full STABLES pipeline (EG selection, linker choice, leader inclusion, and codon optimization). Expression decay over time fits an exponential model, supported by a high R² in log-linear space (P < 10⁻⁶). The variants exhibit significantly different decay rates [analysis of variance (ANOVA) F-test, P < 10⁻¹⁹], with the STABLES-designed constructs showing substantially improved stability. (D) Normalized cumulative proinsulin production. Integrating the fitted expression curves yields the cumulative expression per variant. All values were normalized by the 10-day cumulative expression of the baseline variant. STABLES-derived variants exhibited greatly increased cumulative yields across time points, illustrating improved production through rational EG fusion design.
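Panels (C) and (D) describe two steps: fitting an exponential decay in log-linear space, then integrating the fitted curves. The minimal stdlib Python sketch below assumes the model y(t) = y0·exp(−k·t), fitted by ordinary least squares on ln y. Under that model the half-life is ln 2 / k, and the cumulative production up to T is (y0/k)(1 − exp(−k·T)). The function names and the example parameters are hypothetical, not values from the study.

```python
import math


def fit_exponential_decay(days: list[float], values: list[float]) -> tuple[float, float, float]:
    """Fit y = y0 * exp(-k * t) by least squares on ln(y); return (y0, k, R^2 in log space)."""
    ys = [math.log(v) for v in values]  # requires strictly positive measurements
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in days)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, ys))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(days, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(intercept), -slope, r2


def half_life(k: float) -> float:
    """Time for the fitted expression to fall to half its initial value."""
    return math.log(2) / k


def cumulative(y0: float, k: float, horizon: float) -> float:
    """Integral of y0 * exp(-k * t) from 0 to horizon."""
    if k == 0:
        return y0 * horizon
    return y0 / k * (1.0 - math.exp(-k * horizon))


# Made-up parameters: a fast-decaying baseline and a slower-decaying fusion construct.
days = [0, 5, 10, 15, 20]
baseline = [100 * math.exp(-0.15 * t) for t in days]
fusion = [80 * math.exp(-0.03 * t) for t in days]

b_y0, b_k, _ = fit_exponential_decay(days, baseline)
f_y0, f_k, _ = fit_exponential_decay(days, fusion)
print("half-lives (days):", round(half_life(b_k), 1), round(half_life(f_k), 1))
reference = cumulative(b_y0, b_k, 10)  # baseline 10-day cumulative, the normalizer in panel (D)
for horizon in (10, 20, 30):
    print(horizon, round(cumulative(f_y0, f_k, horizon) / reference, 2))
```

Fitting on ln y keeps the fit linear and matches the caption's log-linear R². Real ELISA replicates would be passed in as separate (day, value) pairs rather than as the noise-free curves used here.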
Fig. 4. Experimental study of STABLES results and components.
(A) Mutations accumulated in the in-lab evolution experiment. Nanopore sequencing revealed widespread deletions following in-lab evolution: (a) unfused proinsulin lost most of its promoter and coding sequence; (b) the ARC15 fusion showed large promoter deletions and smaller mutations in both the promoter and proinsulin; (c) the CAF20 fusion retained most of its structure, with only minor mutations across the construct. (B) Western blot validation. Western blotting confirms that CAF20 and ARC15 fusions yield prolonged proinsulin expression compared with unfused constructs. Actin was measured as a control. (C) Leaky stop codon construct design. Schematic of the construct used to assess translational read-through rates of leaky stop codons using a BFP-mCherry fusion. (D) Termination efficiency of leaky stop codons. Various constructs were tested, each linking BFP and mCherry via a different leaky stop codon. The fluorescence intensity of each fluorophore (normalized to standalone expression) served as a proxy for read-through efficiency. Three top-performing designs [L1 to L3; see (E)] and three rejected variants (with read-through too high or too low) are shown. The BFP signal represents GOI expression, and mCherry represents the C-terminal EG. (E) Proinsulin expression for different leaky stop codons. ARC15 fusions were built with each of three leaky stop codons, and without one as the control. ELISA measurements over 50 days demonstrated that all leaky stop codons improved protein stability (Student’s t test at t = 40 days, P < 10⁻⁵). Distinct expression profiles among codons (Kruskal-Wallis H test, P < 0.003) emphasize the need for informed selection. All selected codons preserve the last amino acid of proinsulin, follow the design principles outlined in (166), and displayed a read-through rate of 0.1 to 0.25 in the previous experiment. Sequences used (stop codon + 3 downstream nt): L1, TAGGCG; L2, TGAGCG; L3, TGACAA.
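The read-through screen in panels (C) to (E) can be approximated with a dual-reporter calculation. The sketch below assumes a common convention, not necessarily the authors' exact formula: the downstream (mCherry) signal, normalized to standalone mCherry, is divided by the upstream (BFP) signal, normalized to standalone BFP. Candidates are kept if the estimate falls in the 0.1 to 0.25 window cited in (E). The L1 to L3 context sequences come from the caption. The fluorescence readings and the function names are invented placeholders.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Stop codon + 3 downstream nt, as listed in the Fig. 4E caption.
LEAKY_CONTEXTS = {"L1": "TAGGCG", "L2": "TGAGCG", "L3": "TGACAA"}


def stop_codon_of(context: str) -> str:
    """Return the leading stop codon of a context sequence, or raise if there is none."""
    codon = context[:3].upper()
    if codon not in STOP_CODONS:
        raise ValueError(f"{context!r} does not start with a stop codon")
    return codon


def readthrough_estimate(bfp: float, mcherry: float, bfp_alone: float, mcherry_alone: float) -> float:
    """Assumed dual-reporter proxy: normalized downstream signal / normalized upstream signal."""
    return (mcherry / mcherry_alone) / (bfp / bfp_alone)


def in_window(rates: dict[str, float], low: float = 0.10, high: float = 0.25) -> list[str]:
    """Names whose estimated read-through lies in [low, high] (window taken from Fig. 4E)."""
    return sorted(name for name, r in rates.items() if low <= r <= high)


# Invented fluorescence readings (arbitrary units), only to exercise the functions.
BFP_ALONE, MCHERRY_ALONE = 1000.0, 800.0
readings = {"A": (900.0, 20.0), "B": (600.0, 96.0), "C": (500.0, 300.0)}
rates = {name: readthrough_estimate(b, m, BFP_ALONE, MCHERRY_ALONE) for name, (b, m) in readings.items()}
print({name: stop_codon_of(seq) for name, seq in LEAKY_CONTEXTS.items()})
# {'L1': 'TAG', 'L2': 'TGA', 'L3': 'TGA'}
print(in_window(rates))  # ['B']
```

Dividing by the normalized BFP signal corrects for construct-to-construct differences in overall expression. With this correction, a low mCherry reading reflects low read-through rather than weak transcription.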

References

    1. Zhu L., Zhu Y., Zhang Y., Li Y., Engineering the robustness of industrial microbes through synthetic biology. Trends Microbiol. 20, 94–101 (2012).
    2. Parker M. T., Kunjapur A. M., Deployment of engineered microbes: Contributions to the bioeconomy and considerations for biosecurity. Health Secur. 18, 278–296 (2020).
    3. Venturelli O. S., Egbert R. G., Arkin A. P., Towards engineering biological systems in a broader context. J. Mol. Biol. 428, 928–944 (2016).
    4. Clarke L., Kitney R., Developing synthetic biology for industrial biotechnology applications. Biochem. Soc. Trans. 48, 113–122 (2020).
    5. Zhang W., Nielsen D. R., Synthetic biology applications in industrial microbiology. Front. Microbiol. 5, 451 (2014).
