Optimization of regulatory DNA with active learning
- PMID: 41142759
- PMCID: PMC12553016
- DOI: 10.1016/j.csbj.2025.09.033
Abstract
Many biotechnology applications rely on microbial strains engineered to express heterologous proteins at maximal yield. A common strategy for improving protein output is to design expression systems with optimized regulatory DNA elements. Recent advances in high-throughput experimentation have enabled the use of machine learning predictors in tandem with sequence optimizers to find regulatory sequences with improved phenotypes. Yet the narrow coverage of training data, limited model generalization, and non-convexity of genotype-phenotype landscapes can limit the use of traditional sequence optimization algorithms. Here, we explore the use of active learning as a strategy to improve expression levels through iterative rounds of measurements, model training, and sequence sampling-and-selection. We explore convergence and performance of the active learning loop using synthetic data and an experimentally characterized genotype-phenotype landscape of yeast promoter sequences. Our results show that active learning can outperform one-shot optimization approaches in complex landscapes with a high degree of epistasis. We demonstrate the ability of active learning to effectively optimize sequences using datasets from different experimental conditions, with potential for leveraging data across laboratories, strains or growth conditions. Our findings highlight active learning as an effective framework for DNA sequence design, offering a powerful strategy for phenotype optimization in biotechnology.
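The loop described in the abstract (measure, train a model, sample candidates, select, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic landscape `make_landscape`, the ridge-regression predictor `fit_ridge`, the single-site mutation sampler `propose`, and all parameter values are hypothetical stand-ins chosen for brevity. A linear model cannot capture epistasis, and greedy selection of the top predictions ignores exploration, so a real application would likely use a more expressive model and a different selection rule.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seqs):
    """Encode an (n, L) integer array of bases as flattened one-hot features."""
    return np.eye(len(ALPHABET))[seqs].reshape(len(seqs), -1)

def make_landscape(length, rng):
    """Hypothetical landscape: additive effects plus nearest-neighbour epistasis."""
    additive = rng.normal(size=(length, 4))
    pairwise = rng.normal(scale=0.5, size=(length - 1, 4, 4))
    pos = np.arange(length)

    def measure(seqs):
        y = additive[pos, seqs].sum(axis=1)
        y += pairwise[pos[:-1], seqs[:, :-1], seqs[:, 1:]].sum(axis=1)
        return y

    return measure

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an intercept column (assumed surrogate)."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(X1.T @ X1 + lam * np.eye(X1.shape[1]), X1.T @ y)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

def propose(parents, n_candidates, rng):
    """Sample candidates by redrawing one random site of a random parent."""
    children = parents[rng.integers(len(parents), size=n_candidates)].copy()
    sites = rng.integers(parents.shape[1], size=n_candidates)
    children[np.arange(n_candidates), sites] = rng.integers(4, size=n_candidates)
    return children

def active_learning(measure, length=20, n_init=50, batch=20, rounds=5, seed=0):
    """Iterate: train on all data, sample candidates, select a batch, measure it."""
    rng = np.random.default_rng(seed)
    seqs = rng.integers(4, size=(n_init, length))   # initial random library
    y = measure(seqs)                               # first round of measurements
    best = [y.max()]
    for _ in range(rounds):
        predict = fit_ridge(one_hot(seqs), y)       # model training
        parents = seqs[np.argsort(y)[-batch:]]      # best measured sequences so far
        cands = propose(parents, 500, rng)          # sequence sampling
        picked = cands[np.argsort(predict(one_hot(cands)))[-batch:]]  # selection
        seqs = np.vstack([seqs, picked])
        y = np.concatenate([y, measure(picked)])    # new measurements
        best.append(y.max())
    return seqs, y, best

measure = make_landscape(20, np.random.default_rng(1))
seqs, y, best = active_learning(measure)
print(best)  # best measured value after each round
```

Because every round adds measurements to the same pool, the running maximum in `best` can never decrease. How quickly it rises depends on how well the surrogate model generalizes beyond its training data.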
© 2025 The Authors.
Conflict of interest statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.