Mapping transcription factor binding sites by learning UV damage fingerprints
- PMID: 41099699
- PMCID: PMC12526043
- DOI: 10.1093/nar/gkaf1014
Abstract
Deciphering transcriptional networks requires methods to accurately map binding sites of sequence-specific transcription factors (ssTFs) across the genome. Here, we show that ssTF binding induces distinct patterns of UV-induced cyclobutane pyrimidine dimers (CPDs), and that these CPD 'fingerprints' can be exploited by machine learning methods to identify ssTF binding sites (TFBS). As a proof of principle, we analyzed CPD-seq data from yeast cells using the Random Forest algorithm to identify 75 TFBS bound by the Hap2/Hap3/Hap5 ssTF complex, including ∼25 new sites missed by previous chromatin immunoprecipitation (ChIP)-based experiments. Parallel analysis of the Gcr1 ssTF using a neural network trained on CPD-seq data including only 6 known binding sites identified 63 Gcr1 TFBS across the genome. Our analysis indicates that the newly identified TFBS are associated with many genes that function in expected categories (e.g. mitochondrial respiration or glycolysis), and whose mRNA levels are down-regulated in ssTF mutants. Similar analysis of CPD-capture-sequencing data from human cells identified new sites bound by the homologous Nuclear Factor-Y complex. These findings indicate that distinct cellular patterns of UV damage occurring at different classes of TFBS can be recognized by machine learning methods to map these regulatory elements with improved accuracy and single-nucleotide resolution.
© The Author(s) 2025. Published by Oxford University Press.
Conflict of interest statement
The authors declare they have no competing interests.