CG-Based Stratification of 8-mers Highlights Functional Roles and Phylogenetic Divergence Markers
- PMID: 41096744
- PMCID: PMC12525110
- DOI: 10.3390/ijms26199477
Abstract
K-mer analysis is a powerful tool for understanding genome structure and evolution. A "k-mer" is a short DNA sequence of k nucleotides (where k is a fixed integer), and an "m-mer" is the same concept applied to a shorter length (m < k). The functional mechanisms of CG-containing k-mers, and their potential role in evolutionary processes, remain unclear. To explore this issue, we analyzed 8-mers in several species with varying genomic complexity and evolutionary divergence (Homo sapiens, Saccharomyces cerevisiae, Bombyx mori, Ciona intestinalis, Danio rerio, and Caenorhabditis elegans), grouping the 8-mers by CG dinucleotide content (0CG, 1CG, and 2CG). Within each CG-defined group, we examined the relative frequencies of shorter m-mers (m = 3 and 4) using information-theoretic, distance-based, and angular metrics. Our results show that 0CG motifs follow random patterns, whereas 1CG and 2CG motifs deviate significantly, likely owing to functional constraints such as nucleosome binding and CpG island association. The observed unimodal distribution of 8-mers arises from the convergence of the three CG-defined groups. Among them, the 2CG group shows the highest divergence in m-mer composition, followed by the 1CG group, reflecting varying degrees of selective pressure. Furthermore, species-specific differences in CG-classified 8-mer patterns could provide valuable insights into phylogenetic relationships. Through extensive comparison, we explore how CG content and sequence composition influence genomic organization and contribute to evolutionary divergence across taxa. These findings deepen our understanding of short-motif function, genome organization, and sequence evolution.
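The abstract does not spell out the exact procedure, so the following Python code is only a minimal sketch of one plausible reading of the pipeline: count 8-mers, stratify them by the number of CG dinucleotides they contain, build the relative-frequency profile of 3-mers or 4-mers inside each group, and score each profile with an information-theoretic metric (Kullback-Leibler divergence), a distance (Euclidean), and an angle. Several choices here are assumptions, not the authors' method: only one strand is counted, each 8-mer is weighted by its occurrence count, the reference distribution is uniform, and 8-mers are keyed by their exact CG count (the abstract names 0CG, 1CG, and 2CG groups but does not say how 8-mers with 3 or 4 CG sites, such as CGCGCGCG, were handled). All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def cg_count(kmer: str) -> int:
    """Number of CG dinucleotides in a k-mer (CG cannot overlap itself)."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == "CG")

def kmer_counts(seq: str, k: int = 8) -> Counter:
    """Count k-mers on the given strand only (assumption), skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[window] += 1
    return counts

def stratify_by_cg(counts: Counter) -> dict:
    """Split k-mer counts into groups keyed by exact CG dinucleotide count (0, 1, 2, ...)."""
    groups: dict = {}
    for kmer, n in counts.items():
        groups.setdefault(cg_count(kmer), Counter())[kmer] += n
    return groups

def mmer_profile(group: Counter, m: int) -> list:
    """Relative frequencies of all 4**m m-mers inside one group's k-mers,
    weighting each k-mer by its occurrence count (assumption)."""
    alphabet = ["".join(p) for p in product("ACGT", repeat=m)]
    tally: Counter = Counter()
    for kmer, n in group.items():
        for i in range(len(kmer) - m + 1):
            tally[kmer[i:i + m]] += n
    total = sum(tally.values())
    return [tally[a] / total if total else 0.0 for a in alphabet]

def kl_divergence(p: list, q: list) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; infinite if q lacks support where p > 0."""
    d = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            d += pi * math.log2(pi / qi)
    return d

def euclidean(p: list, q: list) -> float:
    """Euclidean distance between two equal-length profile vectors."""
    return math.dist(p, q)

def angle(p: list, q: list) -> float:
    """Angle in radians between two profile vectors (0 = same direction); NaN for a zero vector."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        return math.nan
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

if __name__ == "__main__":
    toy = "ACGTTGCAACGGATCCGATCGATTACAGGCATCGCGTTAAGCTT" * 20  # hypothetical toy sequence
    m = 3
    uniform = [1 / 4 ** m] * 4 ** m
    groups = stratify_by_cg(kmer_counts(toy))
    for cg in sorted(groups):
        prof = mmer_profile(groups[cg], m)
        # CG group, number of distinct 8-mers, then KL, Euclidean, and angular scores vs uniform.
        print(cg, len(groups[cg]),
              round(kl_divergence(prof, uniform), 3),
              round(euclidean(prof, uniform), 3),
              round(angle(prof, uniform), 3))
```

Under this reading, a group whose 8-mers behave like random sequence (as the abstract reports for 0CG) would score near zero on all three metrics, while more constrained groups would score higher. Comparing profiles between two species instead of against the uniform reference would be the natural variant for the phylogenetic comparisons the abstract mentions.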
Keywords: CG dinucleotide; information-theoretic analysis; k-mer distribution; sequence evolution.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.