Detection of cryptic mutation profiles in SARS-CoV-2 metagenomes through variant co-occurrence
- Install Nextflow (if not yet in working environment) via
conda create -n nextflow -c bioconda nextflow==25.10.2
conda activate nextflow
- Download pipeline
git clone https://github.com/Krannich479/crymco.git
cd crymco
- Fill I/O section (files) in nextflow.config
- Run pipeline via
nextflow run main.nf -profile local,mamba
- Alignment + Variant Calling + Phasing + Haplotagging
- Reformatting haplotype table into Nextstrain variant nomenclature
  - bin/devider2hdm.py
- Normalize mutation frequency table + join with haplotype table + compute distance matrix + compute single-linkage hierarchical clustering
  - bin/haploDistance.py
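
The last step can be illustrated with a minimal sketch that uses only the Python standard library. It is not the code in bin/haploDistance.py. The table layouts, the max-normalization, the Euclidean distance, the toy counts, and the helper names (`normalize`, `join`, `distance_matrix`, `single_linkage`) are all assumptions made for illustration; the real script may differ on each point.

```python
from itertools import combinations
from math import dist

# Hypothetical inputs: toy counts and toy haplotypes, not pipeline output.
mutation_counts = {"C241T": 90, "A23403G": 80, "G28881A": 20, "C21618G": 10}
haplotypes = {
    "hap1": {"C241T", "A23403G"},
    "hap2": {"C241T", "A23403G", "G28881A"},
    "hap3": {"C21618G"},
}

def normalize(counts):
    """Scale raw counts into [0, 1] by dividing by the largest count (assumed scheme)."""
    top = max(counts.values())
    return {mut: n / top for mut, n in counts.items()}

def join(haps, freqs):
    """One vector per haplotype: the mutation's frequency if carried, else 0."""
    muts = sorted(freqs)
    return {h: [freqs[m] if m in carried else 0.0 for m in muts]
            for h, carried in haps.items()}

def distance_matrix(vectors):
    """Pairwise Euclidean distances, keyed by the unordered pair of names."""
    return {frozenset((a, b)): dist(vectors[a], vectors[b])
            for a, b in combinations(sorted(vectors), 2)}

def single_linkage(names, dmat):
    """Agglomerative clustering; cluster distance is the smallest member distance."""
    def gap(pair):
        left, right = pair
        return min(dmat[frozenset((a, b))] for a in left for b in right)

    clusters = [(name,) for name in sorted(names)]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=gap)
        merges.append((left, right, gap((left, right))))
        clusters = [c for c in clusters if c not in (left, right)] + [left + right]
    return merges

freqs = normalize(mutation_counts)
vectors = join(haplotypes, freqs)
dmat = distance_matrix(vectors)
merges = single_linkage(haplotypes, dmat)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each printed row is one merge: the two clusters that were joined and the distance at which they were joined. For real data, `scipy.cluster.hierarchy.linkage` with `method="single"` computes the same kind of merge table far more efficiently.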
- The PNG produced by the last pipeline step is hard to read as a static image. It needs adjustment, likely in the Python script
- Numeric output would be useful. Printing or exporting the distance matrix is likely a good starting point
- There might be a more sophisticated clustering method for this task than the one currently implemented (e.g. skbio has a module for UPGMA)
- Generating the mutation frequency table needs documentation for third-party users
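
Two of the notes above (numeric output and an alternative clustering method) could be prototyped as in the following sketch. It uses only the standard library, and the input is a toy distance dictionary with hypothetical haplotype names, not the pipeline's real data structure. The helpers `matrix_to_tsv` and `upgma` are illustrative names, and UPGMA is implemented here naively as average linkage over all member pairs.

```python
import csv
import io
from itertools import combinations

# Toy symmetric distances between three hypothetical haplotypes.
names = ["hap1", "hap2", "hap3"]
distances = {
    frozenset(("hap1", "hap2")): 0.2,
    frozenset(("hap1", "hap3")): 1.0,
    frozenset(("hap2", "hap3")): 1.4,
}

def matrix_to_tsv(names, distances):
    """Render the full square matrix as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["haplotype"] + names)
    for a in names:
        row = [0.0 if a == b else distances[frozenset((a, b))] for b in names]
        writer.writerow([a] + [f"{x:.4f}" for x in row])
    return buf.getvalue()

def upgma(names, distances):
    """Average linkage: cluster distance is the mean over all member pairs."""
    def mean_gap(pair):
        left, right = pair
        total = sum(distances[frozenset((a, b))] for a in left for b in right)
        return total / (len(left) * len(right))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=mean_gap)
        merges.append((left, right, mean_gap((left, right))))
        clusters = [c for c in clusters if c not in (left, right)] + [left + right]
    return merges

tsv = matrix_to_tsv(names, distances)
print(tsv, end="")  # could be written to a file next to the PNG instead

upgma_merges = upgma(names, distances)
for left, right, height in upgma_merges:
    print(left, right, round(height, 3))
```

With average linkage, the height of the final merge is the mean of the distances from `hap3` to the other two haplotypes, whereas single linkage would use the smaller of the two. If SciPy is already a dependency, `scipy.cluster.hierarchy.linkage` with `method="average"` performs UPGMA without a hand-written loop.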