Goal
I have a large dataset of variants in HGVSc format (GENCODE v28 / Ensembl v92) and I want to convert them to HGVSg format.
Example: ENST00000524377.5:c.1026G>T --> chr1:g.156873808G>T
Steps Taken
I am attempting to achieve this using local reference files, but the next steps in the workflow are not obvious to me. Here is what I have done so far:
Downloaded Ensembl Release 92 references:
FASTA: ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.{}.fa.gz
GFF3: ftp.ensembl.org/pub/release-92/gff3/homo_sapiens/Homo_sapiens.GRCh38.92.chromosome.{}.gff3.gz
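For clarity, the `{}` in the two paths above stands for the chromosome name. A minimal sketch of how the per-chromosome URLs can be expanded; the helper name `ensembl_urls` and the `https://` scheme are assumptions for illustration, and the download command is left commented out:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the FASTA and GFF3 URLs for one chromosome.
# The https:// scheme is an assumption; the paths are the ones listed above.
ensembl_urls() {
    chrom="$1"
    base="https://ftp.ensembl.org/pub/release-92"
    printf '%s/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.%s.fa.gz\n' "$base" "$chrom"
    printf '%s/gff3/homo_sapiens/Homo_sapiens.GRCh38.92.chromosome.%s.gff3.gz\n' "$base" "$chrom"
}

# Print the URLs for a few example chromosomes.
for c in 1 2 X; do
    ensembl_urls "$c"
    # To actually download: ensembl_urls "$c" | xargs -n 1 curl -O
done
```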
Retrieved the model:
```bash
mutalyzer_retriever --parse --split --output cache --id chr1 --mrna_id --model_type all -s ensembl --type fasta from_file --paths ../data/Annotation/Homo_sapiens.GRCh38.92.chromosome.1.gff3 ../data/REFERENCE/Homo_sapiens.GRCh38.dna.chromosome.1.fa
```
Result: This succeeded and wrote chr1.annotations and chr1.sequence to the cache directory.
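A quick way to confirm that both cache files are present and non-empty before moving on to the mapping step. This is only a sketch: `check_cache` is a hypothetical helper, and the `<id>.annotations` / `<id>.sequence` naming is taken from the result reported above, not from the tool's documentation.

```shell
# Hypothetical helper: succeed only if both cache files exist and are non-empty.
# File naming (<id>.annotations, <id>.sequence) follows the result reported above.
check_cache() {
    cache_dir="$1"
    ref_id="$2"
    for ext in annotations sequence; do
        f="$cache_dir/$ref_id.$ext"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    echo "cache OK for $ref_id in $cache_dir"
}

# Check the chr1 files in the local cache directory.
check_cache cache chr1 || echo "cache check failed"
```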
Attempted mapping:
```bash
MUTALYZER_SETTINGS="$(pwd)/config.txt" mutalyzer_mapper "ENST00000239461:c.417+1750G>T" --reference-id chr1
```
Current Result / Error
The mutalyzer_mapper command failed.
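The failure is reported here without its error text. A minimal sketch for recording the exit status and stderr of the failing call so the message can be attached to this report; `run_logged` and the log file name `mapper_error.log` are hypothetical, and the command in the usage comment is the same one from the mapping step:

```shell
# Hypothetical helper: run a command, keep its stderr in a log file,
# and report (and return) the command's exit status.
run_logged() {
    log_file="$1"
    shift
    status=0
    "$@" 2> "$log_file" || status=$?
    echo "exit status: $status (stderr saved to $log_file)"
    return "$status"
}

# Usage with the command from the mapping step (commented out here):
# export MUTALYZER_SETTINGS="$(pwd)/config.txt"
# run_logged mapper_error.log \
#     mutalyzer_mapper "ENST00000239461:c.417+1750G>T" --reference-id chr1
```

The log file then holds whatever the command wrote to stderr, which is usually the most useful thing to include in a report like this one.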
Questions
- What are the correct next steps after generating the cache files with mutalyzer_retriever?
- What is the required structure of the reference files for mutalyzer_mapper?
- Is it possible to achieve my goal using these chromosome-scale GFF3 and FASTA files, or do I need to recreate GFF3 and FASTA files for each transcript of interest?
Thank you in advance for your guidance!