Workflow for mapping HGVSc to HGVSg using local chromosome-scale GFF3/FASTA #116

@KostyaTyurin

Goal
I have a large dataset of variants in HGVSc format (GENCODE v28 / Ensembl v92) and I want to convert them to HGVSg format.
Example: ENST00000524377.5:c.1026G>T --> chr1:g.156873808G>T
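
To make the input format explicit, here is a minimal sketch that splits an HGVSc description into transcript accession, version, and variant part. It uses only bash parameter expansion; `split_hgvsc` is a hypothetical helper name and is not part of any mutalyzer tool. It assumes the description contains exactly one reference followed by `:`.

```bash
#!/usr/bin/env bash
# Split an HGVSc description into accession, version and variant.
# split_hgvsc is a hypothetical helper, not a mutalyzer command.
split_hgvsc() {
    local hgvs="$1"
    local reference="${hgvs%%:*}"        # text before the first ':'
    local variant="${hgvs#*:}"           # text after the first ':'
    local transcript="${reference%%.*}"  # accession without the version
    local version=""
    # Only set a version when the reference actually contains a '.'
    if [[ "$reference" == *.* ]]; then
        version="${reference#*.}"
    fi
    # Tab-separated output: accession, version (may be empty), variant
    printf '%s\t%s\t%s\n' "$transcript" "$version" "$variant"
}

split_hgvsc "ENST00000524377.5:c.1026G>T"
```

The version field comes back empty for unversioned inputs such as `ENST00000239461:c.417+1750G>T`, which makes it easy to spot descriptions that mix versioned and unversioned transcript IDs.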

Steps Taken
I am attempting to achieve this using local reference files, but the next steps in the workflow are not obvious to me. Here is what I have done so far:

Downloaded Ensembl Release 92 references:

FASTA: ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.{}.fa.gz

GFF3: ftp.ensembl.org/pub/release-92/gff3/homo_sapiens/Homo_sapiens.GRCh38.92.chromosome.{}.gff3.gz
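
For reference, the `{}` placeholder in the two paths above can be expanded per chromosome roughly as follows. This is only a sketch: `expand_urls` is a hypothetical helper, the chromosome names passed to it are an assumption, and the script prints the paths rather than downloading anything.

```bash
#!/usr/bin/env bash
# Expand the "{}" placeholder in the two path templates for each chromosome.
# The templates are copied from the issue text; the chromosome list passed
# to expand_urls is an assumption and should be adjusted as needed.
fasta_tpl='ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.{}.fa.gz'
gff3_tpl='ftp.ensembl.org/pub/release-92/gff3/homo_sapiens/Homo_sapiens.GRCh38.92.chromosome.{}.gff3.gz'

expand_urls() {
    local placeholder='{}'
    local chrom
    for chrom in "$@"; do
        # Quoting the pattern makes bash treat "{}" literally
        printf '%s\n' "${fasta_tpl//"$placeholder"/$chrom}"
        printf '%s\n' "${gff3_tpl//"$placeholder"/$chrom}"
    done
}

expand_urls 1 X
```

The printed paths could then be handed to whatever download tool is preferred.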

Retrieved the model:

```bash
mutalyzer_retriever --parse --split --output cache --id chr1 --mrna_id --model_type all -s ensembl --type fasta from_file --paths ../data/Annotation/Homo_sapiens.GRCh38.92.chromosome.1.gff3 ../data/REFERENCE/Homo_sapiens.GRCh38.dna.chromosome.1.fa
```
Result: This successfully returned chr1.annotations and chr1.sequence files in the cache directory.
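
If the same step has to be repeated for other chromosomes, the command could be generated per chromosome along these lines. This is a dry-run sketch that only prints the commands: the flags and paths are copied from the chromosome 1 call above, `retriever_cmd` is a hypothetical helper, and whether the call behaves the same for other chromosomes is an untested assumption.

```bash
#!/usr/bin/env bash
# Print (do not run) one mutalyzer_retriever command per chromosome.
# Flags and paths are copied from the chromosome 1 command in this issue;
# that they work unchanged for other chromosomes is an assumption.
retriever_cmd() {
    local chrom="$1"
    printf '%s\n' "mutalyzer_retriever --parse --split --output cache --id chr${chrom} --mrna_id --model_type all -s ensembl --type fasta from_file --paths ../data/Annotation/Homo_sapiens.GRCh38.92.chromosome.${chrom}.gff3 ../data/REFERENCE/Homo_sapiens.GRCh38.dna.chromosome.${chrom}.fa"
}

# Example chromosome list; adjust to the chromosomes actually needed
for chrom in 1 2 X; do
    retriever_cmd "$chrom"
done
```

Printing first makes it possible to review the generated commands before running any of them.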

Attempted mapping:

```bash
MUTALYZER_SETTINGS="$(pwd)/config.txt" mutalyzer_mapper "ENST00000239461:c.417+1750G>T" --reference-id chr1
```

Current Result / Error
The mutalyzer_mapper command failed.

Questions

  1. What are the correct next steps after generating the cache files with mutalyzer_retriever?
  2. What is the required structure of the reference files for mutalyzer_mapper?
  3. Is it possible to achieve my goal using these chromosome-scale GFF3 and FASTA files, or do I need to recreate GFF3 and FASTA files for each transcript of interest?

Thank you in advance for your guidance!
