Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSG: The coordinate interval for display is of different length than for the reference allele #1775

Open
suzyhh opened this issue Oct 22, 2024 · 5 comments
Assignees

Comments

@suzyhh
Copy link

suzyhh commented Oct 22, 2024

Describe the issue

Hello, I'm testing v113 using the v113 docker obtained from docker hub and the v113 cache downloaded using the following command:
docker run --network host -t -i -v /mnt/data1/software/vep/:/opt/vep/.vep ensemblorg/ensembl-vep:release_113.0 INSTALL.pl -a c -s homo_sapiens_merged -y GRCh38

I have successfully tested v113 with a VCF containing structural variants, but I encounter an error when running with a VCF containing SNVs/indels. After the error the process just hangs doing nothing until I kill it. The output files contains the VCF header but no variants.

System

  • VEP version: v113.0 (Docker)
  • VEP Cache version: v113.0

Full VEP command line

vep --cache --offline \
    -i ~{vcf} \
    --dir_cache /opt/vep/.vep \
    --dir_plugins /opt/vep/.vep/Plugins/ \
    --vcf --compress_output bgzip \
    --merged \
    --fasta $refFasta \
    --assembly GRCh38 \
    --no_stats \
    --fork ~{fork} \
    --buffer_size ~{buffer} \
    --no_escape \
    --check_existing \
    --hgvs --hgvsg \
    --af --af_gnomadg \
    --protein --uniprot \
    --symbol \
    --numbers \
    --allele_number \
    --sift b --polyphen b \
    --pubmed \
    --show_ref_allele \
    --variant_class \
    --mane_select \
    --transcript_version \
    --flag_pick_allele_gene \
    --plugin MaxEntScan,/opt/vep/.vep/fordownload,NCSS,SWA \
    --plugin SpliceAI,snv=~{spliceaiSnv},indel=~{spliceaiIndel} \
    --plugin SpliceRegion \
    --plugin NearestExonJB,max_range=100 \
    --plugin REVEL,/opt/vep/.vep/new_tabbed_revel_grch38.tsv.gz \
    --plugin SpliceDistance \
    --plugin UTRAnnotator,file=/opt/vep/.vep/fordownload/uORF_5UTR_GRCh38_PUBLIC.txt \
    --plugin GnomadPli,file=~{gnomadv4Pli} \
    --plugin CADD,snv=~{caddSnv},indels=~{caddIndel} \
    --plugin AlphaMissense,file=~{alphaMissense} \
    --custom file=${hgmdZip},short_name=HGMD,format=vcf,type=exact,coords=0,fields=CLASS%PHEN \
    --custom file=~{clinVarVcf},short_name=ClinVar,format=vcf,type=exact,coords=0,fields=CLNSIG%CLNREVSTAT%CLNDN \
    --custom file=~{gnomadv4Vcf},short_name=gnomad4,format=vcf,type=exact,coords=0,fields=homCount%hetCount%hemiCount%gnomadFilter \
    --force_overwrite \
    -o ~{outName}_snvs.vep.vcf.gz

Full error message

STDERR 
STDERR -------------------- EXCEPTION --------------------
STDERR MSG: The coordinate interval for display is of different length than for the reference allele
STDERR STACK Bio::EnsEMBL::Variation::Utils::Sequence::hgvs_variant_notation /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/Sequence.pm:511
STDERR STACK Bio::EnsEMBL::Variation::VariationFeature::hgvs_genomic /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/VariationFeature.pm:2013
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::VariationFeatureOverlapAllele_to_output_hash /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:1314
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::TranscriptVariationAllele_to_output_hash /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:1606
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::get_all_VariationFeatureOverlapAllele_output_hashes /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:393
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::get_all_output_hashes_by_VariationFeature /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory.pm:347
STDERR STACK Bio::EnsEMBL::VEP::OutputFactory::VCF::get_all_lines_by_InputBuffer /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/OutputFactory/VCF.pm:316
STDERR STACK Bio::EnsEMBL::VEP::Runner::_buffer_to_output /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:422
STDERR STACK (eval) /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:671
STDERR STACK Bio::EnsEMBL::VEP::Runner::_forked_process /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:665
STDERR STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:496
STDERR STACK Bio::EnsEMBL::VEP::Runner::next_output_line /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:367
STDERR STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:208
STDERR STACK toplevel /opt/vep/src/ensembl-vep/vep:46
STDERR Date (localtime)    = Mon Oct 21 13:54:46 2024
STDERR Ensembl API version = 113
STDERR ---------------------------------------------------
STDERR Died in forked process 53

I have tested my command using v112 and it completes successfully with no errors/warnings so it does seem to be a v113 problem. Running locally I have found that the problem occurs when using the --hgvsg flag. If I run using only --hgvs I do not get the above error, but I do get several warnings, here is a small snippet:

WARNING: 12 : WARNING: Transcript-assembly mismatch in rs6650119
WARNING: Transcript-assembly mismatch in rs6650119
WARNING: 13 : WARNING: Transcript-assembly mismatch in rs2275166
WARNING: Transcript-assembly mismatch in rs2275166
WARNING: 14 : WARNING: Transcript-assembly mismatch in rs309472
WARNING: Transcript-assembly mismatch in rs309472
WARNING: Transcript-assembly mismatch in rs586178
WARNING: Transcript-assembly mismatch in rs586178
WARNING: 39 : WARNING: Transcript-assembly mismatch in rs193922695
WARNING: Transcript-assembly mismatch in rs193922695
WARNING: 52 : WARNING: Transcript-assembly mismatch in rs7601549
WARNING: Transcript-assembly mismatch in rs7601549

I do not receive any errors or warning when I run v113 without both --hgvs and --hgvsg. This unfortunately renders v113 unusable for us as we rely on the hgvs annotations.

Many thanks!

@likhitha-surapaneni likhitha-surapaneni self-assigned this Oct 22, 2024
@likhitha-surapaneni
Copy link
Contributor

Hi @suzyhh , thank you for reporting the issue. Can you kindly provide us the input file/test input used?

@suzyhh
Copy link
Author

suzyhh commented Oct 22, 2024

@likhitha-surapaneni

I have attached the input file (bgzipped for compatibility with github) that generates the error - ta!
TwEx_ProbandF-TwEx_MotherF-TwEx_FatherF_gnomad_filtered_mt.vcf.gz

@likhitha-surapaneni
Copy link
Contributor

likhitha-surapaneni commented Oct 22, 2024

Hi @suzyhh , thank you for providing us the file. We know why --hgvsg might be erroring out and are trying to push in a post-release fix for it (related to Ensembl/ensembl-variation#1124)

Regarding the warnings associated with --hgvs, we are trying to investigate this and will update you

@likhitha-surapaneni
Copy link
Contributor

Hi @suzyhh , the fixes for the error are now available on the latest docker images (latest or release_113.1).

The warning "Transcript-assembly mismatch in rs586178" means there is a mismatch between refseq and ensembl in this position. By default, VEP uses the ref alleles from refseq. For this example variant rs586178, the HGVSc is NM_001330430.4:c.48C>C, this is not a valid variant (basically edits the ref G to a C ). After editing the allele the variant becomes C>C. The consequence is synonymous which is also not correct.
If you use --use_given_ref , the output would be NM_001330430.4:c.48G>C. Hope this is useful. We will try to make the warning and docs clearer for this case.

Thank you and let us know if this solves the issue.

@suzyhh
Copy link
Author

suzyhh commented Oct 24, 2024

@likhitha-surapaneni Thanks so much for this fix, I can now run the new version of 113 :)

I have to admit I don't 100% understand the --use_given_ref option, but I have tested in v113 and it doesn't seem to have the behaviour you describe. I've tested using rs778986.

When I run without --use_given_ref I see the behaviour you described, with the variant being incorrectly described as synonymous with the nomenclature G>G:
image

However when I run with --use_given_ref, the HGVSc is no longer G>G, but I think is also incorrect (A>G and it should be T>C?), and the consequence and codons is also incorrect:
image

I've tested with v112 using --use_given_ref and this gives me what I expect is the correct behaviour:
image
The cDNA change in the HGVSc is also T>C in v112 which I think is correct

Many thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants