Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skips all het SNV sites because deepvariant uses Number=A for the VAF format field. #19

Open
JakeHagen opened this issue Apr 18, 2024 · 3 comments

Comments

@JakeHagen
Copy link

Hello and thank you for releasing this tool.

The code below (which I am assuming is there to filter multi-allelic sites) causes issues when using Deepvariant's VCFs. Deepvariant uses Number=A for the format VAF field which causes pysam to always parse it as a tuple, even if it is not a multi-allelic site.

if type(snv[self.af_tag]) == tuple:
    continue

I was able to get spectre to run as expected by manually changing the VCF header to Number=1 for VAF

I wonder if you would be open to figuring out another way to filter multi-allelic sites. Maybe something like

if len(snv_record.alts) > 1:
    continue

Thanks
Jake

@lfpaulin
Copy link
Collaborator

Hello Jake,
thanks for pointing this out. Is it possible that you could share use one or two examples of SNP entries with that format
so that we could catch those cases too. We only tested with bi-allelic sites. It is also worth noticing that multi allelic sites may
not work during LoH calling, or at least the current version will bot be able to work.
Best
Luis

@JakeHagen
Copy link
Author

Hi Luis, thank you for your response

Attached are two test vcfs. They have only one identical variant. The only difference is the FORMAT VCF header line ie:
##FORMAT=<ID=VAF,Number=A,Type=Float,Description="Variant allele fractions."> vs ##FORMAT=<ID=VAF,Number=1,Type=Float,Description="Variant allele fractions.">

Please note that the only variant in the vcfs is a bi-allelic variant, the variants I was feeding into spectre were already filtered to bi-allelic sites.

Also below is the error I get when using the non-modified vcf. Using the modified version allows spectre to run to completion.

Matplotlib created a temporary cache directory at /tmp/matplotlib-7lrdbiqb because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
spectre::2024-04-19 14:44:00,654::INFO::spectre.spectre>  Spectre input: CNVCaller --coverage ../mosdepth/test.regions.bed.gz --sample-id test -r ./GRCh38.fasta --output-dir ./test --blacklist ./grch38_blacklist_spectre.bed --metadata ./metadata.mdr --snfj ../snf2json/test.snfj --snv dv16.vcf.gz --only-chr chr1
spectre::2024-04-19 14:44:00,655::INFO::spectre.spectre>  Spectre version: 0.2.0
spectre::2024-04-19 14:44:00,655::INFO::spectre.spectre>  Starting spectre
spectre::2024-04-19 14:44:00,655::INFO::spectre.spectre>  Evaluate if a new .mdr file needs to be created
spectre::2024-04-19 14:44:00,655::INFO::spectre.spectre>  Using existing metadata file ./metadata.mdr
spectre::2024-04-19 14:44:00,656::INFO::spectre.spectreCNV>  Spectre calculating for: ../mosdepth/test.regions.bed.gz
spectre::2024-04-19 14:44:00,656::INFO::spectre.spectreCNV>  Analysing CN neutral state from SNV data
spectre::2024-04-19 14:44:00,658::DEBUG::spectre.analysis.snv_analysis>  Starting SNV Copy number state compute ...
spectre::2024-04-19 14:44:00,658::DEBUG::spectre.analysis.snv_analysis>  Het sites count per chromosome
spectre::2024-04-19 14:44:00,658::DEBUG::spectre.analysis.snv_analysis>  Use chromsomes based on proportion of het sites
spectre::2024-04-19 14:44:00,658::DEBUG::spectre.analysis.snv_analysis>  n het sites used for normalization = 0
Traceback (most recent call last):
  File "/usr/local/bin/spectre", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/spectre/spectre.py", line 637, in main
    spectre_run.spectre_exe()
  File "/usr/local/lib/python3.12/site-packages/spectre/spectre.py", line 356, in spectre_exe
    spectre_main.cnv_call()
  File "/usr/local/lib/python3.12/site-packages/spectre/spectreCNV.py", line 71, in cnv_call
    self.snv_analysis.snv_copy_number_state()
  File "/usr/local/lib/python3.12/site-packages/spectre/analysis/snv_analysis.py", line 231, in snv_copy_number_state
    self.snv_multimodal_detect()
  File "/usr/local/lib/python3.12/site-packages/spectre/analysis/snv_analysis.py", line 104, in snv_multimodal_detect
    self.het_clean() if np.max(self.perfect_het_depth) > 10*np.nanmean(self.perfect_het_depth) else None
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 2810, in max
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zero-size array to reduction operation maximum which has no identity

Thanks
Jake

dv16_mod.vcf.gz
dv16.vcf.gz

@lfpaulin
Copy link
Collaborator

Hello Jake, thanks for the detailed information, I will check it out and reach back asap.

Luis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants