Switch from pyvcf to cyvcf2 for VCF parsing #146

mbhall88 · 2022-02-08T07:22:00Z

Closes #142

I also found (and fixed) a bug with the ONT preset. It was being applied after the genotyper and coverage parsers were constructed, so the genotyper was built with the default error rate and ploidy, which are the Illumina values.

I noticed this when I tested this PR with a fastq from my TB data and saw a heap of HET calls. After the fix, the results of v0.10.0 and this PR are identical for predict.
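
The ordering problem can be illustrated with a minimal sketch. The `Settings`, `Genotyper` and `apply_ont_preset` names, and the numeric values, are hypothetical stand-ins and not mykrobe's actual classes or defaults. The only point is that an object which copies its parameters at construction time never sees a preset that is applied afterwards.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical run settings; the defaults stand in for the Illumina values."""
    error_rate: float = 0.05   # illustrative number only
    ploidy: str = "diploid"    # illustrative value only


class Genotyper:
    """Hypothetical genotyper that copies its parameters when constructed."""

    def __init__(self, settings: Settings) -> None:
        self.error_rate = settings.error_rate
        self.ploidy = settings.ploidy


def apply_ont_preset(settings: Settings) -> None:
    """Overwrite the defaults with (illustrative) ONT values."""
    settings.error_rate = 0.15
    settings.ploidy = "haploid"


def buggy_order() -> Genotyper:
    settings = Settings()
    genotyper = Genotyper(settings)   # built with the defaults...
    apply_ont_preset(settings)        # ...so the preset arrives too late
    return genotyper


def fixed_order() -> Genotyper:
    settings = Settings()
    apply_ont_preset(settings)        # preset first
    return Genotyper(settings)        # genotyper now sees the ONT values
```

In the buggy order the genotyper keeps the default error rate and ploidy, which would be consistent with the unexpected HET calls described above.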

martinghunt

Thanks for this!

Can see you've stuck to the original logic with how to use or ignore VCF records, which totally makes sense. I'm thinking this is a good time to change some of that logic - see comments.

src/mykrobe/_vcf/models.py

martinghunt · 2022-02-08T12:07:47Z

                    valid = False
            try:
-                if sample["GT_CONF"] <= 1:
+                gt_conf = record.format("GT_CONF")[i][0]
+                if gt_conf <= 1:  # todo: magic number.


I'm not even sure we should be doing this at all. I'm wondering if historically there was a reason for it because some tool was being used to make VCFs where a conf of 1 was significant? I'd vote to not even look for GT_CONF.

There's some value in ignoring GT_CONF==0 positions, as in those cases we have no idea what the genotype is, I guess?

OK, see your point. The counter-proposal is: document that all you need is the GT field, and all variants with a non-ref GT get used. Leave it up to the user to make a filtered VCF as input to mykrobe? Plus, is it better to accidentally include an extra background SNP than to silently exclude SNPs because the user didn't put all the fields mykrobe's looking for in their VCF?
I don't mind though, path of least resistance is to leave it how it is now.

I'm inclined to agree with Martin on this (remove any gt conf checks). This VCF should realistically be pre-curated. If we are going to do any kind of curating, we should be opening up all of these options to users. And that is somewhere I don't think we want/need to go.
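
For reference, the check being debated amounts to roughly the following. This is a sketch only: `gt_conf_passes` is a hypothetical helper, not a function in mykrobe, and `record` is assumed to behave like a cyvcf2 `Variant`, whose `format(name)` returns one row of values per sample. How an absent field is reported is treated as an assumption here, so both a `KeyError` and a `None` return are handled.

```python
def gt_conf_passes(record, sample_idx, threshold=1.0):
    """Return False only when GT_CONF is present and <= threshold.

    `record` is any object offering format(name) in the style of a
    cyvcf2 Variant. A record without GT_CONF is kept, so that a VCF
    lacking the field does not silently lose variants.
    """
    try:
        values = record.format("GT_CONF")
    except KeyError:
        values = None  # assumption: an absent field may raise
    if values is None:
        return True    # assumption: an absent field may return None
    # First (and normally only) GT_CONF value for this sample.
    return float(values[sample_idx][0]) > threshold
```

Keeping records that lack the field reflects the concern raised in the thread about silently excluding SNPs; the thread's conclusion was to drop the check altogether.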

mbhall88 · 2022-02-09T00:08:52Z

OK, so the logic simplifies down to: if the record isn't homozygous alt, it's invalid.
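
A minimal sketch of that simplified rule follows. It is not the code in `models.py`; `is_hom_alt` and `record_is_valid` are hypothetical helpers. It assumes genotypes laid out the way cyvcf2's `Variant.genotypes` reports them, one list per sample containing the allele indices followed by a phased flag, with `-1` for a missing allele.

```python
def is_hom_alt(alleles):
    """True when every allele is called, non-reference, and identical."""
    alleles = list(alleles)
    if not alleles:
        return False
    first = alleles[0]
    # first > 0 rejects both reference (0) and missing (-1) alleles.
    return first > 0 and all(a == first for a in alleles)


def record_is_valid(genotypes, sample_idx=0):
    """Apply the 'homozygous alt or invalid' rule to one sample.

    `genotypes` follows the cyvcf2 layout, e.g. [[1, 1, False]] for a
    diploid 1/1 call. The trailing phased flag is dropped before the
    check, since a bool would otherwise compare equal to allele 0 or 1.
    """
    alleles = genotypes[sample_idx][:-1]
    return is_hom_alt(alleles)
```

With this rule a `0/1` call, a `1/2` call, and a call with any missing allele are all rejected, and a haploid `1` call is accepted.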

I also realised that numpy wasn't listed in the install_requires. Biopython requires it so we were getting it via that, but better to be explicit.

As for the failing appveyor build: I think this is a Windows/Python 3.9 issue. We pin numpy v1.15.0 in appveyor:

mykrobe/.appveyor.yml

Line 45 in 14621ad

- '%BASH% -lc "pip3 install tox requests numpy==1.15"'

Not sure if this is necessary or if we can relax it? Basically, numpy doesn't have Python 3.9 wheels for "older" versions of numpy (yet), which means pip tries to compile it from source, which then requires a BLAS library that doesn't come with Windows by default.
Two options would be to relax the version, or to use one of the prebuilt wheels mentioned in the linked stackoverflow answer.

martinghunt · 2022-02-09T09:31:14Z

Relax, or not even specify, the version? Expect it will be ok. numpy is only being used for poisson and binomial distributions, and its log10 function.

mbhall88 · 2022-03-29T01:22:33Z

OK, the appveyor file has been reset. @martinghunt you can squash this PR when you're ready (that will remove all those rubbish appveyor commits).

mbhall88 added 8 commits February 7, 2022 10:45

switch pyvcf to cyvcf2 and create metadata interface

5f52581

replace pyvcf metadata interface

2c8106b

replace info, format and filter interfaces

3f36fc3

update is_record_valid to new interface

0a24542

change interface for get genotype likelihoods function

e34559f

change vcf interface for create variants and calls

9bac6de

ensure ONT set before genotyper constructed

0f720ac

Update CHANGELOG.md

406d1c0

mbhall88 requested a review from martinghunt February 8, 2022 07:22

martinghunt reviewed Feb 8, 2022

mbhall88 added 2 commits February 9, 2022 10:04

simplify record validation

55575b5

add missing numpy dependency

2737265

mbhall88 added 16 commits February 9, 2022 20:57

remove numpy version spec in appveyor

0c72b64

add python to test path

316773d

tell tox which python

be369a9

Update .appveyor.yml

aef474e

ensure same python3 is used everywhere

e21d4d3

run tox through python

a6f46f8

verbose tox

ffab7b3

try explicitly install cyvcf2

4d7c01e

add openssl

91cc0af

try msys from htslib appveyor

a4c4615

use sh instead of bash

676a43e

Update .appveyor.yml

dfb5f75

Update .appveyor.yml

0d69a68

Update .appveyor.yml

7152152

Update .appveyor.yml

d8b0a0d

Update .appveyor.yml

c0e00df

mbhall88 added 17 commits February 11, 2022 15:05

Update .appveyor.yml

a2f2b96

try cyvcf2 from source

6f79a61

Update .appveyor.yml

4d945e3

Update .appveyor.yml

5bb4524

Update .appveyor.yml

9091487

Update .appveyor.yml

468842d

Update .appveyor.yml

be4c95c

Update .appveyor.yml

97bbaf2

Update .appveyor.yml

70e498d

Update .appveyor.yml

cc94476

Update .appveyor.yml

efddda8

Update .appveyor.yml

8577d21

Update .appveyor.yml

36298d2

Update .appveyor.yml

7b4bcb0

update appveyor

06cea22

Update .appveyor.yml

facb01a

reset appveyor to master

0d7cd9d

martinghunt merged commit afa500e into Mykrobe-tools:master Mar 29, 2022
