Strelka VCF missing GT field causes error during "call" command #943

Open
brandon-hastings opened this issue Jan 29, 2025 · 0 comments

I am using the call command to generate BAF and allele-specific copy numbers, and I ran into the negative BAF values described in #601. Following the guidance there, I ran the call command with the tumor and normal sample IDs from a Strelka VCF and got the following error:

Selected test sample TUMOR and control sample NORMAL
Skipping NC_072790.1:221367 G @ TUMOR; 'invalid FORMAT: GT'
Traceback (most recent call last):
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/bin/cnvkit.py", line 10, in <module>
    sys.exit(main())
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/cnvkit.py", line 10, in main
    args.func(args)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/commands.py", line 1178, in _cmd_call
    varr = load_het_snps(
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/cmdutil.py", line 30, in load_het_snps
    varr = tabio.read(
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/__init__.py", line 75, in read
    dframe = reader(infile, **kwargs)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 62, in read_vcf
    table = pd.DataFrame.from_records(rows, columns=columns)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/pandas/core/frame.py", line 2450, in from_records
    first_row = next(data)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 233, in _parse_records
    depth, zygosity, alt_count = _extract_genotype(sample, record)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 303, in _extract_genotype
    gts = set(sample["GT"])
  File "pysam/libcbcf.pyx", line 3541, in pysam.libcbcf.VariantRecordSample.__getitem__
  File "pysam/libcbcf.pyx", line 813, in pysam.libcbcf.bcf_format_get_value
KeyError: 'invalid FORMAT: GT'
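
Going by the traceback, the KeyError is raised when sample["GT"] is looked up on a record whose FORMAT column has no GT key. A minimal sketch of a defensive lookup follows. It is not CNVkit's code: the helper name genotype_or_none is hypothetical, and plain dicts stand in for pysam sample objects on the assumption that both raise KeyError for a missing key (the traceback shows pysam doing so).

```python
def genotype_or_none(sample):
    """Return the sample's GT value, or None when the record has no GT key.

    `sample` is any mapping-like object that raises KeyError for a missing
    FORMAT key (hypothetical stand-in for a pysam sample object).
    """
    try:
        return sample["GT"]
    except KeyError:
        # No GT in FORMAT, as in Strelka somatic VCFs: let the caller
        # decide whether to skip the record or fall back to allele counts.
        return None


# Stand-in records: one shaped like the Strelka record, one with a GT field.
strelka_like = {"DP": 17, "CU": (2, 2), "GU": (14, 29)}
with_gt = {"GT": (0, 1), "DP": 30}

print(genotype_or_none(strelka_like))
print(genotype_or_none(with_gt))
```

Returning None instead of raising would let the caller skip such records or derive zygosity some other way, rather than aborting the whole read.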

After examining the Strelka VCF, I found that the GT field is absent, which appears to be a deliberate choice on Strelka's part (Illumina/strelka#16). I have pasted the header of my VCF below, showing the available fields, along with the first record. Could support for Strelka be added?

##FILTER=<ID=LowDepth,Description="Tumor or normal sample read depth at this locus is below 2">
##FILTER=<ID=LowEVS,Description="Somatic Empirical Variant Score (SomaticEVS) is below threshold">
##FORMAT=<ID=AU,Number=2,Type=Integer,Description="Number of 'A' alleles used in tiers 1,2">
##FORMAT=<ID=CU,Number=2,Type=Integer,Description="Number of 'C' alleles used in tiers 1,2">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for tier1 (used+filtered)">
##FORMAT=<ID=FDP,Number=1,Type=Integer,Description="Number of basecalls filtered from original read depth for tier1">
##FORMAT=<ID=GU,Number=2,Type=Integer,Description="Number of 'G' alleles used in tiers 1,2">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Number of reads with deletions spanning this site at tier1">
##FORMAT=<ID=SUBDP,Number=1,Type=Integer,Description="Number of reads below tier1 mapping quality threshold aligned across this site">
##FORMAT=<ID=TU,Number=2,Type=Integer,Description="Number of 'T' alleles used in tiers 1,2">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Combined depth across samples">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=NT,Number=1,Type=String,Description="Genotype of the normal in all data tiers, as used to classify somatic variants. One of {ref,het,hom,conflict}.">
##INFO=<ID=PNOISE,Number=1,Type=Float,Description="Fraction of panel containing non-reference noise at this site">
##INFO=<ID=PNOISE2,Number=1,Type=Float,Description="Fraction of panel containing more than one non-reference noise obs at this site">
##INFO=<ID=QSS,Number=1,Type=Integer,Description="Quality score for any somatic snv, ie. for the ALT allele to be present at a significantly different frequency in the tumor and normal">
##INFO=<ID=QSS_NT,Number=1,Type=Integer,Description="Quality score reflecting the joint probability of a somatic variant and NT">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref read-position in the tumor">
##INFO=<ID=SGT,Number=1,Type=String,Description="Most likely somatic genotype excluding normal noise states">
##INFO=<ID=SNVSB,Number=1,Type=Float,Description="Somatic SNV site strand bias">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation">
##INFO=<ID=SomaticEVS,Number=1,Type=Float,Description="Somatic Empirical Variant Score (EVS) expressing the phred-scaled probability of the call being a false positive observation.">
##INFO=<ID=TQSS,Number=1,Type=Integer,Description="Data tier used to compute QSS">
##INFO=<ID=TQSS_NT,Number=1,Type=Integer,Description="Data tier used to compute QSS_NT">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
NC_072790.1     221367  .       G       C       .       LowEVS  DP=49;MQ=30.81;MQ0=15;NT=ref;QSS=1;QSS_NT=1;ReadPosRankSum=-0.16;SGT=CG->CG;SNVSB=0.00;SOMATIC;SomaticEVS=0.11;TQSS=1;TQSS_NT=1 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    5:0:0:0:0,0:1,2:4,16:0,0              17:1:0:0:0,0:2,2:14,29:0,0
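
For reference, here is a rough sketch of how the allele counts could be recovered without GT. It assumes, as I understand the Strelka user guide, that the reference and alternate counts for a somatic SNV are the first (tier 1) values of the REF-base and ALT-base "U" fields (GU and CU for the record above). The function names are hypothetical, it handles only single-base REF and ALT, and it is not how CNVkit currently parses VCFs.

```python
def strelka_snv_counts(ref, alt, format_keys, sample_field):
    """Return (ref_count, alt_count) from Strelka tier-1 allele counts.

    Assumes a somatic SNV record with single-base REF and ALT, where each
    of AU/CU/GU/TU holds "tier1,tier2" counts.
    """
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))

    def tier1(base):
        # First comma-separated value is the tier-1 count for that base.
        return int(values[base + "U"].split(",")[0])

    return tier1(ref), tier1(alt)


def alt_fraction(ref_count, alt_count):
    """Alternate-allele fraction, or None when no reads support either allele."""
    total = ref_count + alt_count
    return alt_count / total if total else None


# FORMAT and sample columns copied from the record above (REF=G, ALT=C).
fmt = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
normal = "5:0:0:0:0,0:1,2:4,16:0,0"
tumor = "17:1:0:0:0,0:2,2:14,29:0,0"

n_ref, n_alt = strelka_snv_counts("G", "C", fmt, normal)
t_ref, t_alt = strelka_snv_counts("G", "C", fmt, tumor)
print("NORMAL", n_ref, n_alt, alt_fraction(n_ref, n_alt))
print("TUMOR", t_ref, t_alt, alt_fraction(t_ref, t_alt))
```

For this record the sketch gives 4 reference and 1 alternate read in NORMAL, and 14 reference and 2 alternate reads in TUMOR. Zygosity would still need another source, for example the NT value in INFO for the normal sample.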