I am using the call command to generate BAF and allele-specific copy numbers, and ran into the negative BAF values described in #601. Following the guidance there, I reran the call command, specifying the tumor and normal sample IDs from a Strelka VCF, and got the following error:
Selected test sample TUMOR and control sample NORMAL
Skipping NC_072790.1:221367 G @ TUMOR; 'invalid FORMAT: GT'
Traceback (most recent call last):
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/bin/cnvkit.py", line 10, in <module>
    sys.exit(main())
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/cnvkit.py", line 10, in main
    args.func(args)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/commands.py", line 1178, in _cmd_call
    varr = load_het_snps(
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/cmdutil.py", line 30, in load_het_snps
    varr = tabio.read(
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/__init__.py", line 75, in read
    dframe = reader(infile, **kwargs)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 62, in read_vcf
    table = pd.DataFrame.from_records(rows, columns=columns)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/pandas/core/frame.py", line 2450, in from_records
    first_row = next(data)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 233, in _parse_records
    depth, zygosity, alt_count = _extract_genotype(sample, record)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 303, in _extract_genotype
    gts = set(sample["GT"])
  File "pysam/libcbcf.pyx", line 3541, in pysam.libcbcf.VariantRecordSample.__getitem__
  File "pysam/libcbcf.pyx", line 813, in pysam.libcbcf.bcf_format_get_value
KeyError: 'invalid FORMAT: GT'
After examining the Strelka VCF, I found that it has no GT field, which appears to be a deliberate choice by Strelka (see Illumina/strelka#16). I have pasted the VCF header below, showing the available fields, along with the first record. Could support for Strelka VCFs be added?
##FILTER=<ID=LowDepth,Description="Tumor or normal sample read depth at this locus is below 2">
##FILTER=<ID=LowEVS,Description="Somatic Empirical Variant Score (SomaticEVS) is below threshold">
##FORMAT=<ID=AU,Number=2,Type=Integer,Description="Number of 'A' alleles used in tiers 1,2">
##FORMAT=<ID=CU,Number=2,Type=Integer,Description="Number of 'C' alleles used in tiers 1,2">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for tier1 (used+filtered)">
##FORMAT=<ID=FDP,Number=1,Type=Integer,Description="Number of basecalls filtered from original read depth for tier1">
##FORMAT=<ID=GU,Number=2,Type=Integer,Description="Number of 'G' alleles used in tiers 1,2">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Number of reads with deletions spanning this site at tier1">
##FORMAT=<ID=SUBDP,Number=1,Type=Integer,Description="Number of reads below tier1 mapping quality threshold aligned across this site">
##FORMAT=<ID=TU,Number=2,Type=Integer,Description="Number of 'T' alleles used in tiers 1,2">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Combined depth across samples">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=NT,Number=1,Type=String,Description="Genotype of the normal in all data tiers, as used to classify somatic variants. One of {ref,het,hom,conflict}.">
##INFO=<ID=PNOISE,Number=1,Type=Float,Description="Fraction of panel containing non-reference noise at this site">
##INFO=<ID=PNOISE2,Number=1,Type=Float,Description="Fraction of panel containing more than one non-reference noise obs at this site">
##INFO=<ID=QSS,Number=1,Type=Integer,Description="Quality score for any somatic snv, ie. for the ALT allele to be present at a significantly different frequency in the tumor and normal">
##INFO=<ID=QSS_NT,Number=1,Type=Integer,Description="Quality score reflecting the joint probability of a somatic variant and NT">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref read-position in the tumor">
##INFO=<ID=SGT,Number=1,Type=String,Description="Most likely somatic genotype excluding normal noise states">
##INFO=<ID=SNVSB,Number=1,Type=Float,Description="Somatic SNV site strand bias">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation">
##INFO=<ID=SomaticEVS,Number=1,Type=Float,Description="Somatic Empirical Variant Score (EVS) expressing the phred-scaled probability of the call being a false positive observation.">
##INFO=<ID=TQSS,Number=1,Type=Integer,Description="Data tier used to compute QSS">
##INFO=<ID=TQSS_NT,Number=1,Type=Integer,Description="Data tier used to compute QSS_NT">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
NC_072790.1 221367 . G C . LowEVS DP=49;MQ=30.81;MQ0=15;NT=ref;QSS=1;QSS_NT=1;ReadPosRankSum=-0.16;SGT=CG->CG;SNVSB=0.00;SOMATIC;SomaticEVS=0.11;TQSS=1;TQSS_NT=1 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 5:0:0:0:0,0:1,2:4,16:0,0 17:1:0:0:0,0:2,2:14,29:0,0
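For reference, the Strelka user guide describes deriving SNV allele counts from the tier-1 values of the AU/CU/GU/TU fields, so the missing GT could perhaps be worked around along these lines. This is only a minimal sketch in plain Python, not CNVkit code: the function name strelka_snv_counts is hypothetical, and it assumes a single-base REF and ALT (SNVs only, no indels or multi-allelic sites).

```python
import math


def strelka_snv_counts(ref, alt, fmt, sample):
    """Return (ref_count, alt_count, alt_freq) for one sample of a Strelka SNV.

    Hypothetical helper, not part of CNVkit or Strelka. Assumes REF and ALT
    are single bases and that FORMAT contains AU, CU, GU and TU.
    """
    fields = dict(zip(fmt.split(":"), sample.split(":")))

    def tier1(base):
        # Each <base>U field holds "tier1,tier2"; take the tier-1 count.
        return int(fields[base + "U"].split(",")[0])

    ref_count = tier1(ref)
    alt_count = tier1(alt)
    total = ref_count + alt_count
    # Avoid dividing by zero when neither allele has tier-1 support.
    alt_freq = alt_count / total if total else math.nan
    return ref_count, alt_count, alt_freq


# The first record from the VCF above (REF=G, ALT=C).
FORMAT = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
NORMAL = "5:0:0:0:0,0:1,2:4,16:0,0"
TUMOR = "17:1:0:0:0,0:2,2:14,29:0,0"

print("NORMAL", strelka_snv_counts("G", "C", FORMAT, NORMAL))
print("TUMOR", strelka_snv_counts("G", "C", FORMAT, TUMOR))
```

For the record above this gives 4 reference and 1 alternate reads in NORMAL, and 14 reference and 2 alternate reads in TUMOR. Zygosity would still have to be inferred separately, since these counts do not replace a genotype call.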