You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
particularly SelectVariants, but really anything that writes out a vcf.
Affected version(s)
Latest public release version [version?]
Latest master branch as of [2/6/2020]
Description
There are minimal/inconsistent checks that variants added to vcfWriters are correctly ordered (an issue with htsjdk I think). This leads to an insidious bug, where a sorted vcf can be fed through SelectVariants, and depending on the flavor of output vcf, either crash, or succeed but output an incorrectly sorted vcf. The issue is that in some circumstances SelectVariants will trim alleles to their minimal representation, which can change the location of a variant record, and thus reorder them. However, SelectVariants does nothing to account for the potential order change. Since vcfWriter implementations in htsjdk seem to do minimal/inconsistent checks on the order of variants being added to them, this may write out an incorrectly sorted vcf, or throw an exception, depending on the flavor of vcfWriter.
Steps to reproduce
With attached (zipped because github) vcf, run gatk SelectVariants -V test.input.vcf -sn SAMPLE_01 -O test.output.vcf
Tool will succeed, but output vcf will be incorrectly sorted. Somehow, this incorrectly sorted vcf will also be accompanied by an index! Though if you try to run IndexFeatureFile on the output vcf separately, it will fail.
run gatk SelectVariants -V test.input.vcf -sn SAMPLE_01 -O test.output.vcf.gz
tool will throw exception w/ stack trace:
java.lang.IllegalArgumentException: Features added out of order: previous (TabixFeature{referenceIndex=0, start=17148456, end=17148456, featureStartFilePosition=2460, featureEndFilePosition=-1}) > next (TabixFeature{referenceIndex=0, start=17148447, end=17148457, featureStartFilePosition=2509, featureEndFilePosition=-1})
at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:89)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:203)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:242)
at org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants.apply(SelectVariants.java:620)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Also, there don't seem to be any engine checks that the input vcf used to drive a variant walker is correctly sorted. That is, SelectVariants (for example) can be run on an input vcf which is not correctly sorted, without raising any sort of complaint. I suppose this is slightly less troubling than the output ordering issue, as it falls under "garbage in, garbage out".
The text was updated successfully, but these errors were encountered:
@kachulis can you give an example where trimming changes the position? This seems to violate some of the assumptions we've been making for the BQ joint calling work.
@ldgauthier this happens when the alleles have shared bases on the left. For example chr1 17148447 . GAAACAAAACA GAAACAAAATA
trims to chr1 17148456 . C T
I think this most commonly arises when splitting alleles at multiallelic sites. For example the site above came from splitting the multiallelic site chr1 17148447 . GAAACAAAACA GAAACAAAACAAAACAAAACA,GAAACAAAATA,G
Bug Report
Affected tool(s) or class(es)
particularly SelectVariants, but really anything that writes out a vcf.
Affected version(s)
Description
There are minimal/inconsistent checks that variants added to vcfWriters are correctly ordered (an issue with htsjdk I think). This leads to an insidious bug, where a sorted vcf can be fed through SelectVariants, and depending on the flavor of output vcf, either crash, or succeed but output an incorrectly sorted vcf. The issue is that in some circumstances SelectVariants will trim alleles to their minimal representation, which can change the location of a variant record, and thus reorder them. However, SelectVariants does nothing to account for the potential order change. Since vcfWriter implementations in htsjdk seem to do minimal/inconsistent checks on the order of variants being added to them, this may write out an incorrectly sorted vcf, or throw an exception, depending on the flavor of vcfWriter.
Steps to reproduce
With attached (zipped because github) vcf, run
gatk SelectVariants -V test.input.vcf -sn SAMPLE_01 -O test.output.vcf
Tool will succeed, but output vcf will be incorrectly sorted. Somehow, this incorrectly sorted vcf will also be accompanied by an index! Though if you try to run
IndexFeatureFile
on the output vcf separately, it will fail.run
gatk SelectVariants -V test.input.vcf -sn SAMPLE_01 -O test.output.vcf.gz
tool will throw exception w/ stack trace:
test.input.vcf.zip
Also, there don't seem to be any engine checks that the input vcf used to drive a variant walker is correctly sorted. That is, SelectVariants (for example) can be run on an input vcf which is not correctly sorted, without raising any sort of complaint. I suppose this is slightly less troubling than the output ordering issue, as it falls under "garbage in, garbage out".
The text was updated successfully, but these errors were encountered: