-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2 pass variant walker #4744
2 pass variant walker #4744
Changes from 6 commits
a7fada1
1910651
e002c1b
a673774
02253f3
067f6fe
147f65e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
package org.broadinstitute.hellbender.engine; | ||
|
||
import htsjdk.variant.variantcontext.VariantContext; | ||
import org.broadinstitute.hellbender.engine.filters.CountingReadFilter; | ||
import org.broadinstitute.hellbender.engine.filters.VariantFilter; | ||
import org.broadinstitute.hellbender.utils.SimpleInterval; | ||
|
||
import java.util.stream.StreamSupport; | ||
|
||
/** | ||
* A VariantWalker that makes two passes through the variants. | ||
* This allows the user to store internal states during the first pass, which the user can then | ||
* process and access during the second pass | ||
**/ | ||
public abstract class TwoPassVariantWalker extends VariantWalker { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For tests, I think we need 3 things here:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The unit test for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's good to have both the unit test and the integration test. The unit test uses a toy implementation to count calls to first/second pass apply, which is a good correctness test for the traversal. However, we also try to include a runnable example implementation that actually produces meaningful output for every walker type as a template for tool developers, and that's where |
||
/** | ||
* Overrides the default, single-pass traversal framework of {@link VariantWalkerBase} | ||
*/ | ||
@Override | ||
public void traverse(){ | ||
final VariantFilter variantContextFilter = makeVariantFilter(); | ||
final CountingReadFilter readFilter = makeReadFilter(); | ||
|
||
// First pass through the variants | ||
logger.info("Starting first pass through the variants"); | ||
traverseVariants(variantContextFilter, readFilter, this::firstPassApply); | ||
logger.info("Finished first pass through the variants"); | ||
|
||
// Process the data accumulated during the first pass | ||
afterFirstPass(); | ||
|
||
// Second pass | ||
logger.info("Starting second pass through the variants"); | ||
traverseVariants(variantContextFilter, readFilter, this::secondPassApply); | ||
|
||
logger.info(readFilter.getSummaryLine()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This accumulates the filter count across both passes and displays the aggregate total. The TwoPassReadWalker does the same thing, though I wonder if thats a feature, given that the second pass is usually an implementation detail. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I talked to Louis and James a bit about this. I don't think we need two different |
||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
|
||
/** | ||
* | ||
* First pass through the variants. The user may store data in instance variables of the walker | ||
* and process them in {@link #afterFirstPass} before making a second pass | ||
* | ||
* @param variant A variant record in a vcf | ||
* @param readsContext Reads overlapping the current variant. Will be empty if a read source (e.g. bam) isn't provided | ||
* @param referenceContext Reference bases spanning the current variant | ||
* @param featureContext A record overlapping the current variant from an auxiliary data source (e.g. gnomad) | ||
*/ | ||
protected abstract void firstPassApply(final VariantContext variant, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add full javadoc for |
||
final ReadsContext readsContext, | ||
final ReferenceContext referenceContext, | ||
final FeatureContext featureContext ); | ||
/** | ||
* Process the data collected during the first pass. This method is called between the two traversals | ||
*/ | ||
protected abstract void afterFirstPass(); | ||
|
||
/** | ||
* | ||
* Having seen all of the variants in a vcf, make a second pass through the variants | ||
* | ||
* See argument docs of {@link #firstPassApply} and {@link VariantWalkerBase#apply} | ||
**/ | ||
protected abstract void secondPassApply(final VariantContext variant, | ||
final ReadsContext readsContext, | ||
final ReferenceContext referenceContext, | ||
final FeatureContext featureContext); | ||
|
||
private void traverseVariants(final VariantFilter variantFilter, final CountingReadFilter readFilter, final VariantConsumer variantConsumer){ | ||
StreamSupport.stream(getSpliteratorForDrivingVariants(), false) | ||
.filter(variantFilter) | ||
.forEach(variant -> { | ||
final SimpleInterval variantInterval = new SimpleInterval(variant); | ||
variantConsumer.consume(variant, | ||
new ReadsContext(reads, variantInterval, readFilter), | ||
new ReferenceContext(reference, variantInterval), | ||
new FeatureContext(features, variantInterval)); | ||
progressMeter.update(variantInterval); | ||
}); | ||
} | ||
|
||
@FunctionalInterface | ||
private interface VariantConsumer { | ||
void consume(final VariantContext variant, final ReadsContext readsContext, final ReferenceContext reference, final FeatureContext features); | ||
} | ||
|
||
/** | ||
* Make final to hide it from subclasses | ||
*/ | ||
@Override | ||
public final void apply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) {} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
package org.broadinstitute.hellbender.tools.examples; | ||
|
||
import htsjdk.variant.variantcontext.VariantContext; | ||
import htsjdk.variant.variantcontext.VariantContextBuilder; | ||
import htsjdk.variant.variantcontext.writer.VariantContextWriter; | ||
import htsjdk.variant.vcf.*; | ||
import org.broadinstitute.barclay.argparser.Argument; | ||
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties; | ||
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions; | ||
import org.broadinstitute.hellbender.cmdline.programgroups.ExampleProgramGroup; | ||
import org.broadinstitute.hellbender.engine.FeatureContext; | ||
import org.broadinstitute.hellbender.engine.ReadsContext; | ||
import org.broadinstitute.hellbender.engine.ReferenceContext; | ||
import org.broadinstitute.hellbender.engine.TwoPassVariantWalker; | ||
import org.broadinstitute.hellbender.utils.variant.GATKVCFConstants; | ||
|
||
import java.io.File; | ||
import java.util.ArrayList; | ||
import java.util.List; | ||
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Document in a 1-2 sentence comment what the example tool does. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
/** | ||
* This walker makes two traversals through variants in a vcf. During the first pass, it collects the qual-by-depth (QD) annotation | ||
* of each variant and stores it in a list. After the first traversal, it computes the vcf-wide average and variance of QD's. | ||
* The walker will then traverse the variants the second time and annotate each variant with the distance of its QD from the mean | ||
* in units of standard deviations. | ||
* | ||
*/ | ||
@CommandLineProgramProperties( | ||
summary = "Example variant walker that makes two passes through a vcf", | ||
oneLineSummary = "Example two-pass variant walker", | ||
programGroup = ExampleProgramGroup.class, | ||
omitFromCommandLine = true | ||
) | ||
public class ExampleTwoPassVariantWalker extends TwoPassVariantWalker { | ||
@Argument(fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME, shortName = StandardArgumentDefinitions.OUTPUT_SHORT_NAME, | ||
doc = "Output vcf", optional = true) | ||
private File outputVcf = null; | ||
|
||
private final List<Double> qualByDepths = new ArrayList<>(); | ||
|
||
private VariantContextWriter vcfWriter; | ||
|
||
static String QD_DISTANCE_FROM_MEAN = "QD_DIST"; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As a constant this should be |
||
|
||
static String COPY_OF_QD_KEY_NAME = "QD_COPY"; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should also be |
||
|
||
private double averageQualByDepth; | ||
|
||
private double sampleVarianceOfQDs; | ||
|
||
private int counter = 0; | ||
|
||
@Override | ||
public void onTraversalStart() { | ||
final VCFHeader inputHeader = getHeaderForVariants(); | ||
inputHeader.addMetaDataLine(new VCFInfoHeaderLine(QD_DISTANCE_FROM_MEAN, 1, VCFHeaderLineType.Float, "distance from the average QD value in the units of standard deviations")); | ||
inputHeader.addMetaDataLine(new VCFInfoHeaderLine(COPY_OF_QD_KEY_NAME, 1, VCFHeaderLineType.Float, "copy of the QD INFO field")); | ||
vcfWriter = createVCFWriter(outputVcf); | ||
vcfWriter.writeHeader(inputHeader); | ||
} | ||
|
||
@Override | ||
protected void firstPassApply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) { | ||
qualByDepths.add(variant.getAttributeAsDouble(GATKVCFConstants.QUAL_BY_DEPTH_KEY, 0.0)); | ||
} | ||
|
||
@Override | ||
protected void afterFirstPass() { | ||
final int n = qualByDepths.size(); | ||
if (n == 1){ | ||
return; | ||
} | ||
averageQualByDepth = qualByDepths.stream().mapToDouble(x -> x).average().orElse(0.0); | ||
sampleVarianceOfQDs = (1.0/(n - 1.0))* qualByDepths.stream().mapToDouble(x -> Math.pow((x - averageQualByDepth), 2.0)).sum(); | ||
} | ||
|
||
@Override | ||
protected void secondPassApply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) { | ||
VariantContextBuilder vcb = new VariantContextBuilder(variant); | ||
final double qualByDepth = qualByDepths.get(counter++); | ||
final double distanceFromMean = Math.abs((qualByDepth - averageQualByDepth)/ Math.sqrt(sampleVarianceOfQDs)); | ||
vcb.attribute(QD_DISTANCE_FROM_MEAN, distanceFromMean); | ||
vcb.attribute(COPY_OF_QD_KEY_NAME, qualByDepth); | ||
|
||
vcfWriter.add(vcb.make()); | ||
} | ||
|
||
@Override | ||
public void closeTool() { | ||
if ( vcfWriter != null ) { | ||
vcfWriter.close(); | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
package org.broadinstitute.hellbender.engine; | ||
|
||
import htsjdk.variant.variantcontext.VariantContext; | ||
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties; | ||
import org.broadinstitute.hellbender.GATKBaseTest; | ||
import org.broadinstitute.hellbender.cmdline.TestProgramGroup; | ||
import org.testng.Assert; | ||
import org.testng.annotations.Test; | ||
|
||
public class TwoPassVariantWalkerUnitTest extends GATKBaseTest { | ||
@CommandLineProgramProperties( | ||
summary = "An example subclass of TwoPassVariantWalker", | ||
oneLineSummary = "An example subclass of TwoPassVariantWalker", | ||
programGroup = TestProgramGroup.class, | ||
omitFromCommandLine = true | ||
) | ||
private static class DummyTwoPassVariantWalker extends TwoPassVariantWalker { | ||
public int firstPass = 0; | ||
public int secondPass = 0; | ||
public boolean visitedAfterFirstPass = false; | ||
|
||
@Override | ||
protected void firstPassApply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) { | ||
firstPass++; | ||
} | ||
|
||
@Override | ||
protected void afterFirstPass(){ | ||
visitedAfterFirstPass = true; | ||
} | ||
|
||
@Override | ||
protected void secondPassApply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) { | ||
secondPass++; | ||
} | ||
} | ||
|
||
@Test | ||
public void testTwoPassTraversal() { | ||
final DummyTwoPassVariantWalker walker = new DummyTwoPassVariantWalker(); | ||
final String testVcf = "src/test/resources/org/broadinstitute/hellbender/tools/walkers/variantutils/VariantsToTable/multiallelic.vcf"; | ||
|
||
final String[] args = { "-V", testVcf }; | ||
|
||
walker.instanceMain(args); | ||
|
||
final int expectedNumberOfVariantContexts = 52; | ||
Assert.assertEquals(walker.firstPass, expectedNumberOfVariantContexts); | ||
Assert.assertEquals(walker.secondPass, expectedNumberOfVariantContexts); | ||
Assert.assertTrue(walker.visitedAfterFirstPass); | ||
} | ||
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
package org.broadinstitute.hellbender.tools.examples; | ||
|
||
import htsjdk.variant.variantcontext.VariantContext; | ||
import org.broadinstitute.hellbender.CommandLineProgramTest; | ||
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions; | ||
import org.broadinstitute.hellbender.engine.FeatureDataSource; | ||
import org.broadinstitute.hellbender.utils.test.BaseTest; | ||
import org.broadinstitute.hellbender.utils.variant.GATKVCFConstants; | ||
import org.testng.Assert; | ||
import org.testng.annotations.Test; | ||
|
||
import java.io.File; | ||
import java.util.Iterator; | ||
|
||
public class ExampleTwoPassVariantWalkerIntegrationTest extends CommandLineProgramTest { | ||
@Test | ||
public void test() { | ||
final File outputVcf = BaseTest.createTempFile("output", "vcf"); | ||
final String inputVcf = "src/test/resources/org/broadinstitute/hellbender/tools/walkers/variantutils/SelectVariants/haploid-multisample.vcf"; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this really a good VCF to use for this test? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That was the first vcf that had QD - most vcfs in the test suite don't |
||
|
||
final String[] args = { | ||
"-" + StandardArgumentDefinitions.VARIANT_SHORT_NAME, inputVcf, | ||
"-" + StandardArgumentDefinitions.OUTPUT_SHORT_NAME, outputVcf.getAbsolutePath() | ||
}; | ||
|
||
runCommandLine(args); | ||
|
||
try (final FeatureDataSource<VariantContext> variantsBefore = new FeatureDataSource<>(inputVcf); | ||
final FeatureDataSource<VariantContext> variantsAfter = new FeatureDataSource<>(outputVcf)){ | ||
final Iterator<VariantContext> beforeIterator = variantsBefore.iterator(); | ||
final Iterator<VariantContext> afterIterator = variantsAfter.iterator(); | ||
|
||
while (beforeIterator.hasNext() && afterIterator.hasNext()){ | ||
final VariantContext before = beforeIterator.next(); | ||
final VariantContext after = afterIterator.next(); | ||
|
||
Assert.assertEquals(before.getAttributeAsDouble(GATKVCFConstants.QUAL_BY_DEPTH_KEY, -1.0), | ||
after.getAttributeAsDouble(ExampleTwoPassVariantWalker.COPY_OF_QD_KEY_NAME, -2.0)); | ||
Assert.assertTrue(after.getAttributeAsDouble(ExampleTwoPassVariantWalker.QD_DISTANCE_FROM_MEAN, -1.0) > 0); | ||
} | ||
|
||
Assert.assertTrue(! afterIterator.hasNext() && ! beforeIterator.hasNext()); | ||
} | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great - we can use this in the CNNScoreVariants tool (#4316). It will need class and method javadoc though, and unit tests (see the ones for TwoPassReadWalker).