Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2 pass variant walker #4744

Merged
merged 7 commits into from
May 17, 2018
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
package org.broadinstitute.hellbender.engine;

import htsjdk.variant.variantcontext.VariantContext;
import org.broadinstitute.hellbender.engine.filters.CountingReadFilter;
import org.broadinstitute.hellbender.engine.filters.VariantFilter;
import org.broadinstitute.hellbender.utils.SimpleInterval;

import java.util.stream.StreamSupport;

/**
* A VariantWalker that makes two passes through the variants.
* This allows the user to store internal states during the first pass, which the user can then
* process and access during the second pass
**/
public abstract class TwoPassVariantWalker extends VariantWalker {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great - we can use this in the CNNScoreVariants tool (#4316). It will need class and method javadoc though, and unit tests (see the ones for TwoPassReadWalker).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tests, I think we need 3 things here:

  • A TwoPassVariantWalkerUnitTest test class, modeled after TwoPassReadWalkerUnitTest
  • An ExampleTwoPassVariantWalker tool in tools.examples, modeled after the existing ExampleVariantWalker tool
  • A simple ExampleTwoPassVariantWalkerIntegrationTest, modeled after ExampleVariantWalkerIntegrationTest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test for TwoPassVariantWalker writes a private example walker class and runs it, which seems to be the same as the integration test. I think doing either is sufficient - what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to have both the unit test and the integration test. The unit test uses a toy implementation to count calls to first/second pass apply, which is a good correctness test for the traversal. However, we also try to include a runnable example implementation that actually produces meaningful output for every walker type as a template for tool developers, and that's where ExampleTwoPassVariantWalker comes in.

/**
* Overrides the default, single-pass traversal framework of {@link VariantWalkerBase}
*/
@Override
public void traverse(){
final VariantFilter variantContextFilter = makeVariantFilter();
final CountingReadFilter readFilter = makeReadFilter();

// First pass through the variants
logger.info("Starting first pass through the variants");
traverseVariants(variantContextFilter, readFilter, this::firstPassApply);
logger.info("Finished first pass through the variants");

// Process the data accumulated during the first pass
afterFirstPass();

// Second pass
logger.info("Starting second pass through the variants");
traverseVariants(variantContextFilter, readFilter, this::secondPassApply);

logger.info(readFilter.getSummaryLine());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This accumulates the filter count across both passes and displays the aggregate total. The TwoPassReadWalker does the same thing, though I wonder if thats a feature, given that the second pass is usually an implementation detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to Louis and James a bit about this. I don't think we need two different VariantFilters, so I'll just call makeVariantFilter() twice, in the hope that calling it the second time resets the counter

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add logger.info(readFilter.getSummaryLine()) after the second traversal pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


/**
*
* First pass through the variants. The user may store data in instance variables of the walker
* and process them in {@link #afterFirstPass} before making a second pass
*
* @param variant A variant record in a vcf
* @param readsContext Reads overlapping the current variant. Will be empty if a read source (e.g. bam) isn't provided
* @param referenceContext Reference bases spanning the current variant
* @param featureContext A record overlapping the current variant from an auxiliary data source (e.g. gnomad)
*/
protected abstract void firstPassApply(final VariantContext variant,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add full javadoc for firstPassApply(), secondPassApply(), and afterFirstPass(), modeled after the javadoc for the analogous methods in TwoPassReadWalker

final ReadsContext readsContext,
final ReferenceContext referenceContext,
final FeatureContext featureContext );
/**
* Process the data collected during the first pass. This method is called between the two traversals
*/
protected abstract void afterFirstPass();

/**
*
* Having seen all of the variants in a vcf, make a second pass through the variants
*
* See argument docs of {@link #firstPassApply} and {@link VariantWalkerBase#apply}
**/
protected abstract void secondPassApply(final VariantContext variant,
final ReadsContext readsContext,
final ReferenceContext referenceContext,
final FeatureContext featureContext);

private void traverseVariants(final VariantFilter variantFilter, final CountingReadFilter readFilter, final VariantConsumer variantConsumer){
StreamSupport.stream(getSpliteratorForDrivingVariants(), false)
.filter(variantFilter)
.forEach(variant -> {
final SimpleInterval variantInterval = new SimpleInterval(variant);
variantConsumer.consume(variant,
new ReadsContext(reads, variantInterval, readFilter),
new ReferenceContext(reference, variantInterval),
new FeatureContext(features, variantInterval));
progressMeter.update(variantInterval);
});
}

@FunctionalInterface
private interface VariantConsumer {
void consume(final VariantContext variant, final ReadsContext readsContext, final ReferenceContext reference, final FeatureContext features);
}

/**
* Make final to hide it from subclasses
*/
@Override
public final void apply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) {}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
package org.broadinstitute.hellbender.tools.examples;

import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.variantcontext.VariantContextBuilder;
import htsjdk.variant.variantcontext.writer.VariantContextWriter;
import htsjdk.variant.vcf.*;
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties;
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;
import org.broadinstitute.hellbender.cmdline.programgroups.ExampleProgramGroup;
import org.broadinstitute.hellbender.engine.FeatureContext;
import org.broadinstitute.hellbender.engine.ReadsContext;
import org.broadinstitute.hellbender.engine.ReferenceContext;
import org.broadinstitute.hellbender.engine.TwoPassVariantWalker;
import org.broadinstitute.hellbender.utils.variant.GATKVCFConstants;

import java.io.File;
import java.util.ArrayList;
import java.util.List;


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document in a 1-2 sentence comment what the example tool does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* This walker makes two traversals through variants in a vcf. During the first pass, it collects the qual-by-depth (QD) annotation
* of each variant and stores it in a list. After the first traversal, it computes the vcf-wide average and variance of QD's.
* The walker will then traverse the variants the second time and annotate each variant with the distance of its QD from the mean
* in units of standard deviations.
*
*/
@CommandLineProgramProperties(
summary = "Example variant walker that makes two passes through a vcf",
oneLineSummary = "Example two-pass variant walker",
programGroup = ExampleProgramGroup.class,
omitFromCommandLine = true
)
public class ExampleTwoPassVariantWalker extends TwoPassVariantWalker {
@Argument(fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME, shortName = StandardArgumentDefinitions.OUTPUT_SHORT_NAME,
doc = "Output vcf", optional = true)
private File outputVcf = null;

private final List<Double> qualByDepths = new ArrayList<>();

private VariantContextWriter vcfWriter;

static String QD_DISTANCE_FROM_MEAN = "QD_DIST";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a constant this should be static final, not just static


static String COPY_OF_QD_KEY_NAME = "QD_COPY";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be static final


private double averageQualByDepth;

private double sampleVarianceOfQDs;

private int counter = 0;

@Override
public void onTraversalStart() {
final VCFHeader inputHeader = getHeaderForVariants();
inputHeader.addMetaDataLine(new VCFInfoHeaderLine(QD_DISTANCE_FROM_MEAN, 1, VCFHeaderLineType.Float, "distance from the average QD value in the units of standard deviations"));
inputHeader.addMetaDataLine(new VCFInfoHeaderLine(COPY_OF_QD_KEY_NAME, 1, VCFHeaderLineType.Float, "copy of the QD INFO field"));
vcfWriter = createVCFWriter(outputVcf);
vcfWriter.writeHeader(inputHeader);
}

@Override
protected void firstPassApply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) {
qualByDepths.add(variant.getAttributeAsDouble(GATKVCFConstants.QUAL_BY_DEPTH_KEY, 0.0));
}

@Override
protected void afterFirstPass() {
final int n = qualByDepths.size();
if (n == 1){
return;
}
averageQualByDepth = qualByDepths.stream().mapToDouble(x -> x).average().orElse(0.0);
sampleVarianceOfQDs = (1.0/(n - 1.0))* qualByDepths.stream().mapToDouble(x -> Math.pow((x - averageQualByDepth), 2.0)).sum();
}

@Override
protected void secondPassApply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) {
VariantContextBuilder vcb = new VariantContextBuilder(variant);
final double qualByDepth = qualByDepths.get(counter++);
final double distanceFromMean = Math.abs((qualByDepth - averageQualByDepth)/ Math.sqrt(sampleVarianceOfQDs));
vcb.attribute(QD_DISTANCE_FROM_MEAN, distanceFromMean);
vcb.attribute(COPY_OF_QD_KEY_NAME, qualByDepth);

vcfWriter.add(vcb.make());
}

@Override
public void closeTool() {
if ( vcfWriter != null ) {
vcfWriter.close();
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
package org.broadinstitute.hellbender.engine;

import htsjdk.variant.variantcontext.VariantContext;
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties;
import org.broadinstitute.hellbender.GATKBaseTest;
import org.broadinstitute.hellbender.cmdline.TestProgramGroup;
import org.testng.Assert;
import org.testng.annotations.Test;

public class TwoPassVariantWalkerUnitTest extends GATKBaseTest {
@CommandLineProgramProperties(
summary = "An example subclass of TwoPassVariantWalker",
oneLineSummary = "An example subclass of TwoPassVariantWalker",
programGroup = TestProgramGroup.class,
omitFromCommandLine = true
)
private static class DummyTwoPassVariantWalker extends TwoPassVariantWalker {
public int firstPass = 0;
public int secondPass = 0;
public boolean visitedAfterFirstPass = false;

@Override
protected void firstPassApply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) {
firstPass++;
}

@Override
protected void afterFirstPass(){
visitedAfterFirstPass = true;
}

@Override
protected void secondPassApply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) {
secondPass++;
}
}

@Test
public void testTwoPassTraversal() {
final DummyTwoPassVariantWalker walker = new DummyTwoPassVariantWalker();
final String testVcf = "src/test/resources/org/broadinstitute/hellbender/tools/walkers/variantutils/VariantsToTable/multiallelic.vcf";

final String[] args = { "-V", testVcf };

walker.instanceMain(args);

final int expectedNumberOfVariantContexts = 52;
Assert.assertEquals(walker.firstPass, expectedNumberOfVariantContexts);
Assert.assertEquals(walker.secondPass, expectedNumberOfVariantContexts);
Assert.assertTrue(walker.visitedAfterFirstPass);
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
package org.broadinstitute.hellbender.tools.examples;

import htsjdk.variant.variantcontext.VariantContext;
import org.broadinstitute.hellbender.CommandLineProgramTest;
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;
import org.broadinstitute.hellbender.engine.FeatureDataSource;
import org.broadinstitute.hellbender.utils.test.BaseTest;
import org.broadinstitute.hellbender.utils.variant.GATKVCFConstants;
import org.testng.Assert;
import org.testng.annotations.Test;

import java.io.File;
import java.util.Iterator;

public class ExampleTwoPassVariantWalkerIntegrationTest extends CommandLineProgramTest {
@Test
public void test() {
final File outputVcf = BaseTest.createTempFile("output", "vcf");
final String inputVcf = "src/test/resources/org/broadinstitute/hellbender/tools/walkers/variantutils/SelectVariants/haploid-multisample.vcf";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a good VCF to use for this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the first vcf that had QD - most vcfs in the test suite don't


final String[] args = {
"-" + StandardArgumentDefinitions.VARIANT_SHORT_NAME, inputVcf,
"-" + StandardArgumentDefinitions.OUTPUT_SHORT_NAME, outputVcf.getAbsolutePath()
};

runCommandLine(args);

try (final FeatureDataSource<VariantContext> variantsBefore = new FeatureDataSource<>(inputVcf);
final FeatureDataSource<VariantContext> variantsAfter = new FeatureDataSource<>(outputVcf)){
final Iterator<VariantContext> beforeIterator = variantsBefore.iterator();
final Iterator<VariantContext> afterIterator = variantsAfter.iterator();

while (beforeIterator.hasNext() && afterIterator.hasNext()){
final VariantContext before = beforeIterator.next();
final VariantContext after = afterIterator.next();

Assert.assertEquals(before.getAttributeAsDouble(GATKVCFConstants.QUAL_BY_DEPTH_KEY, -1.0),
after.getAttributeAsDouble(ExampleTwoPassVariantWalker.COPY_OF_QD_KEY_NAME, -2.0));
Assert.assertTrue(after.getAttributeAsDouble(ExampleTwoPassVariantWalker.QD_DISTANCE_FROM_MEAN, -1.0) > 0);
}

Assert.assertTrue(! afterIterator.hasNext() && ! beforeIterator.hasNext());
}
}

}