-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2 pass variant walker #4744
2 pass variant walker #4744
Conversation
Hey @droazen will you review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be useful to have. Requested javadoc and unit tests.
|
||
import java.util.stream.StreamSupport; | ||
|
||
public abstract class TwoPassVariantWalker extends VariantWalker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great - we can use this in the CNNScoreVariants tool (#4316). It will need class and method javadoc though, and unit tests (see the ones for TwoPassReadWalker).
// Second pass | ||
logger.info("Starting second pass through the variants"); | ||
traverseVariants(variantFilter, readFilter, this::secondPassApply); | ||
logger.info(readFilter.getSummaryLine()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This accumulates the filter count across both passes and displays the aggregate total. The TwoPassReadWalker does the same thing, though I wonder if thats a feature, given that the second pass is usually an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to Louis and James a bit about this. I don't think we need two different VariantFilters
, so I'll just call makeVariantFilter()
twice, in the hope that calling it the second time resets the counter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review complete, back to @takutosato. This looks good, just needs some tests and javadoc.
|
||
import java.util.stream.StreamSupport; | ||
|
||
public abstract class TwoPassVariantWalker extends VariantWalker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For tests, I think we need 3 things here:
- A
TwoPassVariantWalkerUnitTest
test class, modeled afterTwoPassReadWalkerUnitTest
- An
ExampleTwoPassVariantWalker
tool intools.examples
, modeled after the existingExampleVariantWalker
tool - A simple
ExampleTwoPassVariantWalkerIntegrationTest
, modeled afterExampleVariantWalkerIntegrationTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit test for TwoPassVariantWalker
writes a private example walker class and runs it, which seems to be the same as the integration test. I think doing either is sufficient - what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to have both the unit test and the integration test. The unit test uses a toy implementation to count calls to first/second pass apply, which is a good correctness test for the traversal. However, we also try to include a runnable example implementation that actually produces meaningful output for every walker type as a template for tool developers, and that's where ExampleTwoPassVariantWalker
comes in.
|
||
@Override | ||
public void traverse(){ | ||
final VariantFilter variantFilter = makeVariantFilter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you file a github ticket saying that VariantWalkerBase
(and subclasses) need to migrate to CountingVariantFilter
, so that summary statistics for variant filtration can be printed at the end of traversal, as they are for read filters currently?
@cmnbroad Do you know why CountingVariantFilter
never got hooked up here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we didn't do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
|
||
protected abstract void firstPassApply(final VariantContext variant, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add full javadoc for firstPassApply()
, secondPassApply()
, and afterFirstPass()
, modeled after the javadoc for the analogous methods in TwoPassReadWalker
Codecov Report
@@ Coverage Diff @@
## master #4744 +/- ##
===============================================
+ Coverage 79.985% 80.201% +0.215%
- Complexity 17419 18234 +815
===============================================
Files 1081 1086 +5
Lines 63118 66825 +3707
Branches 10195 11063 +868
===============================================
+ Hits 50485 53594 +3109
- Misses 8648 9073 +425
- Partials 3985 4158 +173
|
* @param variant A variant record in a vcf | ||
* @param readsContext Reads overlapping the current variant. Will be empty if a read source (e.g. bam) isn't provided | ||
* @param referenceContext Reference bases spanning the current variant | ||
* @param featureContext A vcf record overlapping the current variant from an auxiliary data source (e.g. gnomad) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
featureContext
supports more than just VCF records as auxiliary inputs -- any tribble-supported format can be accepted. I'd just change vcf record
to record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
/** | ||
* | ||
* First pass through the variants. The user may store data in instance variables of the walker as you go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you go
-> as they go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworded
final VariantFilter variantContextFilter = makeVariantFilter(); | ||
final CountingReadFilter readFilter = makeReadFilter(); | ||
|
||
// First pass through the variant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variant -> variants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
final FeatureContext featureContext); | ||
|
||
/** | ||
* Process the data collected during the first pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add explicit comment that this is called after firstPassApply()
and before secondPassApply()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// Second pass | ||
logger.info("Starting second pass through the variants"); | ||
traverseVariants(variantContextFilter, readFilter, this::secondPassApply); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add logger.info(readFilter.getSummaryLine())
after the second traversal pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
import java.util.ArrayList; | ||
import java.util.List; | ||
|
||
import static org.broadinstitute.hellbender.utils.variant.GATKVCFConstants.QUAL_BY_DEPTH_KEY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid static imports -- they tend to create confusion when reading the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
@CommandLineProgramProperties( | ||
summary = "Example variant walker that makes two passes through a vcf", | ||
oneLineSummary = "Example two variant walker", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"two variant" -> "two-pass variant"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
private double sampleVarianceOfQDs; | ||
|
||
int counter = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
@Override | ||
protected void afterFirstPass() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example code would be more logical if you put afterFirstPass()
physically between firstPassApply()
and secondPassApply()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
import java.util.List; | ||
|
||
import static org.broadinstitute.hellbender.utils.variant.GATKVCFConstants.QUAL_BY_DEPTH_KEY; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document in a 1-2 sentence comment what the example tool does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
import static org.broadinstitute.hellbender.tools.examples.ExampleTwoPassVariantWalker.COPY_OF_QD_KEY_NAME; | ||
import static org.broadinstitute.hellbender.tools.examples.ExampleTwoPassVariantWalker.QD_DISTANCE_FROM_MEAN; | ||
import static org.broadinstitute.hellbender.utils.variant.GATKVCFConstants.QUAL_BY_DEPTH_KEY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid static imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public class ExampleTwoPassVariantWalkerIntegrationTest extends CommandLineProgramTest { | ||
@Test | ||
public void test() throws IOException { | ||
final File outputVcf = File.createTempFile("output", "vcf"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use BaseTest.createTempFile()
instead, as it schedules the file (+ companion indices) for deletion on JVM exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
runCommandLine(args); | ||
|
||
final FeatureDataSource<VariantContext> variantsBefore = new FeatureDataSource<>(inputVcf); | ||
final FeatureDataSource<VariantContext> variantsAfter = new FeatureDataSource<>(outputVcf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create these data sources in a try-with-resources block, to ensure that they get closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@Test | ||
public void test() throws IOException { | ||
final File outputVcf = File.createTempFile("output", "vcf"); | ||
final String inputVcf = "src/test/resources/org/broadinstitute/hellbender/tools/walkers/variantutils/SelectVariants/haploid-multisample.vcf"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really a good VCF to use for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the first vcf that had QD - most vcfs in the test suite don't
@CommandLineProgramProperties( | ||
summary = "An example subclass of TwoPassVariantWalker", | ||
oneLineSummary = "An example subclass of TwoPassVariantWalker", | ||
programGroup = TestProgramGroup.class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omitFromCommandLine = true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
@Test | ||
public void testNum() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testNum
-> testTwoPassTraversal()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
import org.testng.Assert; | ||
import org.testng.annotations.Test; | ||
|
||
public class TwoPassVariantWalkerUnitTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extends GATKBaseTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second-pass review complete, back to @takutosato with more comments.
@droazen thanks for the review. back to you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final review complete, back to @takutosato. Two last quick comments to address, then go ahead and hit merge after tests pass.
|
||
private VariantContextWriter vcfWriter; | ||
|
||
static String QD_DISTANCE_FROM_MEAN = "QD_DIST"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a constant this should be static final
, not just static
|
||
static String QD_DISTANCE_FROM_MEAN = "QD_DIST"; | ||
|
||
static String COPY_OF_QD_KEY_NAME = "QD_COPY"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be static final
I pushed the final requested changes myself just now -- will merge once tests pass. |
Great, thanks @droazen! |
No description provided.