Optimize some Mutect-related tools #5073
@@ -77,7 +77,7 @@ private void initializeTruthVariantsIfNecessary() {
}

if (truthVariants == null) {
truthVariants = new FeatureDataSource<>(new FeatureInput<>(truthVariantsFile, "truth"), CACHE_LOOKAHEAD, VariantContext.class);
truthVariants = new FeatureDataSource<>(new FeatureInput<>(truthVariantsFile, "truth"), CACHE_LOOKAHEAD, VariantContext.class, cloudPrefetchBuffer, cloudIndexPrefetchBuffer);
@davidbenjamin You should make this same change to the evalVariants FeatureDataSource.
Thanks @cmnbroad, did it.
d6da091 to 518b9b7
Codecov Report
@@ Coverage Diff @@
## master #5073 +/- ##
===============================================
+ Coverage 86.385% 86.419% +0.034%
- Complexity 28822 29084 +262
===============================================
Files 1791 1791
Lines 133561 134336 +775
Branches 14902 15138 +236
===============================================
+ Hits 115377 116092 +715
- Misses 12791 12808 +17
- Partials 5393 5436 +43
@davidbenjamin One minor comment.
@@ -55,14 +53,15 @@
* gatk GetPileupSummaries \
* -I tumor.bam \
* -V common_biallelic.vcf.gz \
* -V common_biallelic.vcf.gz \
Why -V twice?
Whoops -- one is -V and one should be -L. I have to admit this is kind of clumsy, but at least it's not utterly redundant. For example, one might want to use a custom interval list (-L) of common sites but grab the allele frequencies from gnomAD (-V).
Fixed it and expanded the javadoc a bit.
@LeeTL1220 back to you.
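To illustrate the -L / -V distinction described above, here is a minimal Java sketch. It is not GATK code: the class `SiteFrequencyLookup`, the method `frequenciesAtSites`, and the sample positions and frequencies are all hypothetical. It models the -V resource as a map from position to allele frequency and the -L resource as a list of positions, then pulls the frequency for each listed site that the variants resource covers.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the -L / -V split; not part of GATK.
public class SiteFrequencyLookup {

    // For each site (the -L role), look up its allele frequency in the
    // variants resource (the -V role). Sites missing from the resource are skipped.
    public static Map<Integer, Double> frequenciesAtSites(List<Integer> sites,
                                                          Map<Integer, Double> variantFrequencies) {
        Map<Integer, Double> result = new LinkedHashMap<>();
        for (int site : sites) {
            Double alleleFrequency = variantFrequencies.get(site);
            if (alleleFrequency != null) {
                result.put(site, alleleFrequency);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up positions and allele frequencies standing in for a gnomAD-like resource.
        Map<Integer, Double> variants = Map.of(100, 0.12, 250, 0.40, 900, 0.05);
        // A custom site list that is a subset of the variant positions.
        List<Integer> sites = List.of(100, 900);
        System.out.println(frequenciesAtSites(sites, variants));
    }
}
```

In this toy model the site list decides where pileups are taken and the variants map only supplies frequencies, which is why the two inputs can differ.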
@davidbenjamin One more doc comment, then feel free to merge.
@@ -65,6 +65,17 @@
* -O pileups.table
* </pre>
*
* Although the sites (-L) and variants (-V) resources will often be identical, this need not be the case. For example,
Typically these files would only be different if you were trying to use a subset of the variants? I.e. -L was a subset of -V? If so, can you add that to the doc?
Done.
5f82c7e to 351582c
@LeeTL1220 The first commit makes all concordance tools and the MC3/M2 VCF merge much faster in FireCloud: tasks that have been taking an hour will take a few minutes. The second commit makes GetPileupSummaries, and hence the contamination task, much faster. Previously that tool cached all reads for the 100,000 bases around each site, so it basically had to do a whole BAM's worth of I/O just to get ~60,000 pileups.
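The scale of the old behavior can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, the site count and window size are the approximate figures from the comment above, and the genome length of roughly 3.1 billion bases is an assumption about a human reference, not something stated in this PR.

```java
// Hypothetical back-of-the-envelope estimate; not part of GATK.
public class LookaheadCost {

    // Upper bound on the bases covered if every site caches a full window
    // of reads and no two windows overlap.
    public static long windowBases(long sites, long windowSize) {
        return sites * windowSize;
    }

    public static void main(String[] args) {
        long sites = 60_000L;                       // approximate pileup count from the comment
        long window = 100_000L;                     // bases cached around each site
        long assumedGenomeLength = 3_100_000_000L;  // assumption: roughly human-sized reference

        long total = windowBases(sites, window);
        System.out.println("Bases covered by windows (upper bound): " + total);
        System.out.println("Exceeds assumed genome length: " + (total > assumedGenomeLength));
    }
}
```

Under these assumptions the windows add up to more than the genome itself, so they necessarily overlap. If the sites are spread across the genome, that is consistent with the description of a whole BAM's worth of I/O.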