prevent sequence dictionary validation when aligning reads #4308

Merged: 1 commit merged on May 21, 2018

lbergelson
Member

Previously, tools that align reads required you to manually disable sequence dictionary validation. If you didn't, they failed because the unaligned BAM didn't have the required sequence dictionary.

This PR extracts a SequenceDictionaryValidationArgumentCollection and provides a method for GATKSparkTools to configure it. ReadsPipeline couldn't easily make use of this, so instead it overrides the method that performs validation.

BwaSpark and BwaAndMarkDuplicatesPipelineSpark now neither require nor allow dictionary validation.
Fixes #4131
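A minimal sketch of the shape described above, assuming the collection exposes a single boolean query. The method name `performSequenceDictionaryValidation()` and the `NoValidationCollection` class are assumptions for illustration; `StandardValidationCollection` and the `disableSequenceDictionaryValidation` field come from the review diffs in this PR. The Barclay `@Argument` annotation is left out so the sketch compiles without GATK on the classpath.

```java
/**
 * Sketch: an argument collection that tells a tool whether to validate sequence dictionaries.
 * Only StandardValidationCollection and its field name come from the PR; the rest is hypothetical.
 */
interface SequenceDictionaryValidationArgumentCollection {

    /** Whether the tool should check its inputs' sequence dictionaries for compatibility. */
    boolean performSequenceDictionaryValidation();

    /**
     * Default for most tools: validate unless the user explicitly disables it.
     * In GATK the field carries an @Argument annotation; it is omitted here.
     */
    class StandardValidationCollection implements SequenceDictionaryValidationArgumentCollection {
        private boolean disableSequenceDictionaryValidation = false;

        // Stand-in for the command-line parser setting the flag.
        void setDisableSequenceDictionaryValidation(boolean disable) {
            this.disableSequenceDictionaryValidation = disable;
        }

        @Override
        public boolean performSequenceDictionaryValidation() {
            return !disableSequenceDictionaryValidation;
        }
    }

    /**
     * Hypothetical collection for aligners such as BwaSpark: it declares no flag at all,
     * so validation can be neither required nor enabled from the command line.
     */
    class NoValidationCollection implements SequenceDictionaryValidationArgumentCollection {
        @Override
        public boolean performSequenceDictionaryValidation() {
            return false;
        }
    }
}
```

With this split, each tool decides which collection it exposes, so users no longer have to pass the disable flag when aligning an unaligned BAM.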

droazen self-requested a review on January 30, 2018 22:06
droazen self-assigned this on Jan 30, 2018
@codecov-io

codecov-io commented Jan 30, 2018

Codecov Report

Merging #4308 into master will increase coverage by 0.017%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #4308       +/-   ##
===============================================
+ Coverage     80.073%   80.091%   +0.017%     
- Complexity     17420     17437       +17     
===============================================
  Files           1080      1081        +1     
  Lines          63131     63201       +70     
  Branches       10200     10215       +15     
===============================================
+ Hits           50551     50618       +67     
  Misses          8587      8587               
- Partials        3993      3996        +3
Impacted Files Coverage Δ Complexity Δ
...k/pipelines/BwaAndMarkDuplicatesPipelineSpark.java 78.947% <100%> (+1.17%) 5 <1> (+1) ⬆️
...org/broadinstitute/hellbender/engine/GATKTool.java 91.388% <100%> (+0.083%) 94 <1> (+1) ⬆️
...nder/tools/spark/pipelines/ReadsPipelineSpark.java 89.796% <100%> (+0.665%) 14 <2> (+2) ⬆️
...institute/hellbender/tools/spark/bwa/BwaSpark.java 77.778% <100%> (+1.307%) 7 <1> (+1) ⬆️
...equenceDictionaryValidationArgumentCollection.java 100% <100%> (ø) 0 <0> (?)
...stitute/hellbender/engine/spark/GATKSparkTool.java 83.81% <100%> (+0.314%) 57 <6> (+1) ⬆️
...hellbender/tools/copynumber/GermlineCNVCaller.java 87.349% <0%> (+0.986%) 19% <0%> (+9%) ⬆️
...utils/smithwaterman/SmithWatermanIntelAligner.java 80% <0%> (+30%) 3% <0%> (+2%) ⬆️

@droazen
Contributor

droazen commented Jan 30, 2018

Before I review, can you move the new arg collection out of the spark package and into the usual cmdline.argumentcollections package?

@lbergelson
Member Author

@droazen Good call.

lbergelson force-pushed the lb_disable_seqdict_validation_for_bwa branch from ac93a3b to 3195b02 on May 9, 2018 21:06
@lbergelson
Member Author

@droazen I think this is ready for review

droazen requested review from cmnbroad and removed request for droazen on May 14, 2018 19:14
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;

/**
* interface for argument collections that control
Member Author

Hmm, this trails off.

Collaborator

Yes, very non committal...

Member Author

updated

droazen assigned cmnbroad and unassigned droazen on May 14, 2018
Collaborator

cmnbroad left a comment

A few minor things, plus I think we should make the arg collection overridable in GATKTool as well as GATKSparkTool.
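Making the collection overridable in both base classes amounts to a protected factory method that subclasses can replace. Below is a minimal, self-contained sketch of that pattern under stated assumptions: `ToolBase`, `DefaultTool`, `AlignerTool`, `PipelineTool`, `onStartup()`, `validateSequenceDictionaries()`, and `performSequenceDictionaryValidation()` are hypothetical stand-ins, not the actual GATK classes or signatures. It also shows the alternative the PR description uses for ReadsPipeline: overriding the validation step itself.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed-down, hypothetical version of the collection; only the shape matters here.
interface SequenceDictionaryValidationArgumentCollection {
    boolean performSequenceDictionaryValidation();

    class StandardValidationCollection implements SequenceDictionaryValidationArgumentCollection {
        private boolean disableSequenceDictionaryValidation = false; // set by the parser in GATK

        @Override
        public boolean performSequenceDictionaryValidation() {
            return !disableSequenceDictionaryValidation;
        }
    }

    class NoValidationCollection implements SequenceDictionaryValidationArgumentCollection {
        @Override
        public boolean performSequenceDictionaryValidation() {
            return false;
        }
    }
}

// Hypothetical stand-in for GATKTool / GATKSparkTool.
abstract class ToolBase {
    // Records what startup did so the behavior is observable without real inputs.
    final List<String> events = new ArrayList<>();

    // Hook 1: subclasses pick which collection (and therefore which flags) they expose.
    protected SequenceDictionaryValidationArgumentCollection getSequenceDictionaryValidationArgumentCollection() {
        return new SequenceDictionaryValidationArgumentCollection.StandardValidationCollection();
    }

    // Hook 2: subclasses can replace the check itself.
    protected void validateSequenceDictionaries() {
        events.add("validated");
    }

    // The real tools keep the collection in an @ArgumentCollection field; this sketch just asks the getter.
    final void onStartup() {
        if (getSequenceDictionaryValidationArgumentCollection().performSequenceDictionaryValidation()) {
            validateSequenceDictionaries();
        } else {
            events.add("skipped");
        }
    }
}

// Uses the defaults: validates unless the user disables it.
class DefaultTool extends ToolBase { }

// Like BwaSpark: its input is unaligned, so it opts out of validation entirely.
class AlignerTool extends ToolBase {
    @Override
    protected SequenceDictionaryValidationArgumentCollection getSequenceDictionaryValidationArgumentCollection() {
        return new SequenceDictionaryValidationArgumentCollection.NoValidationCollection();
    }
}

// Like ReadsPipeline: keeps the standard flag but overrides the validation step.
class PipelineTool extends ToolBase {
    @Override
    protected void validateSequenceDictionaries() {
        events.add("custom validation");
    }
}
```

Keeping the hook on the shared base class means a non-Spark tool can opt out the same way a Spark tool does, without duplicating the flag.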

@@ -22,8 +23,8 @@
import java.util.List;

@DocumentedFeature
-@CommandLineProgramProperties(summary = "Runs BWA",
-    oneLineSummary = "BWA on Spark",
+@CommandLineProgramProperties(summary = "align reads using BWA",
Collaborator

align -> Align

Member Author

done

-@CommandLineProgramProperties(summary = "Runs BWA",
-    oneLineSummary = "BWA on Spark",
+@CommandLineProgramProperties(summary = "align reads using BWA",
+    oneLineSummary = "align reads to a given reference using BWA on Spark",
Collaborator

align -> Align

Member Author

done

import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;

/**
* interface for argument collections that control
* most tools will want to use this, it defaults to performing sequence dictionary validation but provides the option
* to disable it
*/
class StandardValidationCollection implements SequenceDictionaryValidationArgumentCollection {
Collaborator

I like having these as inner classes so we don't have separate files for them. I assume that it's the case that an inner class inside of an interface is static with respect to the containing/implementing class? Can/should these be static?

Member Author

They're static by default in an interface, and IntelliJ complains that it's redundant if you mark them as static. I can add it, though, if you think it's clearer.

Collaborator

No need to change anything. I just didn't recall seeing that before, but that's about what I suspected.
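For reference, the Java Language Specification makes every member type declared in an interface implicitly `public` and `static`, which is why IntelliJ flags an explicit `static` as redundant. A small self-contained check; the names `ValidationSettings`, `Standard`, and `NestedClassCheck` are hypothetical and only mirror the shape discussed in this thread.

```java
import java.lang.reflect.Modifier;

// Hypothetical interface mirroring the shape discussed above.
interface ValidationSettings {
    boolean validate();

    // No 'static' or 'public' keyword: both are implied for a class declared inside an interface.
    class Standard implements ValidationSettings {
        @Override
        public boolean validate() {
            return true;
        }
    }
}

class NestedClassCheck {
    // Creating the nested class needs no enclosing instance, which only compiles for static nested classes.
    static ValidationSettings create() {
        return new ValidationSettings.Standard();
    }

    // Reflection reports the implied modifiers recorded by the compiler.
    static boolean isImplicitlyStatic() {
        return Modifier.isStatic(ValidationSettings.Standard.class.getModifiers());
    }

    static boolean isImplicitlyPublic() {
        return Modifier.isPublic(ValidationSettings.Standard.class.getModifiers());
    }
}
```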

-@Argument(fullName = StandardArgumentDefinitions.DISABLE_SEQUENCE_DICT_VALIDATION_NAME, shortName = StandardArgumentDefinitions.DISABLE_SEQUENCE_DICT_VALIDATION_NAME, doc = "If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk!", optional = true, common = true)
-private boolean disableSequenceDictionaryValidation = false;
+@ArgumentCollection
+SequenceDictionaryValidationArgumentCollection seqValidationArguments = new SequenceDictionaryValidationArgumentCollection.StandardValidationCollection();
Collaborator

Shouldn't this delegate to a getSequenceDictionaryValidationArgumentCollection() method like the Spark one does?

Member Author

Done, although we don't have any use case for it yet.

(knownSites.isEmpty() ? "": " --known-sites " + knownSites) +
" -O %s";
}

Collaborator

Wow.

Member Author

It was totally unused.

lbergelson force-pushed the lb_disable_seqdict_validation_for_bwa branch from 3195b02 to df35942 on May 14, 2018 20:11
-@Argument(fullName = StandardArgumentDefinitions.DISABLE_SEQUENCE_DICT_VALIDATION_NAME, shortName = StandardArgumentDefinitions.DISABLE_SEQUENCE_DICT_VALIDATION_NAME, doc = "If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk!", optional = true, common = true)
-private boolean disableSequenceDictionaryValidation = false;
+@ArgumentCollection
+SequenceDictionaryValidationArgumentCollection seqValidationArguments = getSequenceDictionaryValidationArgumentCollection();
Collaborator

This should be private.
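The reviewed field is initialized by calling an overridable method. One general Java caveat with that pattern, not something raised in this review: superclass field initializers run before the subclass's own fields are assigned, so an override should construct and return a new object rather than return one of the subclass's fields. The classes below (`Settings`, `BaseTool`, `FreshOverride`, `FieldOverride`) are hypothetical and only illustrate the ordering.

```java
// Hypothetical, minimal settings object standing in for a validation collection.
final class Settings {
    final boolean validate;

    Settings(boolean validate) {
        this.validate = validate;
    }
}

abstract class BaseTool {
    // Private, as requested in the review, and filled in from an overridable factory method.
    private final Settings settings = createSettings();

    protected Settings createSettings() {
        return new Settings(true);
    }

    Settings settings() {
        return settings;
    }
}

// Safe override: builds a new object and reads no subclass state.
class FreshOverride extends BaseTool {
    @Override
    protected Settings createSettings() {
        return new Settings(false);
    }
}

// Unsafe override: BaseTool's initializer calls this before 'mine' is assigned, so it returns null.
class FieldOverride extends BaseTool {
    private final Settings mine = new Settings(false);

    @Override
    protected Settings createSettings() {
        return mine;
    }
}
```

Returning a fresh collection instance from the override, as an aligner opting out of validation would, sidesteps the problem.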

Collaborator

cmnbroad left a comment

Noticed one access modifier thing; otherwise 👍.

lbergelson force-pushed the lb_disable_seqdict_validation_for_bwa branch from df35942 to 21322db on May 17, 2018 20:49
lbergelson assigned lbergelson and unassigned cmnbroad on May 17, 2018
@lbergelson
Member Author

Made it private; will merge when tests pass. Thanks @cmnbroad.

lbergelson merged commit ec36b13 into master on May 21, 2018
lbergelson deleted the lb_disable_seqdict_validation_for_bwa branch on May 21, 2018 20:18
Development

Successfully merging this pull request may close these issues.

tools that expect unaligned reads shouldn't validate the sequence dictionary
4 participants