-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VQSR serialized model sets annotation order #3655
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3655 +/- ##
=============================================
- Coverage 86.657% 79.9% -6.756%
+ Complexity 29049 17191 -11858
=============================================
Files 1808 1067 -741
Lines 134688 62513 -72175
Branches 14938 10166 -4772
=============================================
- Hits 116716 49948 -66768
+ Misses 12559 8609 -3950
+ Partials 5413 3956 -1457
|
@@ -56,7 +56,7 @@ public void setNormalization(final Map<String, Double> anMeans, final Map<String | |||
return data; | |||
} | |||
|
|||
public void normalizeData(final boolean calculateMeans) { | |||
public void normalizeData(final boolean calculateMeans, List<Integer> theOrder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@@ -487,6 +486,25 @@ else if (null != sequenceDictionary) { | |||
} | |||
} | |||
|
|||
|
|||
private void orderAndValidateAnnotations(final GATKReportTable annotationTable){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add an integration test that fails before this fix, and passes after it? Also should add a basic unit test for the new orderAndValidateAnnotations()
method.
for (int i = 0; i < annotationTable.getNumRows(); i++){ | ||
String serialAnno = (String)annotationTable.get(i, "Annotation"); | ||
for (int j = 0; j < dataManager.annotationKeys.size(); j++) { | ||
if (serialAnno.equals( dataManager.annotationKeys.get(j))){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do a hash lookup against the dataManager
to avoid an n^2 algorithm? (might not matter if these are small, in which case never mind)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was being lazy, but I've never seen us use more than 7 annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requested some tests
Added unit and integration tests. Had to make a few things protected for testing and added a vcf with a single SNP (where @ldgauthier found this bug) for the integration test. Back to you @droazen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #3475, conflicts are expected in the test classes, because they are now extending GATKBaseTest
- a rebase should solve most of the problems without need of change of the code (except comments below)
@@ -1,7 +1,9 @@ | |||
package org.broadinstitute.hellbender.tools.walkers.vqsr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #3475, this should extend GATKBaseTest
@lucidtronix here's a blast from the past. You're going to have to rebase and I hope you don't have too many merge conflicts. @droazen while you're doing your Monday issue review, can you refresh your memory on this one? |
6a21375
to
a15ec43
Compare
@droazen back to you |
e22d867
to
9944059
Compare
@droazen bump |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that the integration test you added fails before this fix, I am happy (modulo a trivial comment below on a stray import statement).
If @ldgauthier is also happy with this PR you can go ahead and merge when ready.
@@ -1,7 +1,9 @@ | |||
package org.broadinstitute.hellbender.tools.walkers.vqsr; | |||
|
|||
import avro.shaded.com.google.common.collect.Lists; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want com.google.common.collect.Lists
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember being happy with this back in October. I gave Sam the data for the integration test because it was being handled incorrectly. There's also the GATKBaseTest comment to address, but that should be easy peasy.
This addresses the problem where serialized GMMs for VQSR assumed the annotation order would be the same between the commands that generated them and the commands that used them. VQSR no longer depends on the commandline order of the annotations.