VQSR serialized model sets annotation order #3655

Merged
lucidtronix merged 3 commits into master from sf_vqsr_annotation_order on Aug 27, 2018

Conversation

lucidtronix
Contributor

This fixes a bug in which serialized GMMs for VQSR assumed that the annotations were listed in the same order by the command that generated the model and the command that used it. VQSR no longer depends on the command-line order of the annotations.

@codecov-io

codecov-io commented Oct 3, 2017

Codecov Report

Merging #3655 into master will decrease coverage by 6.756%.
The diff coverage is 100%.

@@              Coverage Diff              @@
##              master   #3655       +/-   ##
=============================================
- Coverage     86.657%   79.9%   -6.756%     
+ Complexity     29049   17191    -11858     
=============================================
  Files           1808    1067      -741     
  Lines         134688   62513    -72175     
  Branches       14938   10166     -4772     
=============================================
- Hits          116716   49948    -66768     
+ Misses         12559    8609     -3950     
+ Partials        5413    3956     -1457
Impacted Files                                         Coverage Δ                Complexity Δ
...lbender/tools/walkers/vqsr/VariantDataManager.java  69.068% <100%> (+0.983%)  89 <0> (+6) ⬆️
...bender/tools/walkers/vqsr/VariantRecalibrator.java  60.57% <100%> (+1.347%)   65 <6> (+7) ⬆️
...nder/engine/datasources/ReferenceHadoopSource.java  0% <0%> (-100%)           0% <0%> (-3%)
.../tools/spark/sv/discovery/BreakEndVariantType.java  0% <0%> (-92.381%)        0% <0%> (-14%)
...pleNovelAdjacencyAndChimericAlignmentEvidence.java  24.324% <0%> (-63.176%)   5% <0%> (-5%)
...walkers/genotyper/GenotypingGivenAllelesUtils.java  28.571% <0%> (-32.967%)   2% <0%> (ø)
...hellbender/tools/spark/pipelines/SortSamSpark.java  70.588% <0%> (-29.412%)   4% <0%> (-2%)
...ignment/AssemblyContigWithFineTunedAlignments.java  70% <0%> (-22.248%)       33% <0%> (-26%)
...itute/hellbender/tools/funcotator/Funcotation.java  33.333% <0%> (-22.222%)   3% <0%> (-2%)
...decs/xsvLocatableTable/XsvLocatableTableCodec.java  63.492% <0%> (-19.224%)   14% <0%> (-46%)
... and 1025 more

@droazen droazen self-assigned this Oct 16, 2017
@@ -56,7 +56,7 @@ public void setNormalization(final Map<String, Double> anMeans, final Map<String
return data;
}

- public void normalizeData(final boolean calculateMeans) {
+ public void normalizeData(final boolean calculateMeans, List<Integer> theOrder) {
Member

this needs javadoc

Contributor Author

Added
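
For illustration only, a documented method of this kind might look like the sketch below. It assumes that `theOrder` holds, for each annotation position in the serialized model, the index of the same annotation in the command-line list; the PR text does not spell this out. The class `ReorderSketch` and the method `reorder` are hypothetical names, and a plain `double[]` stands in for whatever per-variant data the real `VariantDataManager` stores.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the code merged in this PR.
final class ReorderSketch {

    /**
     * Returns a copy of {@code annotations} rearranged so that position {@code i}
     * holds the value that was at index {@code theOrder.get(i)}.
     *
     * @param annotations annotation values in command-line order
     * @param theOrder    for each position in the serialized model, the index of the
     *                    matching annotation in command-line order (assumed meaning)
     * @return the same values in the serialized model's order
     * @throws IllegalArgumentException if the two inputs differ in length
     */
    static double[] reorder(final double[] annotations, final List<Integer> theOrder) {
        if (annotations.length != theOrder.size()) {
            throw new IllegalArgumentException("Expected " + annotations.length
                    + " indices but got " + theOrder.size());
        }
        final double[] reordered = new double[annotations.length];
        for (int i = 0; i < theOrder.size(); i++) {
            reordered[i] = annotations[theOrder.get(i)];
        }
        return reordered;
    }

    public static void main(final String[] args) {
        // Command-line order: FS=1.0, QD=2.0, MQ=3.0; model order: QD, FS, MQ.
        final double[] result = reorder(new double[] {1.0, 2.0, 3.0}, Arrays.asList(1, 0, 2));
        System.out.println(Arrays.toString(result)); // prints [2.0, 1.0, 3.0]
    }
}
```

Returning a copy rather than reordering in place keeps the sketch safe to call more than once on the same input.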

@@ -487,6 +486,25 @@ else if (null != sequenceDictionary) {
}
}


private void orderAndValidateAnnotations(final GATKReportTable annotationTable){
Contributor

Can you please add an integration test that fails before this fix and passes after it? Please also add a basic unit test for the new orderAndValidateAnnotations() method.

for (int i = 0; i < annotationTable.getNumRows(); i++){
String serialAnno = (String)annotationTable.get(i, "Annotation");
for (int j = 0; j < dataManager.annotationKeys.size(); j++) {
if (serialAnno.equals( dataManager.annotationKeys.get(j))){
Contributor

@droazen droazen Oct 16, 2017

Can you do a hash lookup against the dataManager to avoid an O(n^2) algorithm? (It might not matter if these lists are small, in which case never mind.)

Contributor Author

Yeah I was being lazy, but I've never seen us use more than 7 annotations.
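
A hash-based version of the lookup suggested in this thread could look like the minimal sketch below. `AnnotationOrderSketch` and `orderFor` are hypothetical names, and plain lists of strings stand in for the `GATKReportTable` rows and `dataManager.annotationKeys`; the merged code may well differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the code merged in this PR.
final class AnnotationOrderSketch {

    /**
     * For each annotation name stored in the serialized model, finds its index in the
     * command-line annotation list. Building the map once makes the whole lookup
     * O(n) instead of the O(n^2) nested loop.
     */
    static List<Integer> orderFor(final List<String> serializedAnnotations,
                                  final List<String> commandLineAnnotations) {
        final Map<String, Integer> indexByName = new HashMap<>();
        for (int j = 0; j < commandLineAnnotations.size(); j++) {
            indexByName.put(commandLineAnnotations.get(j), j);
        }
        final List<Integer> order = new ArrayList<>(serializedAnnotations.size());
        for (final String name : serializedAnnotations) {
            final Integer index = indexByName.get(name);
            if (index == null) {
                // Validation step: the model needs an annotation the user did not supply.
                throw new IllegalArgumentException(
                        "Annotation in serialized model but not on command line: " + name);
            }
            order.add(index);
        }
        return order;
    }

    public static void main(final String[] args) {
        final List<String> model = Arrays.asList("QD", "FS", "MQ");
        final List<String> commandLine = Arrays.asList("FS", "QD", "MQ");
        System.out.println(orderFor(model, commandLine)); // prints [1, 0, 2]
    }
}
```

With at most a handful of annotations, as noted above, the nested loop and the map perform about the same; the map version mainly makes the missing-annotation case explicit.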

@droazen droazen assigned lucidtronix and unassigned droazen Oct 16, 2017
Contributor

@droazen droazen left a comment

Requested some tests

@lucidtronix
Contributor Author

Added unit and integration tests. Had to make a few things protected for testing and added a VCF with a single SNP (where @ldgauthier found this bug) for the integration test. Back to you @droazen.

Contributor

@magicDGS magicDGS left a comment

After #3475, conflicts are expected in the test classes because they now extend GATKBaseTest. A rebase should resolve most of them without any code changes (except for the comments below).

@@ -1,7 +1,9 @@
package org.broadinstitute.hellbender.tools.walkers.vqsr;
Contributor

After #3475, this should extend GATKBaseTest

@ldgauthier
Contributor

@lucidtronix here's a blast from the past. You're going to have to rebase and I hope you don't have too many merge conflicts.

@droazen while you're doing your Monday issue review, can you refresh your memory on this one?

@lucidtronix lucidtronix force-pushed the sf_vqsr_annotation_order branch from 6a21375 to a15ec43 Compare April 2, 2018 23:36
@lucidtronix
Contributor Author

@droazen back to you

@lucidtronix lucidtronix force-pushed the sf_vqsr_annotation_order branch from e22d867 to 9944059 Compare August 23, 2018 20:17
@lucidtronix
Contributor Author

@droazen bump

Contributor

@droazen droazen left a comment

Assuming that the integration test you added fails before this fix, I am happy (modulo a trivial comment below on a stray import statement).

If @ldgauthier is also happy with this PR you can go ahead and merge when ready.

@@ -1,7 +1,9 @@
package org.broadinstitute.hellbender.tools.walkers.vqsr;

import avro.shaded.com.google.common.collect.Lists;
Contributor

I think you want com.google.common.collect.Lists here.

@droazen droazen requested a review from ldgauthier August 24, 2018 17:54
Contributor

@ldgauthier ldgauthier left a comment

I remember being happy with this back in October. I gave Sam the data for the integration test because it was being handled incorrectly. There's also the GATKBaseTest comment to address, but that should be easy peasy.

@lucidtronix lucidtronix merged commit d2e2580 into master Aug 27, 2018
@lucidtronix lucidtronix deleted the sf_vqsr_annotation_order branch August 27, 2018 19:39
6 participants