Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GenomicsDB matches CombineGVCFs with input spanning deletions #5397

Merged
merged 6 commits into from
Jan 16, 2019

Conversation

kgururaj
Copy link
Collaborator

@kgururaj kgururaj commented Nov 8, 2018

The newest release of GenomicsDB treats spanning deletions (spanning
from earlier positions) as deletions in the min PL value computation.
This behavior now matches the behavior of CombineGVCFs.

A more detailed description of the issue is provided in
#4963

@kgururaj
Copy link
Collaborator Author

kgururaj commented Nov 8, 2018

@nalinigans - this should fix the tests with respect to the latest jar

@kgururaj
Copy link
Collaborator Author

kgururaj commented Nov 8, 2018

This release also errors out with a descriptive error message if the length of a field in the data lines does not match the length descriptor in the header - see #5045.
Error out behavior as per Laura's comment here

@codecov-io
Copy link

codecov-io commented Nov 8, 2018

Codecov Report

Merging #5397 into master will decrease coverage by 6.656%.
The diff coverage is 76.923%.

@@              Coverage Diff               @@
##             master     #5397       +/-   ##
==============================================
- Coverage     87.05%   80.394%   -6.656%     
+ Complexity    31480     29890     -1590     
==============================================
  Files          1923      1923               
  Lines        145148    145155        +7     
  Branches      16081     16082        +1     
==============================================
- Hits         126352    116696     -9656     
- Misses        12944     22806     +9862     
+ Partials       5852      5653      -199
Impacted Files Coverage Δ Complexity Δ
...ls/genomicsdb/GenomicsDBImportIntegrationTest.java 88.47% <100%> (+0.103%) 82 <2> (+2) ⬆️
.../hellbender/tools/genomicsdb/GenomicsDBImport.java 78.195% <50%> (-0.892%) 53 <0> (ø)
...rs/variantutils/SelectVariantsIntegrationTest.java 0.255% <0%> (-99.745%) 1% <0%> (-70%)
...kers/filters/VariantFiltrationIntegrationTest.java 0.826% <0%> (-99.174%) 1% <0%> (-25%)
...dorientation/CollectF1R2CountsIntegrationTest.java 0.917% <0%> (-99.083%) 1% <0%> (-12%)
.../walkers/bqsr/BaseRecalibratorIntegrationTest.java 1.031% <0%> (-98.969%) 1% <0%> (-7%)
...ers/vqsr/FilterVariantTranchesIntegrationTest.java 1.053% <0%> (-98.947%) 1% <0%> (-5%)
...s/variantutils/VariantsToTableIntegrationTest.java 1.205% <0%> (-98.795%) 1% <0%> (-20%)
...walkers/validation/ConcordanceIntegrationTest.java 1.563% <0%> (-98.438%) 1% <0%> (-5%)
...ientation/ReadOrientationModelIntegrationTest.java 1.667% <0%> (-98.333%) 1% <0%> (-5%)
... and 154 more

Copy link
Collaborator

@nalinigans nalinigans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@kgururaj kgururaj force-pushed the genomicsdb_combine_gvcfs_spanning_deletions_match branch from 3a56003 to 8ad1991 Compare November 16, 2018 19:27
final SimpleInterval queryInterval = new SimpleInterval(chr, start, end);
if( !interval.equals(queryInterval)){
throw new GATKException("Cannot call query with different interval, expected:" + this.interval + " queried with: " + queryInterval);
this.interval = new SimpleInterval(chr, start, end);
this.query = reader.query(interval.getContig(), interval.getStart(), interval.getEnd());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@droazen @lbergelson would it be possible for one of you take a look at this fix for #5300? I think my fix preserves the original intent of InitializedQueryWrapper which appears to be to speed up the opening of VCF readers. I didn't write this part of the code so I may not fully understand the objective.

@kgururaj kgururaj force-pushed the genomicsdb_combine_gvcfs_spanning_deletions_match branch from 8ad1991 to e9f5da7 Compare November 19, 2018 18:25
@droazen
Copy link
Contributor

droazen commented Dec 10, 2018

@kgururaj Could you rebase to resolve conflicts on this branch?

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added my comments @kgururaj -- back to you.

@droazen
Copy link
Contributor

droazen commented Jan 3, 2019

@kgururaj Pinging you on this one -- once the review comments above are addressed and the branch is rebased, I think this is good to go.

@droazen
Copy link
Contributor

droazen commented Jan 7, 2019

Since @kgururaj is away, I'll address the remaining comments on this PR myself, since they are so minor.

@droazen droazen assigned droazen and unassigned kgururaj Jan 7, 2019
kgururaj and others added 4 commits January 8, 2019 15:29
from earlier positions) as deletions in the min PL value computation.
This behavior now matches the behavior of CombineGVCFs.

A more detailed description of the issue is provided in
#4963

* Deleted a couple of files which are no longer necessary.
* Fixed the index of newMQcalc.combined.g.vcf
reader-threads are used in the importer. Not a race condition in
GenomicsDB - InitializedQueryWrapper wasn't written for multiple
intervals.

CI test with multiple reader threads
@droazen droazen force-pushed the genomicsdb_combine_gvcfs_spanning_deletions_match branch from e9f5da7 to 500b026 Compare January 9, 2019 18:03
@droazen
Copy link
Contributor

droazen commented Jan 9, 2019

@nalinigans @kgururaj I addressed the review comments and rebased the branch, however after rebasing some of the GenomicsDB tests that assert an exact match against CombineGVCFs output are failing, despite my regenerating the expected output using the latest CombineGVCFs.

See the commit 500b026, where I've added comments to the failing tests testGenomicsDBImportFileInputs_newMQ() and testGenomicsDBImportFileInputsAgainstCombineGVCFWithNonDiploidData()

Could one of you take a look at these failures and give your opinion? Is it possible that GenomicsDB no longer matches CombineGVCFs in these cases?

I found a note in the recent commit fb70191 (from the PR #5471) in which the author found it necessary to make separate copies of the HaplotypeCaller expected GVCFs in order to get the GenomicsDB tests to pass. This is likely related to the test failures we're seeing now.

@droazen
Copy link
Contributor

droazen commented Jan 9, 2019

Could also possibly be related to #5160?

@droazen
Copy link
Contributor

droazen commented Jan 9, 2019

Hmm, that's strange, now only testGenomicsDBImportFileInputs_newMQ() is failing. testGenomicsDBImportFileInputsAgainstCombineGVCFWithNonDiploidData(), which was failing for me this morning, is now passing despite no additional changes.

Error is "Attribute RAW_MQandDP expected [353426] but found [352585]". See https://storage.googleapis.com/hellbender-test-logs/build_reports/master_24137.4/tests/test/index.html

@nalinigans
Copy link
Collaborator

nalinigans commented Jan 14, 2019

@droazen, looks like the changes from PR #5540 caused the failure with testGenomicsDBImportFileInputs_newMQ. Reverting the changes from that pull request got things working again. I will look at this issue tomorrow, meanwhile any suggestions @ldgauthier?

Got the tests to pass by generating new expected results for the following failing tests and use the GenomicsDBImports folder for expected results as that was updated as part of PR 5170.

* testGenomicsDBImportFileInputsAgainstCombineGVCFWithNonDiploidData
* testGenomicsDBImportFileInputs_newMQ

Note : Merged with master.
@nalinigans
Copy link
Collaborator

nalinigans commented Jan 16, 2019

Got the tests to pass by generating a new expected result as mentioned in @kgururaj comment for expected.testGenomicsDBImportWithNonDiploidData.vcf and by using the GenomicsDBImports folder for expected results as some of the vcf files were updated as part of PR 5170.

  • testGenomicsDBImportFileInputsAgainstCombineGVCFWithNonDiploidData
  • testGenomicsDBImportFileInputs_newMQ

@droazen, please feel free to merge. Thanks.

@droazen droazen merged commit 05aa6b2 into master Jan 16, 2019
@droazen droazen deleted the genomicsdb_combine_gvcfs_spanning_deletions_match branch January 16, 2019 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants