GenomicsDB matches CombineGVCFs with input spanning deletions #5397
@nalinigans - this should fix the tests with respect to the latest jar.
This release also errors out with a descriptive error message if the length of a field in the data lines does not match the length descriptor in the header - see #5045.
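A rough illustration of the kind of check described above, not the GenomicsDB implementation: the class `FieldLengthCheck` and its `validate` method are hypothetical names, and the sketch assumes a fixed-length field whose header declares an integer `Number` and whose values are comma-separated in the data line.

```java
/** Hypothetical helper; illustrates a header-length check, not GenomicsDB's actual code. */
final class FieldLengthCheck {
    private FieldLengthCheck() {}

    /**
     * Throws if the number of comma-separated values in rawValue differs from
     * the length declared for the field in the header.
     */
    static void validate(String fieldName, int declaredLength, String rawValue) {
        // "." denotes a missing value in VCF, so there is nothing to count.
        if (rawValue.equals(".")) {
            return;
        }
        // The -1 limit keeps trailing empty entries so "1,2," counts as three values.
        int actualLength = rawValue.split(",", -1).length;
        if (actualLength != declaredLength) {
            throw new IllegalArgumentException(
                "Field " + fieldName + " has " + actualLength
                    + " value(s) but the header declares " + declaredLength);
        }
    }
}
```

Failing early with the field name and both lengths in the message is what makes the error descriptive; how GenomicsDB words or raises its own error is not shown in this thread.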
Codecov Report
@@ Coverage Diff @@
## master #5397 +/- ##
==============================================
- Coverage 87.05% 80.394% -6.656%
+ Complexity 31480 29890 -1590
==============================================
Files 1923 1923
Lines 145148 145155 +7
Branches 16081 16082 +1
==============================================
- Hits 126352 116696 -9656
- Misses 12944 22806 +9862
+ Partials 5852 5653 -199
Looks good.
Force-pushed from 3a56003 to 8ad1991
final SimpleInterval queryInterval = new SimpleInterval(chr, start, end);
if( !interval.equals(queryInterval)){
    throw new GATKException("Cannot call query with different interval, expected:" + this.interval + " queried with: " + queryInterval);
this.interval = new SimpleInterval(chr, start, end);
this.query = reader.query(interval.getContig(), interval.getStart(), interval.getEnd());
@droazen @lbergelson would it be possible for one of you to take a look at this fix for #5300? I think my fix preserves the original intent of InitializedQueryWrapper, which appears to be to speed up the opening of VCF readers. I didn't write this part of the code, so I may not fully understand the objective.
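To make the idea concrete, here is a minimal, self-contained sketch of a wrapper that runs its first query eagerly and re-queries when asked for a different interval. It is not the GATK code: `IntervalReader` and `EagerQueryWrapper` are hypothetical stand-ins, records are plain strings, and re-querying on a repeated call is an assumption of this sketch.

```java
import java.util.Iterator;
import java.util.Objects;

/** Hypothetical stand-in for a reader that can be queried by interval. */
interface IntervalReader {
    Iterator<String> query(String contig, int start, int end);
}

/** Sketch: query once up front, then re-query instead of failing on a new interval. */
final class EagerQueryWrapper {
    private final IntervalReader reader;
    private String contig;
    private int start;
    private int end;
    private Iterator<String> pending; // most recent query result not yet handed out

    EagerQueryWrapper(IntervalReader reader, String contig, int start, int end) {
        this.reader = Objects.requireNonNull(reader);
        remember(contig, start, end);
        // Querying in the constructor forces the underlying reader to be opened up front.
        this.pending = reader.query(contig, start, end);
    }

    Iterator<String> query(String contig, int start, int end) {
        boolean sameInterval =
            contig.equals(this.contig) && start == this.start && end == this.end;
        if (!sameInterval || pending == null) {
            // A different interval (or, by assumption, a repeated call) triggers a fresh query.
            remember(contig, start, end);
            pending = reader.query(contig, start, end);
        }
        Iterator<String> result = pending;
        pending = null; // each iterator is handed out only once
        return result;
    }

    private void remember(String contig, int start, int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }
}
```

The first call for the constructor's interval reuses the eager result, so the benefit of opening readers early is kept, while later intervals no longer raise an exception.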
Force-pushed from 8ad1991 to e9f5da7
@kgururaj Could you rebase to resolve conflicts on this branch?
Added my comments @kgururaj -- back to you.
Review comments on src/main/java/org/broadinstitute/hellbender/tools/genomicsdb/GenomicsDBImport.java (resolved)
@kgururaj Pinging you on this one -- once the review comments above are addressed and the branch is rebased, I think this is good to go.
Since @kgururaj is away, I'll address the remaining comments on this PR myself, as they are minor.
from earlier positions) as deletions in the min PL value computation. This behavior now matches the behavior of CombineGVCFs. A more detailed description of the issue is provided in #4963
* Deleted a couple of files which are no longer necessary.
* Fixed the index of newMQcalc.combined.g.vcf
reader-threads are used in the importer. Not a race condition in GenomicsDB - InitializedQueryWrapper wasn't written for multiple intervals. CI test with multiple reader threads
Force-pushed from e9f5da7 to 500b026
@nalinigans @kgururaj I addressed the review comments and rebased the branch; however, after rebasing, some of the GenomicsDB tests that assert an exact match against See the commit 500b026, where I've added comments to the failing tests. Could one of you take a look at these failures and give your opinion? Is it possible that GenomicsDB no longer matches I found a note in the recent commit fb70191 (from the PR #5471) in which the author found it necessary to make separate copies of the
Could also possibly be related to #5160?
Hmm, that's strange, now the only error is "Attribute RAW_MQandDP expected [353426] but found [352585]". See https://storage.googleapis.com/hellbender-test-logs/build_reports/master_24137.4/tests/test/index.html
@droazen, it looks like the changes from PR #5540 caused the failure in testGenomicsDBImportFileInputs_newMQ. Reverting the changes from that pull request got things working again. I will look at this issue tomorrow; meanwhile, any suggestions @ldgauthier?
Got the tests to pass by generating new expected results for the following failing tests and using the GenomicsDBImports folder for expected results, as that was updated as part of PR 5170:
* testGenomicsDBImportFileInputsAgainstCombineGVCFWithNonDiploidData
* testGenomicsDBImportFileInputs_newMQ
Note: Merged with master.
Got the tests to pass by generating a new expected result, as mentioned in @kgururaj's comment, for expected.testGenomicsDBImportWithNonDiploidData.vcf and by using the GenomicsDBImports folder for expected results, as some of the VCF files were updated as part of PR 5170.
@droazen, please feel free to merge. Thanks.
The newest release of GenomicsDB treats spanning deletions (spanning
from earlier positions) as deletions in the min PL value computation.
This behavior now matches the behavior of CombineGVCFs.
A more detailed description of the issue is provided in
#4963