-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse valid VCF 4.2 with ##INFO containing Source + Version #1248
Parse valid VCF 4.2 with ##INFO containing Source + Version #1248
Conversation
This allows us to validate a new feature in VCFv4.2 - optional tags in ##INFO field
Codecov Report
@@ Coverage Diff @@
## master #1248 +/- ##
===============================================
+ Coverage 69.324% 69.353% +0.029%
- Complexity 8287 8301 +14
===============================================
Files 555 555
Lines 33075 33116 +41
Branches 5561 5572 +11
===============================================
+ Hits 22929 22967 +38
- Misses 7884 7890 +6
+ Partials 2262 2259 -3
|
I will work on getting coverage back in shape |
@cmnbroad Could you take a look at this? It's one of the tickets that was slated to be closed by your giant refactor branch that never went int. |
@romanzenka Thanks for doing this! This warning might be too late, but the code coverage is really oversensitive and tends to fail at random due to some non-determinism in the test suite. So don't worry too much about pleasing it. |
I originally wanted to reuse parts of the original PR, but I could not figure out how it fixes this particular issue (VCF adding optional fields). So I enhanced the interface (retaining backwards compatibility) so now you can specify optional tags. |
Yes, the original PR refactored the entire header line class hierarchy, so it would be hard to extract just a piece of it. I'll take a look at the changes here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many minor comments
src/test/java/htsjdk/variant/vcf/VCFHeaderLineTranslatorUnitTest.java
Outdated
Show resolved
Hide resolved
src/test/java/htsjdk/variant/vcf/VCFHeaderLineTranslatorUnitTest.java
Outdated
Show resolved
Hide resolved
Cleaner call avoiding code duplication Co-Authored-By: romanzenka <zenka.roman@mayo.edu>
Use diamond operator Co-Authored-By: romanzenka <zenka.roman@mayo.edu>
1) switched optional flags from Set to List, assuming there will be typically very few of them 2) comments on VCFLineParser, deprecated version without optional flags, removed the redundant 'public' 3) clearer exception in VCF4Parser when tags are out of order versus tags are completely unexpected 4) unit test checks for error message to match expected one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates the code looks 👍 to me, except for the issue with the case statement fall-through.
Will also need @lbergelson or @cmnbroad to check this off too, as I'm not a VCF format expert.
@romanzenka Thanks for taking this on - we've been wanting to fix it for a while. It looks you've made changes to make "source" and "version" tolerated on read, but the values aren't retained, accessible to callers, or round-tripped on write. Would you be able to add getters so these values can be retrieved from the header lines, and update the encoder to propagate them on write, with corresponding tests ? Ideally we'd implement generalized attribute support, but thats a lot more work. For now it would be ok to handle these two attributes specifically since they're recommended in the spec, but I'd be reluctant to take a change that accepts and then silently drops them. |
Will fail the tests for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the last changes - one minor minor change requested.
@lbergelson, except for one last comment on the tests, Source
and Version
can be round-tripped now. I think this PR strikes as good a balance as we're going to get for that without a major overhaul such as in my other PR. The only remaining question is whether we really want to deprecate the old parseLine
in VCFLineParser
, and the old constructor in VCFSimpleLineHeader
, or just leave them. I think they're harmless to leave, though the new replacements are preferable.
src/test/java/htsjdk/variant/vcf/VCFHeaderLineTranslatorUnitTest.java
Outdated
Show resolved
Hide resolved
I think I addressed everything - the only remaining issue was deprecation of those two constructors (that are not used anymore within the library). I removed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One hopefully final request and we can get this merged.
src/test/java/htsjdk/variant/vcf/VCFHeaderLineTranslatorUnitTest.java
Outdated
Show resolved
Hide resolved
I added a freebie that you may/may not like - I am tired of the MD5CalculatingOutputStream spuriously messing up my coverage, so I made it deterministic. |
@romanzenka 👍 to fixing that. Don't worry too much about the coverage though... it's weirdly oversensitive. We need to change the settings to be more lenient. |
@lbergelson I do not think it is even sensitivity - there is something random about how the coverage gets compared. The latest report makes no sense to me - I have not done anything with those 21 reported files. I suspect the code coverage tool itself might be confused? |
Yeah, it choose funny comparisons sometimes and I'm not certain exactly what happens. I think it compares the coverage in the last run of master vs the coverage in the pr build, which isn't necessarily in sync although it should be close. |
…n/vcf_4.2_info_test
@lbergelson I am simply merging master for now to see what happens... my bad, that's probably best practice before accepting a PR anyway |
@cmnbroad Do you think this could be merged before Friday 2019-01-25? We are releasing an app and it would be nice to build it on top of an official commit instead of our own branch. |
src/test/java/htsjdk/samtools/util/Md5CalculatingOutputStreamTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lbergelson this looks good to me now (except for the old copyright date), in case you want a final review.
@romanzenka Thanks. Add yes, we should be able to get this merged in the next 24 hours. |
Thanks for working through this one. It turned out to be more work than I expected. I don't know if we'll be able to get another release out by friday. I'll see what I can do... |
Don't worry about it. Release when it is appropriate to release. We'll build off a particular commit. Oh and thanks for very thorough and high-quality review! |
Thanks again for working through all the issues @romanzenka. |
Description
As per bug #517 - despite VCFv4.2 spec allowing for "Source" and "Version" fields in ##INFO, htsjdk would throw an error. This fixes the issue.
Checklist