Add an IntervalCodec that is useful for sorting large sets of intervals #1288
Codecov Report
@@ Coverage Diff @@
## master #1288 +/- ##
===============================================
+ Coverage 67.495% 67.587% +0.092%
- Complexity 8150 8178 +28
===============================================
Files 558 561 +3
Lines 33364 33409 +45
Branches 5608 5611 +3
===============================================
+ Hits 22519 22580 +61
+ Misses 8657 8641 -16
Partials 2188 2188
I may need to also make a |
@nh13 Thank you, this seems useful. I have two very minor comments and then it's good to merge 👍
This may be useful when writing intervals to a file when storing all the intervals in memory is not possible.
Force-pushed from 38fa596 to bd872b5.
@lbergelson commit bd872b5 is needed for this PR to be fully useful for the PR in Picard. Thanks for the review!
@nh13 A few more comments on the new changes. 👍 when they're resolved.
// Write out the intervals
try {
    final IntervalListWriter writer = new IntervalListWriter(file, this.header);
The common refrain, why not use a try-with-resources to autoclose this thing?
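A minimal sketch of what the try-with-resources form could look like. SketchIntervalWriter is a hypothetical stand-in written only for this illustration (it is not the htsjdk IntervalListWriter), and the record layout it writes is simplified. The point is only that any Closeable declared in the try header is closed automatically, even when a write throws.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical stand-in for a Closeable interval writer; not the htsjdk class. */
final class SketchIntervalWriter implements Closeable {
    private final BufferedWriter out;

    SketchIntervalWriter(final Writer sink) {
        this.out = new BufferedWriter(sink);
    }

    /** Writes one simplified, tab-separated record. */
    void write(final String contig, final int start, final int end) throws IOException {
        out.write(contig + "\t" + start + "\t" + end + "\n");
    }

    @Override
    public void close() throws IOException {
        // Closing the BufferedWriter flushes any buffered records first.
        out.close();
    }
}

final class TryWithResourcesSketch {
    /** Writes two example records and returns everything that reached the sink. */
    static String writeExample() {
        final StringWriter sink = new StringWriter();
        // The writer is closed when the block exits, normally or via an exception.
        try (SketchIntervalWriter writer = new SketchIntervalWriter(sink)) {
            writer.write("chr1", 100, 200);
            writer.write("chr2", 300, 400);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

With this shape there is no explicit close() call to forget, and no finally block is needed.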
/** Creates a new writer, writing a header to the file.
 * @param file a file to write to. If exists it will be overwritten.
 */
public IntervalListWriter(final File file) {
Since this is now new top-level code, can you convert it to use Path instead of File?
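One way a Path-based writer could be opened, as a sketch only. PathWriterSketch and its methods are hypothetical names for illustration. The only library call relied on is java.nio.file.Files.newBufferedWriter(Path), which with no options creates the file or truncates an existing one.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical Path-based writer; illustrates the constructor shape only. */
final class PathWriterSketch implements AutoCloseable {
    private final BufferedWriter out;

    PathWriterSketch(final Path path) throws IOException {
        // With no OpenOptions this creates the file or truncates an existing one.
        this.out = Files.newBufferedWriter(path);
    }

    void writeLine(final String line) throws IOException {
        out.write(line);
        out.write('\n');
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    /** Writes one line to a temporary file and reads it back. */
    static List<String> roundTrip() {
        try {
            final Path tmp = Files.createTempFile("intervals", ".interval_list");
            try (PathWriterSketch writer = new PathWriterSketch(tmp)) {
                writer.writeLine("chr1\t1\t10");
            }
            final List<String> lines = Files.readAllLines(tmp);
            Files.delete(tmp);
            return lines;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Opening the stream from the Path directly, rather than converting back to a File, keeps the writer usable with paths that do not live on the default file system.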
public class IntervalListWriter implements Closeable {

    private final BufferedWriter out;
    private final FormatUtil format = new FormatUtil();
I know this is just moving existing code, but this use of FormatUtil seems unnecessary. Wouldn't Integer.toString() serve the same purpose with less overhead?
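For illustration, a record line built with Integer.toString() alone might look like the sketch below. IntervalLineSketch and toLine are hypothetical names, and the column order (contig, start, end, strand, name) is an assumption about the interval list layout rather than something stated in this thread.

```java
/** Hypothetical helper; the column order here is an assumption. */
final class IntervalLineSketch {
    static String toLine(final String contig, final int start, final int end,
                         final boolean negativeStrand, final String name) {
        // Integer.toString emits plain decimal digits with no grouping separators.
        return String.join("\t",
                contig,
                Integer.toString(start),
                Integer.toString(end),
                negativeStrand ? "-" : "+",
                name);
    }
}
```

For example, IntervalLineSketch.toLine("chr1", 1000000, 2000000, false, "target_1") yields the five fields separated by tabs, with the coordinates written as 1000000 and 2000000.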
final IntervalList actualList = IntervalList.fromFile(tempFile);

Assert.assertEquals(
This should assert that the header is correct as well.
            return retval;
        }
    }
}
I think there's an extra }. It's not compiling.
Force-pushed from e34d33f to 7bbe157.
@nh13 Thanks. Going to merge this and then follow up with a PR that fixes the path issue.
- public IntervalListWriter(final File file, final SAMFileHeader header) {
-     out = IOUtil.openFileForBufferedWriting(file);
+ public IntervalListWriter(final Path path, final SAMFileHeader header) {
+     out = IOUtil.openFileForBufferedWriting(path.toFile());
intervals.
Description
This will be useful with a SortingCollection when we are sorting over MANY intervals. My intention is to submit a PR to Picard to use this codec and SortingCollection when lifting over an interval list, as I sometimes run out of memory with LiftoverIntervalList even with 32GB of memory.
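To make the role of a codec concrete, here is a rough sketch of the encode/decode round trip that lets a disk-backed sorter spill records it cannot hold in memory. Everything in it is hypothetical: IntervalCodecSketch, Rec, and the binary layout are invented for illustration and are not the IntervalCodec added in this PR. The convention that decode returns null at end of stream is likewise an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec sketch; the record type and byte layout are assumptions. */
final class IntervalCodecSketch {
    static final class Rec {
        final String contig;
        final int start;
        final int end;
        final boolean negative;
        final String name;

        Rec(final String contig, final int start, final int end,
            final boolean negative, final String name) {
            this.contig = contig;
            this.start = start;
            this.end = end;
            this.negative = negative;
            this.name = name;
        }

        @Override
        public String toString() {
            return contig + ":" + start + "-" + end + " " + (negative ? "-" : "+") + " " + name;
        }
    }

    /** Serializes one record in a fixed field order. */
    static void encode(final DataOutputStream out, final Rec r) throws IOException {
        out.writeUTF(r.contig);
        out.writeInt(r.start);
        out.writeInt(r.end);
        out.writeBoolean(r.negative);
        out.writeUTF(r.name);
    }

    /** Reads one record, or returns null when the stream is exhausted. */
    static Rec decode(final DataInputStream in) throws IOException {
        final String contig;
        try {
            contig = in.readUTF();
        } catch (final EOFException e) {
            return null; // clean end of stream before a new record started
        }
        // Arguments are evaluated left to right, matching the order used in encode.
        return new Rec(contig, in.readInt(), in.readInt(), in.readBoolean(), in.readUTF());
    }

    /** Encodes two records to bytes, decodes them back, and returns their string forms. */
    static List<String> roundTrip() {
        try {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                encode(out, new Rec("chr1", 100, 200, false, "a"));
                encode(out, new Rec("chr2", 5, 50, true, "b"));
            }
            final List<String> decoded = new ArrayList<>();
            try (DataInputStream in =
                         new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                for (Rec r = decode(in); r != null; r = decode(in)) {
                    decoded.add(r.toString());
                }
            }
            return decoded;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A sorter built on such a codec only needs to keep a bounded batch of records in memory at once, writing each sorted batch out through encode and merging the batches back through decode.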