CRAM queryAlignmentStart/queryMate fix. #1164

Merged: 2 commits, Jun 25, 2019

Conversation

cmnbroad
Collaborator

Fixes an issue discovered while debugging #1065. The CRAM implementation of queryAlignmentStart (which is in turn used by queryMate) currently returns every read in a container that starts at or after the requested alignment start, not just the reads that start exactly there. This PR pulls BAMStartingAtIteratorFilter out into a top-level class so that the CRAM implementation can reuse it to filter reads and to stop iterating once all reads with the requested start have been returned.
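
The intended behaviour can be illustrated with a minimal, self-contained sketch. It does not use htsjdk: `Read`, `FilterState`, `StartingAtFilter` and `readsStartingAt` are hypothetical names chosen for illustration, and the sketch assumes the reads arrive coordinate-sorted and mapped, which is what makes stopping early safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an aligned read; this is not htsjdk's SAMRecord.
final class Read {
    final String name;
    final int refIndex;
    final int start;

    Read(final String name, final int refIndex, final int start) {
        this.name = name;
        this.refIndex = refIndex;
        this.start = start;
    }
}

// Three-way result, mirroring the state names visible in this PR's diff.
enum FilterState { MATCHES_FILTER, CONTINUE_ITERATION, STOP_ITERATION }

// Assumed logic: match only reads that start exactly at the requested position.
final class StartingAtFilter {
    private final int refIndex;
    private final int start;

    StartingAtFilter(final int refIndex, final int start) {
        this.refIndex = refIndex;
        this.start = start;
    }

    FilterState compare(final Read read) {
        if (read.refIndex < refIndex
                || (read.refIndex == refIndex && read.start < start)) {
            return FilterState.CONTINUE_ITERATION; // still before the target
        }
        if (read.refIndex == refIndex && read.start == start) {
            return FilterState.MATCHES_FILTER;
        }
        return FilterState.STOP_ITERATION; // past the target; sorted input has no more matches
    }
}

final class StartingAtDemo {
    // Collects the names of reads starting exactly at (refIndex, start).
    static List<String> readsStartingAt(final List<Read> sorted, final int refIndex, final int start) {
        final StartingAtFilter filter = new StartingAtFilter(refIndex, start);
        final List<String> names = new ArrayList<>();
        for (final Read read : sorted) {
            final FilterState state = filter.compare(read);
            if (state == FilterState.STOP_ITERATION) {
                break;
            }
            if (state == FilterState.MATCHES_FILTER) {
                names.add(read.name);
            }
        }
        return names;
    }

    public static void main(final String[] args) {
        final List<Read> container = Arrays.asList(
                new Read("r1", 0, 1400), new Read("r2", 0, 1500),
                new Read("r3", 0, 1519), new Read("r4", 0, 1519),
                new Read("r5", 0, 1600));
        System.out.println(readsStartingAt(container, 0, 1519)); // prints [r3, r4]
    }
}
```

Without the `STOP_ITERATION` branch, a caller that only skips earlier reads would also return `r5`, which is the behaviour this PR describes as the bug.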

@codecov-io

codecov-io commented Sep 6, 2018

Codecov Report

Merging #1164 into master will increase coverage by 0.016%.
The diff coverage is 87.879%.

@@               Coverage Diff               @@
##              master     #1164       +/-   ##
===============================================
+ Coverage     67.863%   67.879%   +0.016%     
- Complexity      8289      8294        +5     
===============================================
  Files            563       564        +1     
  Lines          33718     33729       +11     
  Branches        5659      5659               
===============================================
+ Hits           22882     22895       +13     
+ Misses          8656      8655        -1     
+ Partials        2180      2179        -1

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| src/main/java/htsjdk/samtools/BAMFileReader.java | 68.466% <ø> (-0.199%) | 52 <0> (ø) | |
| ...a/htsjdk/samtools/BAMStartingAtIteratorFilter.java | 86.667% <86.667%> (ø) | 5 <5> (?) | |
| src/main/java/htsjdk/samtools/CRAMFileReader.java | 77.295% <88.889%> (+1.274%) | 53 <0> (ø) | ⬇️ |


/**
* Subclasses must call this method in their constructors AFTER construction of this class is complete.
Contributor

Could you not have an abstract callInitialize() method in the base class and then call that method from the base constructor? That would force any subclass to implement this method and read this comment. Otherwise I'm not sure it will be noticed.

Contributor

@pshapiro4broad pshapiro4broad Sep 10, 2018

I agree that this is error prone but at least it's a private class, so it's likely a future subclasser will read this comment before writing code.

I don't see why this class still has CRAMIntervalIteratorBase(final QueryInterval[] queries, final boolean contained, final long[] coordinates) as coordinates is ignored. Removing that constructor will also be a way to tell callers that they must call this method, which you could rename to something like setCoordinates().

Collaborator Author

@cmnbroad cmnbroad May 7, 2019

Yes, that constructor appears to be redundant. Removed. Not sure I understand how that helps with the requirement to call initialize though (or perhaps I misunderstand what the suggestion is). This is a private class though, and this whole containing class is targeted for a large refactoring as part of the cram overhaul I'm starting.

return FilteringIteratorState.CONTINUE_ITERATION;
}
}

Contributor

extra nl (from previous code)

@@ -0,0 +1,43 @@
package htsjdk.samtools;
Contributor

license

Contributor

why not htsjdk.samtools.filter ?

Collaborator Author

@cmnbroad cmnbroad May 7, 2019

To me, this seems to have more affinity with BAMIteratorFilter and its other subclass BAMQueryMultipleIntervalsIteratorFilter, which are all in this package. Happy to move it if you still think it's better there, though.

@@ -492,21 +494,30 @@ void enableFileSource(final SamReader reader, final boolean enabled) {
return BAMFileSpan.merge(spanArray).toCoordinateArray();
}

- private class CRAMIntervalIterator extends BAMQueryMultipleIntervalsIteratorFilter
+ private abstract class CRAMIntervalIteratorBase extends BAMQueryMultipleIntervalsIteratorFilter
Contributor

it would be good to note here the reason for the abstractness, not only in line 513....or better, actually have an abstract method whose implementation will resolve the issue.

Collaborator Author

Comment added. I'm not sure I understand how the suggestion to add another abstract method resolves the initialize issue, though. Can you give a more specific example?
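
One possible reading of the reviewer's suggestion is sketched below. This is not the htsjdk code: `IntervalIteratorBase`, `computeCoordinates`, `SafeIterator` and `UnsafeIterator` are hypothetical names. The base class declares an abstract hook and calls it from its own constructor, so a subclass cannot compile without supplying it. The sketch also shows the usual caveat with this pattern: the hook runs before the subclass's own fields are assigned, which may be one reason to keep an explicit initialize call instead.

```java
// Hypothetical base class: the abstract hook replaces a "remember to call initialize()" comment.
abstract class IntervalIteratorBase {
    private final long[] coordinates;

    protected IntervalIteratorBase() {
        // Called during base construction, before any subclass constructor body has run.
        this.coordinates = computeCoordinates();
    }

    // Every concrete subclass is forced by the compiler to implement this.
    protected abstract long[] computeCoordinates();

    long[] coordinates() {
        return coordinates.clone();
    }
}

// Safe: the hook does not depend on any subclass state.
final class SafeIterator extends IntervalIteratorBase {
    @Override
    protected long[] computeCoordinates() {
        return new long[] {0L, 100L};
    }
}

// Unsafe: the hook reads a field that has not been assigned yet.
final class UnsafeIterator extends IntervalIteratorBase {
    private final long[] span;

    UnsafeIterator(final long[] span) {
        super();          // computeCoordinates() runs here, while this.span is still null
        this.span = span;
    }

    @Override
    protected long[] computeCoordinates() {
        return span == null ? new long[0] : span.clone();
    }
}

final class HookDemo {
    public static void main(final String[] args) {
        System.out.println(new SafeIterator().coordinates().length);                        // prints 2
        System.out.println(new UnsafeIterator(new long[] {5L, 6L}).coordinates().length);   // prints 0
    }
}
```

The second result shows the trade-off: the abstract hook makes the requirement impossible to forget, but it only works when the hook needs nothing from the subclass constructor.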

Contributor

@yfarjoun yfarjoun left a comment

Hey! work on CRAM is happening!! hurray!

case MATCHES_FILTER:
nextRec = nextRecord;
break;
case CONTINUE_ITERATION:
continue;
case STOP_ITERATION:
break;
nextRec = null;
Contributor

Under what circumstances can you reach this line and have nextRec not be null? It's not modified outside of this method. Making it a private field would make this a little clearer.

Collaborator Author

Yes, the assignment seems to be redundant. fixed.
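
For readers following the thread, here is a minimal sketch of an advance loop built around the three states in the snippet above. It is an illustration over plain integers, not the CRAMFileReader code; `EarlyStopIterator`, `classify` and `collect` are hypothetical names. The lookahead is a private field that is reset in exactly one place, and because a bare `break` inside a `switch` only leaves the `switch`, the sketch uses `return` to leave the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

enum State { MATCHES_FILTER, CONTINUE_ITERATION, STOP_ITERATION }

// Hypothetical iterator: yields values equal to target from a sorted source, then stops.
final class EarlyStopIterator implements Iterator<Integer> {
    private final Iterator<Integer> source;
    private final int target;
    private Integer next;   // lookahead; only assigned inside advance()
    private boolean done;   // set once a value past the target has been seen

    EarlyStopIterator(final Iterator<Integer> source, final int target) {
        this.source = source;
        this.target = target;
        advance();
    }

    private State classify(final int value) {
        if (value < target) {
            return State.CONTINUE_ITERATION;
        }
        return value == target ? State.MATCHES_FILTER : State.STOP_ITERATION;
    }

    private void advance() {
        next = null; // single reset point, so no later null assignment is needed
        while (!done && source.hasNext()) {
            final int candidate = source.next();
            switch (classify(candidate)) {
                case MATCHES_FILTER:
                    next = candidate;
                    return;
                case CONTINUE_ITERATION:
                    continue;      // applies to the enclosing while loop
                case STOP_ITERATION:
                    done = true;
                    return;        // a bare break would only exit the switch
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public Integer next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        final Integer result = next;
        advance();
        return result;
    }
}

final class EarlyStopDemo {
    static List<Integer> collect(final List<Integer> sorted, final int target) {
        final List<Integer> out = new ArrayList<>();
        final Iterator<Integer> it = new EarlyStopIterator(sorted.iterator(), target);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(final String[] args) {
        System.out.println(collect(Arrays.asList(1, 3, 3, 7, 9), 3)); // prints [3, 3]
    }
}
```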

@@ -51,43 +51,43 @@
@Test
public void testConstructors () throws IOException {
CRAMFileReader reader = new CRAMFileReader(cramFile, indexFile, source, ValidationStringency.SILENT);
- CloseableIterator<SAMRecord> iterator = reader.queryAlignmentStart("chrM", 1500);
+ CloseableIterator<SAMRecord> iterator = reader.queryAlignmentStart("chrM", 1519);
Contributor

Using a named constant instead of a magic number would avoid the need to modify it in many places the next time this changes. It would also make it more apparent that the same value is used throughout.

Collaborator Author

Done, including using a constant for the sequence name.

Assert.assertTrue(iterator.hasNext());
SAMRecord record = iterator.next();

Assert.assertEquals(record.getReferenceName(), "chrM");
- Assert.assertTrue(record.getAlignmentStart() >= 1500);
+ Assert.assertEquals(record.getAlignmentStart(),1519);
Contributor

missing space after , on many lines here

Collaborator Author

Done.

final String queryContig,
final int alignmentStart,
final int expectedReadCount) throws IOException
{
Contributor

open brace should be at end of previous line, not on its own line

@yfarjoun added the cram and Waiting for revisions labels Sep 10, 2018
@cmnbroad cmnbroad force-pushed the cn_cram_query_mate branch 2 times, most recently from 23807f1 to ef537e8 Compare May 10, 2019 13:46
@cmnbroad added the 1 - Ready and Waiting for Review labels and removed the Waiting for revisions and 1 - Ready labels May 10, 2019
@cmnbroad cmnbroad mentioned this pull request Jun 11, 2019
5 tasks
Labels
cram Waiting for Review This PR is waiting for a reviewer to respond
4 participants