Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed error in FilterIntervals logic when filtering on annotations with -XL. #7046

Merged
merged 1 commit into from
Jan 21, 2021

Conversation

samuelklee
Copy link
Contributor

Fixes a bug that I inadvertently introduced long ago in #5408. A method there originally assumed that the intervals to be filtered coincide with the annotated intervals when filtering on annotations, but this assumption can be violated when e.g. using -XL to further exclude intervals. This breaks the annotation-based filtering and will result in incorrectly retained/dropped intervals. Unfortunately, the integration test cases were not quite complete enough to catch this.

Fortunately, for typical data and parameters, the number of affected intervals is typically a very small percentage, especially in human (e.g., the default GC filters of [0.1, 0.9] affect <0.1% of 250bp bins); I only caught this when running on malaria data, since GC filtering has more impact there. I would also expect count-based filters to mitigate some of the effects (e.g., a bin with extreme GC that was erroneously retained when filtering on annotations might later be filtered because it has poor coverage). Downstream results are also all correct---they're just given for a slightly different set of intervals than would be expected.

This bug would affect users that made use of the blacklist_intervals_for_filter_intervals option in the gCNV WDLs, but my feeling is this functionality is not used that frequently.

@droazen I think this is a relatively minor bug, but it might be good to mention it in the release notes.

@gatk-bot
Copy link

gatk-bot commented Jan 21, 2021

Travis reported job failures from build 32589
Failures in the following jobs:

Test Type JDK Job ID Logs
integration openjdk11 32589.12 logs
integration openjdk8 32589.2 logs

@samuelklee samuelklee force-pushed the sl_fix_filter_intervals_indexing branch from 886106d to f4e0d1a Compare January 21, 2021 16:47
@samuelklee samuelklee force-pushed the sl_fix_filter_intervals_indexing branch from f4e0d1a to bb2044b Compare January 21, 2021 16:48
@samuelklee samuelklee requested a review from mwalker174 January 21, 2021 16:49
@samuelklee
Copy link
Contributor Author

@mwalker174 quick one for you. Tried to do a bit of clean up in the parsing of optional inputs; there is still some duplication of code, but I think I'm OK with it. Sorry this went undetected for so long, but maybe that's a good thing!

Copy link
Contributor

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I don't believe this would have had a large impact on WGS. One suggestion, but otherwise looks good!

@samuelklee
Copy link
Contributor Author

Thanks @mwalker174, will merge once tests pass.

@samuelklee samuelklee merged commit a9230ee into master Jan 21, 2021
@samuelklee samuelklee deleted the sl_fix_filter_intervals_indexing branch January 21, 2021 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants