BQSR filters out reads 'not in target region' #46

Closed
pcostanza opened this issue Mar 15, 2021 · 2 comments

Comments

@pcostanza
Contributor

I noticed in the logs that when using the BQSR option, reads not in the target region are filtered out. In our case this is unwanted behavior, since we intend to use the output BAM as our only archived copy of the data. As such, we would like to keep as much data as possible.
The read removal doesn't seem to happen when BQSR is omitted, so I'm not sure whether this is a feature or a bug.

Matthias

Originally posted by @matthdsm in #44 (comment)

@pcostanza
Contributor Author

@matthdsm This is actually a "feature" in the sense that it reproduces exactly the same behavior as GATK: when you pass the -L option to ApplyBQSR, it also produces a BAM file from which reads outside the specified regions have been filtered out. This also has an impact on HaplotypeCaller: when you pass the -L option to HaplotypeCaller, it works only on the specified regions, but adds some padding around them. If we didn't filter out reads in our BQSR step, our HaplotypeCaller would therefore take more reads into account than the original HaplotypeCaller. (GATK's ApplyBQSR with the -L option effectively cancels out the padding in GATK's HaplotypeCaller with the -L option.)
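
To make the interaction concrete, here is a minimal Python sketch of the effect described above. It is a toy model, not elPrep or GATK code: the `Interval` and `Read` types, the `filter_reads` helper, the overlap rule, and the padding of 100 bases are all assumptions chosen for illustration only.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # inclusive
    end: int    # inclusive


class Read(NamedTuple):
    name: str
    contig: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(read: Read, target: Interval, padding: int = 0) -> bool:
    """True if the read overlaps the target widened by `padding` on both sides."""
    return (
        read.contig == target.contig
        and read.start <= target.end + padding
        and read.end >= target.start - padding
    )


def filter_reads(reads: List[Read], targets: List[Interval], padding: int = 0) -> List[Read]:
    """Keep only the reads that overlap at least one (padded) target."""
    return [r for r in reads if any(overlaps(r, t, padding) for t in targets)]


targets = [Interval("chr1", 1000, 2000)]
reads = [
    Read("inside", "chr1", 1500, 1600),  # inside the target
    Read("near", "chr1", 2050, 2150),    # outside the target, inside the padded target
    Read("far", "chr1", 5000, 5100),     # outside both
]

PADDING = 100  # hypothetical value, only for this illustration

# BQSR step restricted to the targets: "near" and "far" are dropped here.
after_bqsr = filter_reads(reads, targets)

# Variant caller working on padded targets, with and without the earlier filtering.
caller_with_filter = filter_reads(after_bqsr, targets, padding=PADDING)
caller_without_filter = filter_reads(reads, targets, padding=PADDING)

print([r.name for r in caller_with_filter])
print([r.name for r in caller_without_filter])
```

In this toy model the caller sees the extra "near" read only when the BQSR step does not filter, which is the mismatch with the original pipeline that the filtering avoids.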

If you have a better idea how we should handle this case while remaining compatible, we would be very interested to hear suggestions. :)

Pascal

@matthdsm
Contributor

Hi Pascal,

Thanks for the clarification.
I get the need for feature parity between elPrep and GATK; it's a big selling point. With this explanation, I think it's handled as it should be.

Thanks for clearing that up, feel free to close.
Cheers
Matthias
