Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should we be filtering for mapping quality? #110

Closed
jamestwebber opened this issue Sep 9, 2023 · 4 comments
Closed

should we be filtering for mapping quality? #110

jamestwebber opened this issue Sep 9, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@jamestwebber
Copy link
Collaborator

Hi Andrey 👋

We've been running IsoQuant a lot lately. This week I was looking at some of our data in IGV and noticed a discrepancy between the number of reads assigned by isoquant and the pile-up I could see there. It turned out that this was because IGV had a filter on MAPQ and was only showing high-quality mapping, whereas IsoQuant will use (and quantify?) all the reads regardless of mapping quality.

Is this intended behavior, and if so were there any experiments to suggest that this provides better results? Or perhaps we should be filtering our BAM file before running isoquant, to remove the low-quality alignments?

@andrewprzh
Copy link
Collaborator

Hi James!

That's a good question, and I did some experiments in the past, mostly with respect to transcript discovery.

IsoQuant has no general MAPQ cutoff, but filters some alignments by quality in some specific cases.
For example, in the annotation-free mode primary alignments with 1 or 2 exonic blocks and MAPQ <= 1 are filtered out (as far as I'm aware, minimap2 sets MAPQ=0 for all secondary alignments, thus filtering is applied only to the primary ones). Novel transcripts with 1 or 2 exons are also required to be constructed from alignments with average MAPQ >= 30.
In the annotation-based mode it also filters out inconsistent alignments with MAPQ <= 5 (i.e. that do not match annotated isoforms).

Constants are a bit ad-hoc and may seem to be complicated, but I certainly did some experiments to implement this filtering. Although I cannot say I did an excessive analysis of MAPQ values in general.
Based on what I've seen on real and simulated data, I'm pretty sure that alignments containing 1 or 2 exonic blocks tend to be less reliable in general and can be filtered by MAPQ. However, multi-intronic alignments with low quality sometimes appear to correctly match to annotated isoform.

Also, some long-read RNA aligners to not report meaningful MAPQs, setting 60 for all alignments. However, I presume minimap2 is used in 99% of cases.

In order not to complicate the usage, I can simply add MAPQ filtering option to IsoQuant. It should be also fairly easy to perform some experiments on simulated data to figure out how transcript discovery is affected by the global MAPQ filtering.

Best
Andrey

@andrewprzh andrewprzh added the enhancement New feature or request label Sep 12, 2023
@jamestwebber
Copy link
Collaborator Author

I thought you were using secondary alignments based on the help text that said --no_secondary was not recommended. But these are being filtered out, or maybe that is only in the annotation-free mode?

I noticed this because I was comparing multiple runs of the same sample, using different library preps, and I saw a very large difference in read counts at a pseudogene. It seemed like they were primarily low-quality alignments, which made me suspicious that this was the right assignment.

@andrewprzh
Copy link
Collaborator

I thought you were using secondary alignments based on the help text that said --no_secondary was not recommended. But these are being filtered out, or maybe that is only in the annotation-free mode?

Yes, secondary alignments are not discarded by default. I meant that MAPQ filtering is only applied to the primary alignments.

@andrewprzh
Copy link
Collaborator

Implemented now in version 3.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants