should we be filtering for mapping quality? #110

jamestwebber · 2023-09-09T15:16:48Z

Hi Andrey 👋

We've been running IsoQuant a lot lately. This week I was looking at some of our data in IGV and noticed a discrepancy between the number of reads assigned by isoquant and the pile-up I could see there. It turned out that this was because IGV had a filter on MAPQ and was only showing high-quality mapping, whereas IsoQuant will use (and quantify?) all the reads regardless of mapping quality.

Is this intended behavior, and if so were there any experiments to suggest that this provides better results? Or perhaps we should be filtering our BAM file before running isoquant, to remove the low-quality alignments?

andrewprzh · 2023-09-12T16:30:07Z

Hi James!

That's a good question, and I did some experiments in the past, mostly with respect to transcript discovery.

IsoQuant has no general MAPQ cutoff, but filters some alignments by quality in some specific cases.
For example, in the annotation-free mode primary alignments with 1 or 2 exonic blocks and MAPQ <= 1 are filtered out (as far as I'm aware, minimap2 sets MAPQ=0 for all secondary alignments, thus filtering is applied only to the primary ones). Novel transcripts with 1 or 2 exons are also required to be constructed from alignments with average MAPQ >= 30.
In the annotation-based mode it also filters out inconsistent alignments with MAPQ <= 5 (i.e. that do not match annotated isoforms).

Constants are a bit ad-hoc and may seem to be complicated, but I certainly did some experiments to implement this filtering. Although I cannot say I did an excessive analysis of MAPQ values in general.
Based on what I've seen on real and simulated data, I'm pretty sure that alignments containing 1 or 2 exonic blocks tend to be less reliable in general and can be filtered by MAPQ. However, multi-intronic alignments with low quality sometimes appear to correctly match to annotated isoform.

Also, some long-read RNA aligners to not report meaningful MAPQs, setting 60 for all alignments. However, I presume minimap2 is used in 99% of cases.

In order not to complicate the usage, I can simply add MAPQ filtering option to IsoQuant. It should be also fairly easy to perform some experiments on simulated data to figure out how transcript discovery is affected by the global MAPQ filtering.

Best
Andrey

jamestwebber · 2023-09-12T18:44:47Z

I thought you were using secondary alignments based on the help text that said --no_secondary was not recommended. But these are being filtered out, or maybe that is only in the annotation-free mode?

I noticed this because I was comparing multiple runs of the same sample, using different library preps, and I saw a very large difference in read counts at a pseudogene. It seemed like they were primarily low-quality alignments, which made me suspicious that this was the right assignment.

andrewprzh · 2023-09-13T13:13:22Z

I thought you were using secondary alignments based on the help text that said --no_secondary was not recommended. But these are being filtered out, or maybe that is only in the annotation-free mode?

Yes, secondary alignments are not discarded by default. I meant that MAPQ filtering is only applied to the primary alignments.

andrewprzh · 2024-05-09T07:51:00Z

Implemented now in version 3.4.

andrewprzh added the enhancement New feature or request label Sep 12, 2023

andrewprzh closed this as completed May 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

should we be filtering for mapping quality? #110

should we be filtering for mapping quality? #110

jamestwebber commented Sep 9, 2023

andrewprzh commented Sep 12, 2023

jamestwebber commented Sep 12, 2023

andrewprzh commented Sep 13, 2023

andrewprzh commented May 9, 2024

should we be filtering for mapping quality? #110

should we be filtering for mapping quality? #110

Comments

jamestwebber commented Sep 9, 2023

andrewprzh commented Sep 12, 2023

jamestwebber commented Sep 12, 2023

andrewprzh commented Sep 13, 2023

andrewprzh commented May 9, 2024