How do general filterings on MQ, BQ and coverage work together? #68

Open
Sfeng666 opened this issue Jan 15, 2024 · 1 comment

Comments

@Sfeng666

Hi Michael,

Thanks for developing JACUSA & JACUSA2. Those are well-annotated tools with thoughtful functions for RNA editing analyses.

I have a few questions about how sites are filtered by mapping quality (MQ), base quality (BQ), and minimum coverage, assuming a call-1 scenario:

  1. When filtering by mapping quality (-m) and one read has an MQ below the threshold, does JACUSA2 remove all genomic sites (i.e., 1 bp positions) covered by this read (even if other reads have an MQ above the threshold), or does it simply discard this read and count only reads with an MQ above the threshold for coverage and allele counts? My guess is the latter.
  2. Similarly, when filtering by base quality (-q) and one read has a BQ below the threshold at a given site, does JACUSA2 remove this genomic site (even if other reads have a BQ above the threshold at the same site), or does it simply not count this read at this site and count only read bases with a BQ above the threshold for coverage and allele counts? My guess is also the latter.
  3. Just to confirm: if both -m and -q discard reads/bases that fail the filter, is the min-coverage filter (-c) based on the coverage calculated from above-threshold reads/bases?
  4. I also support @y9c's request to add an option to accept sites with coverage = 0. This would be helpful for downstream analyses involving multiple samples.

I know these could be basic questions, but they are not clearly explained in the JACUSA2 manual. Since a BQ is assigned to each base of each read, and an MQ is assigned to each read, a given genomic position can have multiple BQs/MQs. The manual's explanation, "filter positions with BQ/MQ < min-BQ/MQ", is ambiguous about how the filtering decision is made from those BQs/MQs.

Thanks

@piechottam
Collaborator

Thank you for your questions and feedback!

Answers

  1. MQ is read-specific information; therefore, JACUSA2 discards the entire read when the criterion is not met.
  2. BQ is read- and position-specific. JACUSA2 discards the position of a read where the criterion is not met; other positions are not affected.
  3. Yes, "-c" is "pileup-specific". All base calls (passing BQ) from reads (passing MQ) are aggregated, and only positions that have sufficient coverage (-c) are in the output.
  4. Unfortunately, this is not possible. The test statistic expects an equal number of replicates for each site in a single comparison run. If you allowed zero-coverage sites, that specific replicate would be counted without contributing any base calls.
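
To make answers 1 to 3 concrete, here is a minimal Python sketch of the order of operations. It is an illustration only, not JACUSA2's actual code: the `Read` record, `pileup_counts`, and `covered_sites` are hypothetical names, reads are assumed to align without gaps, and the `min_mq`, `min_bq`, and `min_cov` parameters merely stand in for "-m", "-q", and "-c".

```python
from collections import Counter, namedtuple

# Hypothetical, simplified read model: one MQ per read, one BQ per base.
Read = namedtuple("Read", ["start", "bases", "base_quals", "mapq"])

def pileup_counts(reads, min_mq, min_bq):
    """Return {position: Counter of bases} built only from passing reads/bases."""
    counts = {}
    for read in reads:
        if read.mapq < min_mq:
            continue  # read-level filter: the whole read is discarded
        for offset, (base, bq) in enumerate(zip(read.bases, read.base_quals)):
            if bq < min_bq:
                continue  # base-level filter: only this base call is discarded
            counts.setdefault(read.start + offset, Counter())[base] += 1
    return counts

def covered_sites(counts, min_cov):
    """Pileup-level filter: keep sites whose filtered coverage reaches min_cov."""
    return {pos: c for pos, c in counts.items() if sum(c.values()) >= min_cov}

reads = [
    Read(start=100, bases="ACG", base_quals=[30, 30, 30], mapq=60),
    Read(start=100, bases="ACG", base_quals=[30, 5, 30], mapq=60),  # low BQ at 101
    Read(start=100, bases="ATG", base_quals=[30, 30, 30], mapq=3),  # low MQ read
]
counts = pileup_counts(reads, min_mq=20, min_bq=20)
sites = covered_sites(counts, min_cov=2)
```

In this toy example the low-MQ read contributes nothing anywhere, the low-BQ base is dropped only at position 101, and position 101 then fails the coverage filter because a single base call remains there.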

Filtering is carried out on multiple levels:

  • Reads, e.g.: mapping quality, tags, flags
  • Base calls, e.g.: base call quality
  • Pileup, e.g.: minimal coverage (-c)
  • Parallel pileup (comparing pileups from 2 conditions), e.g.: only sites that contain differences are output (or use "-A" to output all sites)
  • "Feature", e.g.: HomozygousFilter for RNA-DNA difference comparisons, where you want to remove polymorphic positions (-a H:condition=1 -> require condition 1 to be homozygous)
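
The pileup and parallel-pileup levels can be sketched in the same spirit. This is a rough illustration under stated assumptions, not JACUSA2's implementation: `passes_coverage`, `has_difference`, and `keep_site` are hypothetical helpers, each replicate is represented as a `Counter` of already-filtered base calls at one site, the "more than one distinct base observed" check is a simplified stand-in for the real test statistic, and `output_all` only mimics the effect described for "-A".

```python
from collections import Counter

def passes_coverage(replicates, min_cov):
    """Assumption: every replicate pileup must reach min_cov at the site."""
    return all(sum(pileup.values()) >= min_cov for pileup in replicates)

def has_difference(cond1, cond2):
    """Simplified stand-in: more than one distinct base seen across all pileups."""
    observed = set()
    for pileup in cond1 + cond2:
        observed.update(base for base, n in pileup.items() if n > 0)
    return len(observed) > 1

def keep_site(cond1, cond2, min_cov, output_all=False):
    """Apply the pileup-level filter first, then the parallel-pileup check."""
    if not (passes_coverage(cond1, min_cov) and passes_coverage(cond2, min_cov)):
        return False
    return output_all or has_difference(cond1, cond2)

dna = [Counter(A=10)]
rna = [Counter(A=6, G=4), Counter(A=8, G=2)]
rna_same = [Counter(A=9), Counter(A=12)]
rna_gap = [Counter(A=6, G=4), Counter()]  # second replicate has zero coverage

variant = keep_site(dna, rna, min_cov=1)
identical = keep_site(dna, rna_same, min_cov=1)
identical_all = keep_site(dna, rna_same, min_cov=1, output_all=True)
zero_cov = keep_site(dna, rna_gap, min_cov=1)
```

Under these assumptions, the site with a zero-coverage replicate is rejected before any comparison takes place, which matches the reasoning in answer 4: an empty replicate would take part in the comparison without contributing base calls.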
