sam: Consider recommending using 0 for mapping quality when record is unmapped #727
Comments
I have always thought that the text about “no assumptions can be made about [these fields] in [these circumstances]” is a mistake that will prevent us from defining meanings for those fields in those currently unspecified circumstances, as discussed in e.g. this samtools-devel thread. However, over a decade later, I fear the ship may have sailed on changing this policy.
Although "no assumptions can be made", if one were to make assumptions, I would interpret a MAPQ defined on an unmapped read as the probability that the read is genuinely unmapped with respect to the reference used. That is, reads for which the aligner had candidates and gave up (e.g. bwa when every seed has >500 occurrences in the reference) would have MAPQ 0 (since the aligner has no confidence that the read is actually unmapped), and reads which don't match any reference (e.g. contamination, primers) would have non-zero MAPQ.
@d-cameron I agree that could have made sense, but it's too late now, and I don't think it's how aligners work, so I expect such values wouldn't be sensibly calibrated anyway. Some proxy based on complexity and length (i.e. an entropy estimation) could, however, be used as an indicator that a read is genuine data not found in this reference (but perhaps part of this genome, e.g. from a long insertion).

@jmarshall - also agreed that the language about "no meaning" isn't always helpful, but as you say the ship has sailed. However, I think it is reasonable to make recommendations (not requirements) on the values when they are essentially NOPs. E.g. CIGAR has no meaning for unmapped data, but sanity checkers may well gripe about something with CIGAR
We already have a recommendation that MAPQ of 255 should not be used. Expand on this to recommend that zero is used when the record is unmapped. This is purely a recommendation, made for maximum compatibility, and not a specification requirement. Fixes samtools#727
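For writers of SAM-producing tools, the recommendation could be applied roughly as in the sketch below. It is only an illustration, not code from the specification or from samtools: the function name `recommend_mapq` is hypothetical, and it assumes plain tab-separated SAM text where FLAG is the second column, MAPQ is the fifth, and FLAG bit 0x4 marks an unmapped segment.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def recommend_mapq(sam_line: str) -> str:
    """Return the SAM line with MAPQ set to 0 if the record is unmapped.

    Hypothetical helper: header lines (starting with '@') and mapped
    records are returned unchanged. Any trailing newline is dropped
    from alignment records.
    """
    if sam_line.startswith("@"):
        return sam_line  # header line, nothing to adjust

    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: FLAG

    if flag & UNMAPPED_FLAG:
        fields[4] = "0"  # column 5: MAPQ, recommended value when unmapped

    return "\t".join(fields)


if __name__ == "__main__":
    record = "r1\t4\t*\t0\t255\t*\t*\t0\t0\tACGT\tIIII"
    print(recommend_mapq(record))
```

Because this is a recommendation rather than a requirement, a reader should still not rely on MAPQ being 0 for unmapped records; the sketch only shows what a writer aiming for maximum compatibility might emit.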
As suggested by @jkbonfield in #715 (comment):