-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MarkDuplicates misses duplicates of read pairs with mapping quality 0 on one end. #1285
Comments
@yfarjoun Where do you draw the line on this? What if the mate mapping quality is low enough that one sequencing error could throw it to another position in the genome? I would edit your title to say "duplicates of ... read pairs mapping quality 0 on one end". This can just as easily happy when e.g. one end of a pair is in unique sequence and the other end is in a small tandem duplication or locally-repetitive region within the expected insert size. Are you unconcerned about pairs where both ends are mapq=0 because the use case you're looking at will ignore those reads? Because the same problem arises there? In general I see the problem you're raising, and don't disagree that it should be addressed, but I think the scope is broader than you're suggesting. |
I agree that the problem is perhaps greater than I am tackling in this issue...(changed the name to MQ0 , thanks) However...I think that the incremental model is appropriate. We deal with the problems that actually affect us and our work. as we are sequencing more and more FFPE samples and they have really bad chimeras compounded with PCR duplication, the mapped reads are not being marked as duplicates due to their MQ0 mates. I have not seen this issue with MQ=3 or other low value MQ, and so am suggesting to use MQ0 as a way to eliminate the large part of the problem. As you say, I am unconcerned by both mates having MQ0 since they will be filtered out and will not cause downstream problems. Just as we do not worry about marking completely unmapped pairs (where both reads are unmapped)...I'm sure that there are many duplicates of that form. I think that it's relatively straightforward to resolve the single-ended MQ0 problem, while it will be more difficult to resolve the low MQ and the both-sides-MQ. So I propose to fix the lower-effort, higher-value part of the larger problem. |
It was four years ago. I don’t remember what I had for lunch last week. Sorry. |
relevant issue from 4 years ago: #128 |
When a chimeric read pair has one end which aligns well and the other aligns with MappingQuality ==0 the algorithm breaks down and fails to mark duplicates of this template. This is because the aligner will randomly place read2 (the one with MQ0) and since both reads are marked as Mapped, MD considers the position of both reads.
I propose changing this behavior so that a read which is mapped with MQ0 is considered unmapped for the purposes of marking duplicates. This will resolve the problem.
Do folks see an issue with this?
Do some aligners use MQ0 to mean something other than "read has more than one equivalent alignment"?
Is there value in keeping these (likely) duplicates-reads marked as non-duplicates?
@tfenne @nh13 @lbergelson @jamesemery
I would like to hear your opinions on this.
The text was updated successfully, but these errors were encountered: