MarkDuplicates misses duplicates of read pairs with mapping quality 0 on one end. #1285

Open
yfarjoun opened this issue Feb 21, 2019 · 5 comments

@yfarjoun
Contributor

When a chimeric read pair has one end that aligns well and another that aligns with mapping quality 0 (MQ0), the algorithm breaks down and fails to mark duplicates of this template. This is because the aligner will randomly place read2 (the one with MQ0), so duplicates of the same template can end up with different read2 positions, and since both reads are flagged as mapped, MarkDuplicates (MD) considers the positions of both reads.

I propose changing this behavior so that a read which is mapped with MQ0 is considered unmapped for the purposes of marking duplicates. This will resolve the problem.
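
To make the proposal concrete, here is a minimal sketch of the idea in Python. It is not Picard's implementation: the `End` and `Pair` types, the `pair_key` function, and the `mq0_as_unmapped` switch are all hypothetical names invented for illustration, and the real duplicate-marking logic (orientation handling, library, scoring) is considerably more involved. It only shows why a randomly placed MQ0 mate splits duplicates into separate groups, and how ignoring that mate's position brings them back together.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified model of an aligned read end (not an htsjdk type).
@dataclass(frozen=True)
class End:
    contig: str
    pos: int            # 5' position used for duplicate comparison
    reverse: bool
    mapq: int
    mapped: bool = True

@dataclass(frozen=True)
class Pair:
    name: str
    read1: End
    read2: End

EndKey = Optional[Tuple[str, int, bool]]

def end_key(end: End, mq0_as_unmapped: bool) -> EndKey:
    """Position signature of one end, or None if it should be ignored."""
    if not end.mapped or (mq0_as_unmapped and end.mapq == 0):
        return None
    return (end.contig, end.pos, end.reverse)

def pair_key(pair: Pair, mq0_as_unmapped: bool) -> Tuple[EndKey, EndKey]:
    """Signature used to group candidate duplicates in this sketch."""
    return (end_key(pair.read1, mq0_as_unmapped),
            end_key(pair.read2, mq0_as_unmapped))

def group_by_key(pairs: List[Pair], mq0_as_unmapped: bool) -> Dict[tuple, List[str]]:
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for p in pairs:
        groups[pair_key(p, mq0_as_unmapped)].append(p.name)
    return dict(groups)

# Two copies of the same template: read1 is placed confidently,
# read2 has MQ0 and lands somewhere different in each copy.
pairs = [
    Pair("copy_a", End("chr1", 1000, False, 60), End("chr7", 500, True, 0)),
    Pair("copy_b", End("chr1", 1000, False, 60), End("chrX", 9000, True, 0)),
]

current = group_by_key(pairs, mq0_as_unmapped=False)   # positions of both ends count
proposed = group_by_key(pairs, mq0_as_unmapped=True)   # MQ0 end is ignored
```

With `mq0_as_unmapped=False` the two copies fall into different groups, so neither would be flagged; with `mq0_as_unmapped=True` they share one group keyed on read1 alone.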

Do folks see an issue with this?
Do some aligners use MQ0 to mean something other than "read has more than one equivalent alignment"?
Is there value in keeping these (likely) duplicate reads marked as non-duplicates?

@tfenne @nh13 @lbergelson @jamesemery
I would like to hear your opinions on this.

@tfenne
Collaborator

tfenne commented Feb 21, 2019

@yfarjoun Where do you draw the line on this? What if the mate mapping quality is low enough that one sequencing error could throw it to another position in the genome?

I would edit your title to say "duplicates of ... read pairs mapping quality 0 on one end". This can just as easily happen when e.g. one end of a pair is in unique sequence and the other end is in a small tandem duplication or locally-repetitive region within the expected insert size.

Are you unconcerned about pairs where both ends are mapq=0 because the use case you're looking at will ignore those reads? The same problem arises there.

In general I see the problem you're raising, and don't disagree that it should be addressed, but I think the scope is broader than you're suggesting.

@yfarjoun changed the title from "MarkDuplicates misses duplicates of Chimeric Reads" to "MarkDuplicates misses duplicates of Chimeric Reads read pairs mapping quality 0 on one end." on Feb 21, 2019
@yfarjoun
Contributor Author

I agree that the problem is perhaps greater than I am tackling in this issue (changed the name to MQ0, thanks).

However, I think that the incremental model is appropriate: we deal with the problems that actually affect us and our work. As we are sequencing more and more FFPE samples, which have really bad chimeras compounded with PCR duplication, the mapped reads are not being marked as duplicates due to their MQ0 mates. I have not seen this issue with MQ=3 or other low MQ values, and so am suggesting to use MQ0 as a way to eliminate a large part of the problem.

As you say, I am unconcerned by both mates having MQ0, since they will be filtered out and will not cause downstream problems, just as we do not worry about marking completely unmapped pairs (where both reads are unmapped). I'm sure that there are many duplicates of that form.

I think that it's relatively straightforward to resolve the single-ended MQ0 problem, while it will be more difficult to resolve the low-MQ and both-ends-MQ0 cases. So I propose to fix the lower-effort, higher-value part of the larger problem.
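
As a rough illustration of that scoping, the sketch below sorts pairs into the cases discussed in this thread. Everything here is hypothetical: the `PairClass` names, the `classify_pair` function, and especially the `threshold` parameter, which is not an existing MarkDuplicates option and only stands in for the "where do you draw the line" question.

```python
from enum import Enum

class PairClass(Enum):
    BOTH_PLACED = "both ends placed with MQ above the threshold"
    ONE_END_AMBIGUOUS = "exactly one end at or below the threshold"
    BOTH_ENDS_AMBIGUOUS = "both ends at or below the threshold"

def classify_pair(mapq1: int, mapq2: int, threshold: int = 0) -> PairClass:
    """Classify a mapped pair by how many ends have MQ <= threshold.

    threshold=0 corresponds to the MQ0-only proposal in this issue;
    a larger value would widen the scope to other low-MQ mates.
    """
    ambiguous_ends = sum(mq <= threshold for mq in (mapq1, mapq2))
    if ambiguous_ends == 0:
        return PairClass.BOTH_PLACED
    if ambiguous_ends == 1:
        return PairClass.ONE_END_AMBIGUOUS
    return PairClass.BOTH_ENDS_AMBIGUOUS

# The case this issue targets: one confident end, one MQ0 end.
targeted = classify_pair(60, 0)
# Left out of scope here: both ends MQ0, and low-but-nonzero MQ mates.
both_mq0 = classify_pair(0, 0)
low_mq_mate = classify_pair(60, 3)
# Raising the hypothetical threshold would pull the MQ=3 mate into scope.
low_mq_mate_wider = classify_pair(60, 3, threshold=3)
```

Only the `ONE_END_AMBIGUOUS` case at `threshold=0` is what I am proposing to change; the other cases are listed to show what is deliberately left alone.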

@yfarjoun changed the title from "MarkDuplicates misses duplicates of Chimeric Reads read pairs mapping quality 0 on one end." to "MarkDuplicates misses duplicates of read pairs with mapping quality 0 on one end." on Feb 22, 2019
@yfarjoun
Contributor Author

I just realized that this commit e46438d by @nh13 (on a branch) is the beginning of an implementation for this.

Since I'd like to fix this issue, I'd like to know if there's a reason you didn't submit a PR for that branch, @nh13

@nh13
Collaborator

nh13 commented Mar 25, 2019

It was four years ago. I don’t remember what I had for lunch last week. Sorry.

@yfarjoun
Contributor Author

yfarjoun commented Jun 7, 2019

relevant issue from 4 years ago: #128
