Inconsistent result of no reads from the same coordinate in UMI-tools dedup #567

camelest · 2022-11-28T15:20:52Z

Hi, thank you so much for the wonderful tool.

I have encountered a strange result: the original file contains these 2 reads (from subsampled 10X Chromium 5' scRNA-seq public data, mapped with STAR v2.7.10a)

SRR12018267.78007711	83	chr9	5437908	255	61M40S	=	5436598	-1371	CAGTAGATGACGCACCTCAGCCAATTCGCGCAGCCCTCAGCTTCTTTAAAGAGCCGGCACTCCCCATATAAGAAATNACCGCCGGTGGCCTACTCGTAGAG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFF:F:#FFFFFFFFFFFFFFFF:FFFFFFF	NH:i:1	HI:i:1	nM:i:0	AS:i:161	CR:Z:CTCTACGAGTAGGCCA	UR:Z:CCGGCGGTNA	GX:Z:-	GN:Z:-	sS:Z:CTCTACGAGTAGGCCACCGGCGGTNATTTCTTATATGGGGAGTGCCGGCTCTTTAAAGAAGCTGAGGGCTGCGCGAATTGGCTGAGGTGCGTCATCTACTG	sQ:Z:FFFFFFF:FFFFFFFFFFFFFFFF#:F:FFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	sM:i:-23	CB:Z:-	UB:Z:-
SRR12018267.400620518	83	chr9	5437908	255	61M40S	=	5431901	-6068	CAGTAGATGACGCACCTCAGCCAATTCGCGCAGCCCTCAGCTTCTTTAAAGAGCCGGCACTCCCCAGCTCAGAAATGACCGCCGGTGGCCTACTCGTAGAG	FFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:F:F:F:FFFFFFFFFFFFFFF,:,FFFF,FFFFFFF:FFFFFFFFFFFFFFFFF::	NH:i:1	HI:i:1	nM:i:0	AS:i:161	CR:Z:CTCTACGAGTAGGCCA	UR:Z:CCGGCGGTCA	GX:Z:-	GN:Z:-	sS:Z:CTCTACGAGTAGGCCACCGGCGGTCATTTCTGAGCTGGGGAGTGCCGGCTCTTTAAAGAAGCTGAGGGCTGCGCGAATTGGCTGAGGTGCGTCATCTACTG	sQ:Z:::FFFFFFFFFFFFFFFFF:FFFFFFF,FFFF,:,FFFFFFFFFFFFFFF:F:F:F:FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFF	sM:i:0	CB:Z:CTCTACGAGTAGGCCA	UB:Z:CCGGCGGTCA

which somehow disappear after deduplication. Across different umi_tools dedup runs, I sometimes see 5 examples of similar read groups and sometimes only 2.

What is stranger is that if I create a test.bam containing just these 2 reads, deduplication always selects 1 representative read.

Do you have any idea what is going on here? I have read through #458 but still could not figure out why. Is it because I subsampled the data? Thank you so much for your help.

umitools v1.1.2
umi_tools dedup --per-cell -I input.bam --extract-umi-method=tag --umi-tag=UR --cell-tag=CR -S output.bam

TomSmithCGAT · 2023-01-03T09:32:40Z

Hi @camelest,

umi_tools is not deterministic by default, so different runs can yield different results. There's an open PR to make it deterministic, with links to other issues describing how to make it deterministic in the current version, if you want to read further (#550).

Without seeing the full input and output for all reads with the same alignment coordinates as the reads above, it's not possible to be certain what's happening. However, I expect you have more reads with the same alignment coordinates and similar enough UMIs that form a network with more than one possible solution.

Tom
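To make the "network with more than one possible solution" point concrete, here is a minimal Python sketch. It is not UMI-tools' implementation: it swaps in a simplified connected-component grouping (edges between UMIs at Hamming distance 1 or less), and the helper names `hamming` and `candidate_representatives` are hypothetical. The only real data used are the two UR tags from the reads above, each assumed to have a count of 1.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def candidate_representatives(counts: dict[str, int], threshold: int = 1) -> list[set[str]]:
    """Group UMIs into connected components and report ties for the top count.

    Hypothetical helper for illustration only. Two UMIs are linked when
    their Hamming distance is <= threshold. For each component, the
    returned set holds every UMI tied for the highest read count; more
    than one entry means the choice of representative is ambiguous.
    """
    umis = sorted(counts)
    neighbours: dict[str, set[str]] = {u: set() for u in umis}
    for a, b in combinations(umis, 2):
        if hamming(a, b) <= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen: set[str] = set()
    result: list[set[str]] = []
    for u in umis:
        if u in seen:
            continue
        # Depth-first walk to collect the whole component containing u.
        component: set[str] = set()
        stack = [u]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbours[v] - component)
        seen |= component
        top = max(counts[v] for v in component)
        result.append({v for v in component if counts[v] == top})
    return result


# UR tags of the two reads in the report; a count of 1 each is an assumption.
umi_counts = {"CCGGCGGTNA": 1, "CCGGCGGTCA": 1}
distance = hamming("CCGGCGGTNA", "CCGGCGGTCA")
ties = candidate_representatives(umi_counts)
print(distance, ties)
```

In this toy case the two UMIs differ at a single position, so they fall into one group, and both are tied for the top count. Any tie-breaking rule that depends on iteration order or random choice could then pick either one, which is the kind of ambiguity that can differ between runs. With more reads at the same coordinates the network can be larger, and the set of equally valid solutions grows accordingly.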

camelest changed the title ~~Inconsistent result on different runs of umitools dedup ended up to no representative reads from the same mapping coordinate~~ Inconsistent result of no reads from the same coordinate in UMI-tools dedup Nov 28, 2022

TomSmithCGAT closed this as completed Feb 28, 2023
