Developer Note: Tracking Reads through Grouping and Duplex Consensus Calling
This developer note describes the conventions that relate raw reads to molecular identifiers, and single-strand consensus reads to duplex consensus reads.
Please see the `CallDuplexConsensusReads` tool for additional information.
`GroupReadsByUmi` will assign the same molecular ID to raw reads from the same source molecule, with a trailing `/A` or `/B` based on which "strand" they belong to (top or bottom, AB or BA). By convention, the `/A` raw reads will be those where the 5' unclipped position of read one (of the pair) is less than or equal to the 5' unclipped position of read two (of the pair). The 5' unclipped position is relative to sequencing order, not the strand of the reference genome.
For example, given the following read pairs:
x: R1-----------------> <-------------------R2
y: R2-----------------> <-------------------R1
z: R1----------------->
<-----------------R2
`x` would be given `/A`, `y` would be given `/B`, and `z` would be given `/A` (even though R1 and R2 are fully overlapped, R1's 5' end is earlier).
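The `/A` versus `/B` convention can be sketched in a few lines of Python. This is a minimal illustration, not fgbio's implementation: the `Read` class, its fields, and the `strand_suffix` function are hypothetical names, and the coordinates are made up. It assumes both reads of a pair map to the same reference sequence, that the 5' end of a forward-strand read is its unclipped start, and that the 5' end of a reverse-strand read is its unclipped end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for one aligned read of a pair."""
    unclipped_start: int  # leftmost reference position, soft clips included
    unclipped_end: int    # rightmost reference position, soft clips included
    is_reverse: bool      # True if the read aligns to the reverse strand


def five_prime_unclipped(read: Read) -> int:
    """5' position in sequencing order: start if forward, end if reverse."""
    return read.unclipped_end if read.is_reverse else read.unclipped_start


def strand_suffix(r1: Read, r2: Read) -> str:
    """Return "/A" when R1's 5' position is <= R2's, otherwise "/B"."""
    return "/A" if five_prime_unclipped(r1) <= five_prime_unclipped(r2) else "/B"


# The three pairs from the example, as (R1, R2) with made-up coordinates.
x = (Read(100, 150, False), Read(200, 250, True))   # R1 forward, R2 reverse
y = (Read(200, 250, True), Read(100, 150, False))   # R1 reverse, R2 forward
z = (Read(100, 150, False), Read(100, 150, True))   # fully overlapping pair

for name, (r1, r2) in (("x", x), ("y", y), ("z", z)):
    print(name, strand_suffix(r1, r2))
```

With these coordinates the sketch yields `/A` for `x` and `z` and `/B` for `y`, matching the example.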
`CallDuplexConsensusReads` will write single-strand information into SAM attributes for each duplex consensus read (see Consensus Tags). The choice of which single-strand consensus information is stored in the "AB" and "BA" tags is determined as follows:
- If both strands generated a single-strand consensus, then the information for the raw reads with the trailing `/A` in their molecular identifier will be in the "AB" tags, while the information for the raw reads with the trailing `/B` in their molecular identifier will be in the "BA" tags.
- If only one of the two strands creates a consensus (for example, because no raw reads were present for the other strand), then the "AB" tags will contain the information for the single-strand consensus that was present, while the "BA" tags will contain only "per-read" tags.
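The two rules above can be sketched as a small selection function. This is an illustrative sketch only: `assign_ab_ba` is a hypothetical name, `None` stands for a strand that produced no single-strand consensus, and the content of the "per-read" tags written on the "BA" side in the single-strand case is not modelled.

```python
from typing import Optional, Tuple, TypeVar

T = TypeVar("T")


def assign_ab_ba(
    a_consensus: Optional[T], b_consensus: Optional[T]
) -> Tuple[T, Optional[T]]:
    """Choose which single-strand consensus feeds the "AB" and "BA" tags.

    a_consensus: consensus built from the raw reads with the /A suffix, or None.
    b_consensus: consensus built from the raw reads with the /B suffix, or None.
    Returns (source for the "AB" tags, source for the "BA" tags).
    """
    if a_consensus is not None and b_consensus is not None:
        # Both strands present: /A maps to "AB" and /B maps to "BA".
        return a_consensus, b_consensus
    # Only one strand present: whichever exists is reported under "AB".
    present = a_consensus if a_consensus is not None else b_consensus
    if present is None:
        raise ValueError("at least one single-strand consensus is required")
    return present, None
```

For example, `assign_ab_ba(None, "bottom")` returns `("bottom", None)`: the only available consensus is reported under "AB" even though its raw reads carried the `/B` suffix.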
This also means that the sequence of the duplex consensus will have the same "strand" as the "AB" single-strand consensus.
The Consensus Tags table below contains the SAM tags for single-strand and duplex consensus reads, when available.
Value | AB | BA | Final |
---|---|---|---|
per-read-depth | aD | bD | cD |
per-read-min-depth | aM | bM | cM |
per-read-error-rate | aE | bE | cE |
per-base-depth | ad | bd | cd |
per-base-error-count | ae | be | ce |
per-base-bases | ac | bc | bases |
per-base-quals | aq | bq | quals |
The second letter in the tag is lower case if it is per-base, upper case if it is per-read.
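The naming convention can be checked mechanically. The sketch below copies the tag names from the table into a dictionary (the layout and the names `CONSENSUS_TAGS` and `is_per_base` are illustrative assumptions) and tests the case of the second letter. The `None` entries mark the "bases" and "quals" cells, which are not two-letter tags.

```python
# Tag names copied from the table; columns are (AB, BA, Final).
CONSENSUS_TAGS = {
    "per-read-depth": ("aD", "bD", "cD"),
    "per-read-min-depth": ("aM", "bM", "cM"),
    "per-read-error-rate": ("aE", "bE", "cE"),
    "per-base-depth": ("ad", "bd", "cd"),
    "per-base-error-count": ("ae", "be", "ce"),
    "per-base-bases": ("ac", "bc", None),  # Final column is "bases", not a tag
    "per-base-quals": ("aq", "bq", None),  # Final column is "quals", not a tag
}


def is_per_base(tag: str) -> bool:
    """True if the two-letter tag is per-base (lower-case second letter)."""
    if len(tag) != 2:
        raise ValueError(f"expected a two-letter SAM tag, got {tag!r}")
    return tag[1].islower()


# Every tag in the table follows the convention stated above.
for value, tags in CONSENSUS_TAGS.items():
    for tag in tags:
        if tag is not None:
            assert is_per_base(tag) == value.startswith("per-base")
```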
Please see the `CallDuplexConsensusReads` tool and the Consensus Tags source code for more information.