Forward read data look great, paired end and reverse reads look erroneous #708

robertocnava · 2024-10-20T05:34:24Z

Hi Felix,
I'm hoping you can help with an issue I'm having with paired-end sequencing. I'm sequencing an amplicon that is 350 bp long. The first ~300 bases of the forward read match what I expect, and biological and technical replicates match closely. When I run PE alignment, however, the data follow no pattern I can recognize: the PE alignment rate is low and seemingly random. SE alignment of R2 using --pbat gives a fine alignment rate, but the data don't make much sense to me. Would you mind taking a look? This is a repetitive region in hg38. I've also attached the unconverted amplicon.

Thank you very much for your help
BISMID-NTG-200-1_S0_L001_R1_001.fastq (2).gz
BISMID-NTG-200-1_S0_L001_R2_001.fastq (2).gz
BISMID-NTG-200-2_S1_L001_R1_001.fastq (2).gz
BISMID-NTG-200-2_S1_L001_R2_001.fastq (2).gz
BIS-MID.txt

BISMID-NTG-200-3_S2_L001_R1_001.fastq (1).gz
BISMID-NTG-200-3_S2_L001_R2_001.fastq (1).gz

FelixKrueger · 2024-10-20T20:57:27Z

Alright, I took a quick look but didn't go into a lot of detail.

I found that R1 and R2 can be aligned to the amplicon separately, with fairly low mapping efficiencies (~15%) in the default mode. Efficiency increases to ~30% and 50% when the parameters are relaxed to --score_min L,0,-0.4 or L,0,-0.6, respectively. Using --local, it goes up to >80%. So there appear to be mismatches to the reference that cause the low mapping in default mode, and I believe this comes from the first 9-10 bp of each read:

[Image: Read1]

[Image: Read2]
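The bias at the start of the reads can also be checked directly from the FASTQ files. The following is a minimal sketch, not part of Bismark or Trim Galore: the function names, the 10-base prefix length, and the 100,000-read cap are illustrative assumptions.

```python
import gzip
from collections import Counter
from itertools import islice


def read_prefix_counts(lines, prefix_len=10):
    """Count the first `prefix_len` bases of every read in FASTQ-formatted lines.

    Assumes standard 4-line FASTQ records (header, sequence, '+', qualities).
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[line.strip()[:prefix_len]] += 1
    return counts


def top_prefixes_from_fastq(path, prefix_len=10, max_reads=100_000, top=5):
    """Return (prefix, count, fraction) for the most common read starts in a gzipped FASTQ."""
    with gzip.open(path, "rt") as handle:
        counts = read_prefix_counts(islice(handle, 4 * max_reads), prefix_len)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(prefix, n, n / total) for prefix, n in counts.most_common(top)]
```

If a single prefix accounts for nearly all reads in both R1 and R2, that is consistent with a fixed, non-reference sequence at the 5' end of each read.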

Indeed, almost 100% of all reads start with these bases; not exactly sure how the experiment was designed, but these residues don't seem to align to your reference.
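As a rough back-of-the-envelope check of why ~10 non-matching bases matter in default mode: --score_min L,0,-0.2 (the Bismark default for Bowtie 2 end-to-end alignments) sets the minimum score to 0 + (-0.2 × read length). The sketch below assumes 300 bp reads, a penalty of 6 per high-quality mismatch (the Bowtie 2 default maximum), and no gaps or Ns, so the numbers are only an upper bound.

```python
import math

MISMATCH_PENALTY = 6  # assumed: Bowtie 2 default maximum mismatch penalty (--mp 6,2)
READ_LENGTH = 300     # assumed read length for this amplicon


def min_score(read_length, intercept, slope):
    """Linear ('L') --score_min function: intercept + slope * read_length."""
    return intercept + slope * read_length


def max_mismatches(read_length, slope, intercept=0.0, penalty=MISMATCH_PENALTY):
    """Upper bound on tolerated high-quality mismatches (no gaps, no Ns)."""
    budget = round(abs(min_score(read_length, intercept, slope)), 6)
    return math.floor(budget / penalty)


for slope in (-0.2, -0.4, -0.6):
    print(
        f"L,0,{slope}: min score {min_score(READ_LENGTH, 0.0, slope):.0f}, "
        f"up to ~{max_mismatches(READ_LENGTH, slope)} mismatches"
    )
```

Under these assumptions, 10 non-matching bases at the start of a read would use up essentially the whole default budget, leaving no room for any further mismatch in the remaining ~290 bp.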

Indeed, running trim_galore --clip_R1 10 --clip_R2 10 --paired *fastq.gz followed by a relaxed Bismark run produces a good number of PE alignments. PE alignments in --local mode (>80%) also work; they are 10 bp shorter on either end when compared to the SE alignments.
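The two routes could be scripted roughly as follows. This is only a sketch that builds and prints the command lines without running them; the helper names, the genome folder, and all file names are hypothetical placeholders, and the trimmed file names assume Trim Galore's usual `_val_1`/`_val_2` naming.

```python
import shlex


def trim_galore_cmd(r1, r2, clip=10):
    """Command to clip `clip` bases from the 5' end of both mates."""
    return ["trim_galore", "--paired",
            "--clip_R1", str(clip), "--clip_R2", str(clip), r1, r2]


def bismark_cmd(genome_dir, r1, r2, score_min=None, local=False):
    """Paired-end Bismark command, either relaxed end-to-end or --local."""
    if local and score_min:
        # The linear L,... function here is meant for end-to-end mode only.
        raise ValueError("use either a relaxed --score_min or --local, not both")
    cmd = ["bismark", "--genome", genome_dir]
    if local:
        cmd.append("--local")
    if score_min:
        cmd += ["--score_min", score_min]
    return cmd + ["-1", r1, "-2", r2]


# Hypothetical inputs, for illustration only.
genome = "amplicon_genome/"
raw = ("sample_R1.fastq.gz", "sample_R2.fastq.gz")
trimmed = ("sample_R1_val_1.fq.gz", "sample_R2_val_2.fq.gz")

# Route 1: clip the first 10 bp, then align with a relaxed score threshold.
print(shlex.join(trim_galore_cmd(*raw)))
print(shlex.join(bismark_cmd(genome, *trimmed, score_min="L,0,-0.4")))

# Route 2: align the untrimmed reads in local mode and let soft-clipping handle the ends.
print(shlex.join(bismark_cmd(genome, *raw, local=True)))
```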

I hope this helps?

FelixKrueger · 2024-10-20T20:59:54Z

Apologies, forgot to describe the last plot:

Shown are Read 1 and Read 2 as single-end alignments, with the reads on top and a wiggle plot below, showing the abrupt start and end of the fragments. The bottom track shows the local alignments. Of note, their start and end are 10 bp shorter, as a consequence of soft-clipping the 100% biased positions at the 5' end of R1 and R2.
