Forward read data look great, paired end and reverse reads look erroneous #708

Open
robertocnava opened this issue Oct 20, 2024 · 2 comments

Comments

@robertocnava

Hi Felix,
I'm hoping you can help with an issue I'm having with paired-end sequencing. I'm sequencing an amplicon that is 350 bp long. The first ~300 bases of the forward read match what I expect, and biological and technical replicates match closely. When I run PE alignment, the data follows no pattern I can recognize. PE alignment is low and seemingly random. SE alignment of R2 using --pbat gives a fine alignment rate, but the data doesn't make much sense to me. Would you mind taking a look? This is a repetitive region in hg38. I've also attached the unconverted amplicon.

Thank you very much for your help
BISMID-NTG-200-1_S0_L001_R1_001.fastq (2).gz
BISMID-NTG-200-1_S0_L001_R2_001.fastq (2).gz
BISMID-NTG-200-2_S1_L001_R1_001.fastq (2).gz
BISMID-NTG-200-2_S1_L001_R2_001.fastq (2).gz
BIS-MID.txt

BISMID-NTG-200-3_S2_L001_R1_001.fastq (1).gz
BISMID-NTG-200-3_S2_L001_R2_001.fastq (1).gz

@FelixKrueger
Owner

Alright, I took a quick look but didn't go into a lot of detail.

I found that R1 and R2 can be aligned to the amplicon separately, with fairly low mapping efficiencies (~15%) in the default mode. Efficiency increases to ~30% and ~50% when the parameters are relaxed to --score_min L,0,-0.4 or L,0,-0.6, respectively. Using --local, it goes up to >80%. So there appear to be mismatches to the reference that cause the low mapping in default mode, and I believe these come from the first 9-10 bp of each read:

Read1
Screenshot 2024-10-20 at 21 44 24

Read2
Screenshot 2024-10-20 at 21 44 09

Indeed, almost 100% of all reads start with these bases. I'm not exactly sure how the experiment was designed, but these residues don't seem to align to your reference.
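If you want to check this kind of 5' bias yourself without a QC report, a minimal Python sketch along these lines would do it. It tallies the bases at the first few read positions and reports the most common base and its frequency per position. The function names and the toy reads are made up for illustration; for real data you would pass it the lines of one of your FASTQ files (for example via `gzip.open(path, "rt")`).

```python
from collections import Counter


def leading_base_composition(fastq_lines, n_positions=10):
    """Count bases at the first n_positions of every read.

    Assumes plain 4-line FASTQ records (header, sequence, '+', qualities).
    """
    counts = [Counter() for _ in range(n_positions)]
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # only the sequence line of each record
            continue
        seq = line.strip()
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1
    return counts


def dominant_fraction(counts):
    """For each position, return (most common base, its fraction of reads)."""
    result = []
    for counter in counts:
        total = sum(counter.values())
        if total == 0:
            result.append(("N", 0.0))
            continue
        base, n = counter.most_common(1)[0]
        result.append((base, n / total))
    return result


# Toy reads (invented): the first 4 bases are identical in every read,
# the remaining positions vary.
toy_reads = ["ACGTTTGA", "ACGTGAGT", "ACGTAGGT"]
toy_fastq = []
for idx, seq in enumerate(toy_reads):
    toy_fastq += [f"@read{idx}", seq, "+", "I" * len(seq)]

summary = dominant_fraction(leading_base_composition(toy_fastq, n_positions=6))
for pos, (base, frac) in enumerate(summary, start=1):
    print(f"position {pos}: {base} in {frac:.0%} of reads")
```

Positions where a single base sits at (or very near) 100% across all reads, while the rest of the read looks mixed, are the ones worth clipping.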

Running trim_galore --clip_r1 10 --clip_r2 10 --paired *fastq.gz followed by a relaxed Bismark run produces a good number of PE alignments. Also, PE alignments in --local mode (>80%) produce alignments that are 10 bp shorter on either end, compared to SE alignments.
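To put rough numbers on why ~10 non-matching bases hurt so much in default mode: in end-to-end mode the best possible alignment score is 0, penalties are subtracted from it, and `--score_min L,a,b` sets the lowest acceptable score to `a + b * read_length`. Bismark's default is `L,0,-0.2`. The sketch below assumes a 300 bp read (an illustration only, I haven't checked your actual read lengths) and the full Bowtie 2 mismatch penalty of 6 per high-quality base, ignoring gaps and Ns. The function names are hypothetical.

```python
from fractions import Fraction


def min_score(read_len, intercept, slope):
    """Linear --score_min function L,<intercept>,<slope>: intercept + slope * read_len.

    Intercept and slope are passed as strings so the arithmetic stays exact.
    """
    return Fraction(intercept) + Fraction(slope) * read_len


def max_full_mismatches(read_len, intercept, slope, mismatch_penalty=6):
    """How many full-penalty mismatches still meet the threshold.

    Assumes every mismatch costs `mismatch_penalty` (6 is the Bowtie 2
    default maximum) and that there are no gaps or N bases.
    """
    budget = -min_score(read_len, intercept, slope)
    return max(0, budget // mismatch_penalty)


READ_LEN = 300  # assumed read length, for illustration only

for slope in ("-0.2", "-0.4", "-0.6"):
    threshold = min_score(READ_LEN, "0", slope)
    allowed = max_full_mismatches(READ_LEN, "0", slope)
    print(f"L,0,{slope}: min score {threshold}, up to {allowed} mismatches")
```

Under these assumptions the default setting tolerates 10 mismatches on a 300 bp read, so 9-10 non-matching leading bases already use up nearly the whole budget and any further mismatch rejects the read. Clipping those bases, or relaxing the threshold, frees that budget up again.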

I hope this helps?

Screenshot 2024-10-20 at 21 42 39

@FelixKrueger
Owner

Apologies, forgot to describe the last plot:

Shown are Read 1 and Read 2 as single-end alignments, with reads on top and a wiggle plot below showing the abrupt start and end of the fragments. The bottom track shows local alignments. Of note, start and end are 10 bp shorter, as a consequence of soft-clipping the 100% biased positions at the 5' end of R1 and R2.
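The soft-clipping is also visible in the alignment records themselves: in local mode, clipped bases appear as `S` operations at the ends of the CIGAR string. A small Python sketch for pulling those out is below. The function name and the example CIGAR strings are invented for illustration and are not taken from the actual BAM files.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clipped lengths of a CIGAR string.

    Hard clips (H) at the ends are skipped. Left and right refer to the
    reference orientation stored in the SAM/BAM record.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [item for item in ops if item[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


# Hypothetical CIGAR strings for 150 bp reads.
for cigar in ("10S140M", "140M10S", "150M"):
    print(cigar, soft_clips(cigar))
```

Keep in mind that for alignments on the reverse strand the read is stored reverse-complemented, so a clipped 5' end of the read shows up on the right-hand side of the CIGAR.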
