Read length determination #12

Nicolai-vKuegelgen · 2020-09-28T13:13:21Z

While looking through the output & code of riboseqc, I've noticed that the determination of read_lengths is based only on the mapped part of each read, while soft-clipped parts of the read are ignored.

This means that a 31 nt read with only 29 nt mapped and 2 nt soft-clipped will (falsely?) be regarded as a 29 nt read.
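To illustrate the difference, here is a minimal sketch that derives both lengths from a CIGAR string. It is not RiboseQC's code; the function name `read_lengths` is hypothetical, and the only assumption is the SAM convention that the operations `M`, `I`, `S`, `=` and `X` consume read bases.

```python
import re

# CIGAR operations that consume bases of the read, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_lengths(cigar):
    """Return (full_length, aligned_length) for one CIGAR string.

    full_length counts soft-clipped bases; aligned_length ignores them.
    Hypothetical helper for illustration only.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    full = sum(n for n, op in ops if op in QUERY_CONSUMING)
    aligned = sum(n for n, op in ops if op in QUERY_CONSUMING and op != "S")
    return full, aligned


print(read_lengths("2S29M"))  # (31, 29)
```

For the example above, the full length is 31 nt while the aligned length is 29 nt, which is the discrepancy described here.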

Additionally, this probably also affects the P-site assignment of such reads, if the softclipped part is at the start of the read.
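A rough sketch of why this matters: if a P-site offset is counted from the first aligned base, every soft-clipped base at the 5' end of the read shifts the assigned position downstream by one nucleotide relative to the true fragment start. The helper `five_prime_softclip` below is hypothetical and only inspects the CIGAR string; it assumes that for reverse-strand alignments the 5' end of the read corresponds to the last CIGAR operation.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_softclip(cigar, is_reverse=False):
    """Number of soft-clipped bases at the 5' end of the read.

    CIGAR strings are written in reference orientation, so for a
    reverse-strand alignment the read's 5' end is the last operation.
    Hypothetical helper for illustration only.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if is_reverse:
        ops = ops[::-1]
    for n, op in ops:
        if op == "H":  # hard clips are not part of the stored sequence; skip
            continue
        return n if op == "S" else 0
    return 0


# A forward-strand read with 2 nt clipped at its start: an offset applied
# from the first aligned base lands 2 nt too far downstream.
print(five_prime_softclip("2S29M"))  # 2
```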

I'm not sure about the reasoning behind this approach, since the reads produced by Ribo-seq should all come from ribosome-protected fragments, and the full length should be counted independent of whether the read was fully/efficiently mapped.

Xue-yd · 2023-12-19T01:46:07Z

Hi, I've also encountered this problem recently when using RiboseQC. How did you finally solve it?
Thanks a lot if you could give me some advice.

Nicolai-vKuegelgen · 2023-12-19T10:16:26Z

Hey,
I don't think I ever found a solution for that issue within the scope of RiboseQC. If you are looking for other solutions, I could try to check my old scripts from back then.

Xue-yd · 2023-12-19T13:05:47Z

Thanks for your timely reply!
I think I know what the problem is. The data I work with are from the SRA database. I found the processing pipeline of the lab the data came from. The soft-clipped sequence might originate from UMIs that were not removed properly. These UMIs may be present because of a different library protocol. After I processed the data according to their pipeline, the problem was solved.
Best regards!
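For anyone hitting the same symptom, one possible diagnostic is to tabulate soft-clip lengths across alignments: a single dominant clip length would be consistent with an untrimmed fixed-length sequence such as a UMI, although it does not prove it. The sketch below is an assumption-laden illustration, not part of RiboseQC; `softclip_profile` is a hypothetical name, and it takes plain CIGAR strings (for example, column 6 of a SAM file).

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def softclip_profile(cigars):
    """Count (leading, trailing) soft-clip lengths over CIGAR strings.

    Leading/trailing refer to CIGAR order, i.e. reference orientation.
    Hypothetical helper for illustration only.
    """
    profile = Counter()
    for cigar in cigars:
        # Drop hard clips so soft clips are the outermost operations.
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
        lead = ops[0][0] if ops and ops[0][1] == "S" else 0
        trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
        profile[(lead, trail)] += 1
    return profile


profile = softclip_profile(["2S29M", "2S29M", "31M", "29M4S"])
for (lead, trail), count in profile.most_common():
    print(f"leading={lead} trailing={trail} reads={count}")
```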
