Read length determination #12

Open
Nicolai-vKuegelgen opened this issue Sep 28, 2020 · 3 comments

Comments

@Nicolai-vKuegelgen

While looking through the output and code of RiboseQC, I noticed that read_lengths is determined only from the mapped part of each read, while soft-clipped parts of the read are ignored.

This means that a 31 nt read with only 29 nt mapped and 2 nt soft-clipped will (falsely?) be regarded as a 29 nt read.
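To make the difference concrete, here is a minimal sketch in plain Python (not RiboseQC code) that derives both lengths from a SAM CIGAR string. The function name `read_lengths` and the example CIGAR `2S29M` are made up for illustration; "mapped length" is taken here to mean the query bases consumed by M, I, = and X operations.

```python
import re

# One CIGAR operation: a count followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def read_lengths(cigar):
    """Return (mapped_length, full_length) for a SAM CIGAR string.

    mapped_length counts query bases in M, I, = and X operations.
    full_length additionally counts soft-clipped (S) bases.
    Hard-clipped (H) bases are not stored in SEQ and are ignored here.
    """
    mapped = 0
    full = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in "MI=X":
            mapped += n
            full += n
        elif op == "S":
            full += n
    return mapped, full

# Hypothetical read: 2 nt soft-clipped at the start, 29 nt aligned.
print(read_lengths("2S29M"))  # (29, 31)
```

With a length filter of, say, 29 to 31 nt, this read falls into a different length bin depending on which of the two values is used.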

Additionally, this probably also affects the P-site assignment of such reads if the soft-clipped part is at the start of the read.
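A rough sketch of why the P-site would shift, again in plain Python and not taken from RiboseQC: the SAM POS field points at the first aligned base, so a leading soft clip of k nt means the fragment's true 5' end lies k nt upstream. The helper names, the offset of 12 nt, and the restriction to unspliced plus-strand reads are all assumptions for illustration.

```python
import re

def leading_soft_clip(cigar):
    """Number of soft-clipped bases at the start of the CIGAR (0 if none)."""
    # An optional hard clip may precede the soft clip.
    match = re.match(r"(?:\d+H)?(\d+)S", cigar)
    return int(match.group(1)) if match else 0

def p_site(pos, cigar, offset, include_soft_clip):
    """P-site position for an unspliced plus-strand read.

    pos    : 1-based position of the first ALIGNED base (SAM POS field)
    offset : distance from the read's 5' end to the P-site (assumed value)
    """
    five_prime = pos
    if include_soft_clip:
        # Extend the 5' end upstream by the clipped bases.
        five_prime -= leading_soft_clip(cigar)
    return five_prime + offset

# Hypothetical read aligned at position 100 with 2 nt clipped at the start.
print(p_site(100, "2S29M", offset=12, include_soft_clip=False))  # 112
print(p_site(100, "2S29M", offset=12, include_soft_clip=True))   # 110
```

A 2 nt difference moves the assigned P-site into another reading frame, so this would matter for periodicity plots. For minus-strand reads the 5' end sits at the right-hand side of the alignment, so the trailing soft clip would be the relevant one.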

I'm not sure about the reasoning behind this approach, since the reads produced by Ribo-seq should all come from ribosome-protected fragments, and the full length should be counted independent of whether the read was fully or efficiently mapped.

@Xue-yd

Xue-yd commented Dec 19, 2023

Hi, I've also encountered this problem recently when using RiboseQC. How did you solve it in the end?
I'd be grateful for any advice.

@Nicolai-vKuegelgen
Author

Hey,
I don't think I ever found a solution for this issue within the scope of RiboseQC. If you are looking for other solutions, I could try to check my old scripts from back then.

@Xue-yd

Xue-yd commented Dec 19, 2023

Thanks for your timely reply!
I think I know what the problem is. The data I work with is from the SRA database. I found the processing pipeline of the lab that produced the data. The soft-clipped sequence might originate from UMIs that were not removed properly. Whether such UMIs are present may depend on the library protocol. After I processed the data according to their pipeline, the problem was solved.
Best regards!
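For anyone landing here with the same symptom, the idea can be sketched as follows. This is not the pipeline of the lab mentioned above (it is not described in this thread); the UMI length, its position at the 3' end, and the function name `strip_umi` are assumptions chosen only to illustrate why an untrimmed UMI shows up as a soft clip.

```python
def strip_umi(seq, qual, umi_len, end="3p"):
    """Split a fixed-length UMI off one end of a read.

    Returns (trimmed_seq, trimmed_qual, umi).
    umi_len and end are protocol-specific assumptions, not known values.
    """
    if umi_len <= 0 or umi_len >= len(seq):
        raise ValueError("umi_len must be between 1 and len(seq) - 1")
    if end == "3p":
        return seq[:-umi_len], qual[:-umi_len], seq[-umi_len:]
    if end == "5p":
        return seq[umi_len:], qual[umi_len:], seq[:umi_len]
    raise ValueError("end must be '3p' or '5p'")

# Hypothetical 31 nt read: a 29 nt footprint followed by a 2 nt UMI.
seq = "ACGTACGTACGTACGTACGTACGTACGTA" + "GT"
qual = "I" * len(seq)
trimmed, trimmed_qual, umi = strip_umi(seq, qual, umi_len=2)
print(len(trimmed), umi)  # 29 GT
```

If the UMI is left on the read, the aligner cannot place those extra bases and soft-clips them, which would produce the 31 nt versus 29 nt discrepancy discussed above. In practice a dedicated UMI tool matched to the library protocol is the safer choice.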
