Alternative trimming step for polyA/T removal #663
Comments
To just quickly chip in here. It is correct that the option Standard removal of PolyA sequences could be accomplished using e.g. Trim Galore does indeed not trim several things at the same time, and intentionally so.

One could probably argue that, for the sequences you showed above, PolyA/PolyT sequence would remove all full homopolymers entirely, and would truncate the PolyT sequences to a few nucleotides (certainly <15), which are either removed by a size-selection step, or will not map (uniquely). So in other words: these sequences are useless from a library point of view. In your case, mapping will have the exact same effect though: PolyX sequences will probably either not map properly in the first place, or be soft-clipped (and then not map properly). The result should be pretty much the same: these sequences will not result in useful RNA-seq counts.

By all means, add another trimming option for the pipeline, but I don't think that it'll have a noteworthy advantage over ignoring PolyX sequences (and letting the aligner take care of it), or even running an adapter trimming process first, and then using that output as the input for PolyA/T trimming (which should be just a matter of copy paste with DSL2).

It is still good to know if you have these contaminants in your library, as they may dramatically impact the mapping efficiency. You could add these sequences to a custom
to see the extent of PolyX sequences in your library. Happy to discuss :)
|
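To make the argument above concrete, here is a minimal Python sketch of what 3' poly-A trimming followed by a length filter does to different kinds of reads. It is a simplification and not Trim Galore's or Cutadapt's actual algorithm: there is no error tolerance and no partial matching at the read end. The function names, the 10-base run threshold and the 20 nt length cutoff are assumptions chosen for illustration.

```python
import re
from typing import Optional

# Assumed values for illustration only: a run of at least 10 A's is treated
# as a poly-A tail, and reads shorter than 20 nt after trimming are dropped.
MIN_RUN = 10
MIN_LENGTH = 20


def trim_poly_a(seq: str, min_run: int = MIN_RUN) -> str:
    """Cut the read at the first run of at least `min_run` A's."""
    match = re.search(r"A{%d,}" % min_run, seq)
    return seq[: match.start()] if match else seq


def trim_and_filter(
    seq: str, min_run: int = MIN_RUN, min_length: int = MIN_LENGTH
) -> Optional[str]:
    """Trim the poly-A tail, then return None if the read became too short."""
    trimmed = trim_poly_a(seq, min_run)
    return trimmed if len(trimmed) >= min_length else None


# Hypothetical reads: a real insert with a tail, a pure homopolymer,
# and an insert that is too short to be useful once the tail is gone.
reads = {
    "insert_with_tail": "ACGT" * 6 + "A" * 15,
    "pure_poly_a": "A" * 40,
    "short_insert": "ACGTACGT" + "A" * 30,
}

results = {name: trim_and_filter(seq) for name, seq in reads.items()}
```

Only the first read survives in this sketch. The other two are discarded by the length filter, which mirrors the point that such reads would not have produced useful counts either way.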
Thank you for the detailed info @FelixKrueger! I agree that most of these sequences won't align, but I would personally prefer for these to be removed at the "adapter" trimming step because it gives you an obvious indication that you may have such artifacts in the library. Trying to figure out why things haven't mapped can be quite painful. I didn't know you could use

    params {
        modules {
            'trimgalore' {
                // A{10} is Trim Galore shorthand for a run of ten A's
                args = '--fastqc -a A{10}'
            }
        }
    }

So would you need an additional |
Hi Harshil, yes, |
Ah, I see! So it won't trim the conventional adapters if you use |
It wouldn't necessarily be the entire pipeline twice, but just add an extra trimming step. So something like the nf-core equivalent of:

    // First pass: conventional adapter/quality trimming
    TRIM_GALORE (file_ch, params.outdir, params.trim_galore_args, params.verbose)
    if (PolyX_trimming_will_make_my_day){
        // Second pass: poly-A trimming on the already trimmed reads
        params.trim_galore_args += " -a A{10} "
        TRIM_GALORE (TRIM_GALORE.out.reads, params.outdir, params.trim_galore_args, params.verbose)
    }

I would also split out FastQC as it can then be run in parallel rather than as part of Trim Galore:

    FASTQC2 (TRIM_GALORE.out.reads, params.outdir, params.fastqc_args, params.verbose) |
but yea, in the end the results will be pretty much the same however you implement it :P |
Sorry, I meant running only the trimming (and other mandatory) steps first time around and then running the pipeline full blown with either |
For Regarding your earlier quote:
This is certainly true, but if you are running pipelines that will magically do lots of things at the same time you might just look at the mapping efficiency, see that it's great and move on - and potentially miss that you had 50% of sequences lost because of a technical issue that results in homo-polymers (which you should probably try and address on the wet lab side...) |
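One way to avoid missing such a problem is to measure it directly before mapping. The following Python sketch reports the fraction of reads that contain a long homopolymer run, broken down by base. It is an assumption-laden illustration and not part of the pipeline: the function names and the 10-base threshold are hypothetical, and reads are taken as plain sequence strings rather than parsed from FASTQ.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional


def polyx_base(seq: str, min_run: int = 10) -> Optional[str]:
    """Return the base of the first homopolymer run of >= min_run, else None."""
    # One base followed by at least (min_run - 1) copies of itself.
    match = re.search(r"([ACGT])\1{%d,}" % (min_run - 1), seq)
    return match.group(1) if match else None


def polyx_summary(reads: Iterable[str], min_run: int = 10) -> Dict[str, float]:
    """Fraction of reads containing a long homopolymer run, per base."""
    counts: Counter = Counter()
    total = 0
    for seq in reads:
        total += 1
        base = polyx_base(seq, min_run)
        if base is not None:
            counts[base] += 1
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


# Hypothetical library of four reads, two of which carry poly-A runs.
example_reads = [
    "A" * 30,
    "ACGT" * 8,
    "GGG" + "T" * 12 + "ACG",
    "ACGTACGT" + "A" * 10,
]

summary = polyx_summary(example_reads)
```

In this toy library, half of the reads contain a poly-A run and a quarter contain a poly-T run, which is the kind of figure that would be worth raising with the wet lab.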
Good point! We also do have the cutadapt logs in the MultiQC report where this sort of info is easily accessible. But I guess it will come down to what is actually looked at further, e.g. users may not even look at the mapping rates 🤷🏽 Another advantage of trimming beforehand is that there will most likely be a diversity of sequences with a variable count of these polyA bases. Assuming that the aligner will handle these appropriately could be a little dangerous, and even if the effect is minimal, this could actually impact the quantification. |
Just because this came up in a thread on slack: STAR can clip 3' polyA sequences, too, e.g. by adding the following STAR argument: |
Added fastp support in #970 which will allow you to use |
Hello, it looks like TrimGalore does not automatically perform polyA/T removal (see post-trimming fastqc screenshot below). They have an experimental option to do so (https://github.com/FelixKrueger/TrimGalore/blob/e9b8fd847f4da01fa3b886d134bc2ecd447a8068/trim_galore#L3230-L3257) but it would require running the nf-core/rnaseq pipeline twice (https://github.com/FelixKrueger/TrimGalore/blob/e9b8fd847f4da01fa3b886d134bc2ecd447a8068/trim_galore#L3248-L3254) (@drpatelh).
Would be great to have e.g. bbduk (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) as it allows simultaneous base quality, adapter and polyA/T trimming.
Thanks!