[Question] linked adapters from Fasta file and discarding untrimmed pairs #806

Open
marchoeppner opened this issue Sep 6, 2024 · 1 comment

marchoeppner commented Sep 6, 2024

Hi,

sorry, this is more of a question (or a feature request, we'll see).

We are processing amplicon data from which we want to trim PCR primers. The amplicon is shorter than the individual paired-end reads, so R1 and R2 each contain both the fwd AND the rev primer sequence, which requires a specific trimming approach.

The primers may or may not be degenerate, so we generate a disambiguated FASTA file prior to trimming, as well as a reverse-complemented one (the --revcomp option didn't seem to do what we expected in our case).
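To make "disambiguated" concrete, here is a minimal Python sketch of the kind of preprocessing described above: expanding IUPAC ambiguity codes into every concrete sequence and writing a reverse-complemented variant. The function names (`expand`, `revcomp`, `to_fasta`) and the record naming scheme are assumptions for illustration only, not the code used in the actual pipeline.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the concrete bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that also covers the ambiguity codes
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def expand(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def revcomp(seq):
    """Reverse-complement a sequence (ambiguity codes are complemented too)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_fasta(name, primer, rc=False):
    """Build FASTA text with one record per expanded primer variant.

    Hypothetical naming scheme: <name>_<running number>.
    """
    records = []
    for i, seq in enumerate(expand(primer), start=1):
        records.append(f">{name}_{i}\n{revcomp(seq) if rc else seq}\n")
    return "".join(records)


if __name__ == "__main__":
    # "GTR" is a made-up toy primer, not one from a real assay
    print(to_fasta("fwd", "GTR"), end="")
    print(to_fasta("fwd_rc", "GTR", rc=True), end="")
```

Note that the number of expanded sequences grows multiplicatively with each degenerate position, so a primer with several `N` positions produces a very large file.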

Some dummy code of what this looks like at the moment:

options_5p = "-g file:${primers} -G file:${primers}"
options_3p = "-a file\$:${primers_rc} -A file\$:${primers_rc}"

cutadapt --cores $task.cpus \\
    --discard-untrimmed \\
    --revcomp \\
    $args \\
    $reads \\
    $trimmed \\
    $options_5p \\
    $options_3p \\
    --times=2 \\
    -Z \\
    --json=$report \\
    > ${prefix}.cutadapt.log

This currently has an issue: it does not guarantee that reads are discarded unless they are trimmed on both ends in both R1 and R2. The alternative would be to pipe the whole process, first in the forward direction and then in reverse, with each step discarding any untrimmed reads. But if we do that, we do not get the nice JSON report.

I am guessing what we would want is something that behaves like a linked adapter. But that does not seem to apply in our case since we have any number of disambiguated primer sequences. Am I missing anything here?
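One way to picture the linked-adapter idea with many primer sequences is to generate one linked entry per forward/reverse combination, using cutadapt's `ADAPTER1...ADAPTER2` notation. The Python sketch below only builds such FASTA text. Whether cutadapt accepts linked entries from a `file:` FASTA, and whether both parts are then treated as required, are assumptions that would need to be checked against the cutadapt documentation. The helper name `linked_records` and the toy primers are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def linked_records(five_prime, three_prime):
    """Build FASTA text with one '5P...revcomp(3P)' record per combination.

    five_prime / three_prime: dicts mapping a primer name to its sequence.
    For R1, pass (forward primers, reverse primers); for R2, swap them.
    """
    lines = []
    for (name5, seq5), (name3, seq3) in product(five_prime.items(),
                                                three_prime.items()):
        lines.append(f">{name5}-{name3}")
        lines.append(f"{seq5}...{revcomp(seq3)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy primers for illustration only
    fwd = {"f1": "AAC", "f2": "AAT"}
    rev = {"r1": "GGT"}
    print(linked_records(fwd, rev), end="")  # entries as seen in R1
    print(linked_records(rev, fwd), end="")  # entries as seen in R2
```

The number of entries is the product of the two primer counts, so this only stays manageable when the disambiguated primer sets are small.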

marcelm (Owner) commented Sep 6, 2024

Can you clarify what you mean by "disambiguated FASTA file"? In which way are the primers degenerate? Note that adapter/primer sequences can contain IUPAC wildcards, maybe that helps?
