Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trimgalore and cutadapt in two steps? #657

Closed
marwa38 opened this issue Nov 21, 2022 · 6 comments
Closed

trimgalore and cutadapt in two steps? #657

marwa38 opened this issue Nov 21, 2022 · 6 comments

Comments

@marwa38
Copy link

marwa38 commented Nov 21, 2022

Hi team
I know that trimgalore is a wrapper around cutadapt but I want to check what I ran is considered fine
I ran trimgalore to remove adaptors and primers at 5' and 3' ends but then

trim_galore -q 30 --phred33 --paired --gzip --fastqc --fastqc_args "--nogroup" -o ./trimGalored --length 20 --stringency 1 -e 0 --clip_R1 17 --clip_R2 21 *_S"$i"_L001_R1_001.fastq.gz *_S"$i"_L001_R2_001.fastq.gz

I found that I need to check for primers within the sequences themselves and I found and I removed them
Things are ok that I did my analysis in that way? I don't want to reran the analysis again but I want to make sure that this is fine.

# Place filtered files in filtN/ subdirectory
fnFs.filtN <- file.path(path, "filtN", basename(fnFs)) # Put N-filterd files in filtN/ subdirectory
fnRs.filtN <- file.path(path, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE)

#Check if primers are still inside (N, W and H nucleotides are identified by cutadapt)
FWD <- "CCTACGGGNGGCWGCAG"  
REV <- "GACTACHVGGGTATCTAATCC"

allOrients <- function(primer) {
  # Create all orientations of the input sequence
  require(Biostrings)
  dna <- DNAString(primer)  # The Biostrings works w/ DNAString objects rather than character vectors
  orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna), 
               RevComp = reverseComplement(dna))
  return(sapply(orients, toString))  # Convert back to character vector
}
FWD.orients <- allOrients(FWD)
REV.orients <- allOrients(REV)
FWD.orients

# now ready to count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations. 
#Identifying and counting the primers on one set of paired end FASTQ files is sufficient, assuming all the files were created using the same library preparation, so we’ll just process the first sample.

primerHits <- function(primer, fn) {
  # Counts number of reads in which the primer is found
  nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
  return(sum(nhits > 0))
}
library (ShortRead)
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), 
      FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]), 
      REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]), 
      REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
#                   Forward Complement Reverse RevComp
# FWD.ForwardReads       0          0       0       0
# FWD.ReverseReads       0          0       0     112
# REV.ForwardReads       0          0       0       8
# REV.ReverseReads      99          0       0       0

@marcelm
Copy link
Owner

marcelm commented Nov 21, 2022

Unfortunately, I don’t know what the trim_galore command that you listed does under the hood. Can you perhaps talk to the TrimGalore developers about this?

@marwa38
Copy link
Author

marwa38 commented Dec 5, 2022

@marcelm
I asked as was kindly adviced the TrimGalore team FelixKrueger/TrimGalore#147
Please share you comments regarding cutadapt part if any? that would be helpful.

Thanks in advance
Marwa :)

@marcelm
Copy link
Owner

marcelm commented Dec 9, 2022

If you describe what you want to do, I can try to help. You would need to state which type of data you have, which adapters you want to remove and which other processing you think should be done.

@marwa38
Copy link
Author

marwa38 commented Dec 9, 2022

@marcelm Thanks Marcel for your inputs in advance and for the discussion
For my 16S rRNA microbiota dataset
I want to achieve the following
1- remove the adaptors (Nexetra) which were successfully done using trimGalore above code I shared.
2- remove primers (the name of primers is V3-V4) at 5' and 3' ends (I used trimGalore params as shared in the above codes --clip_R1 17 --clip_R2 21 ) as you can see the number of base pairs in Forward primer is 17 and Reverse primer is 21

Forward: CCTACGGGNGGCWGCAG 
Reverse: GACTACHVGGGTATCTAATCC

3- remove the same primers (V3-V4; their specific sequences are attached above) if they are present within the sequenced data (I used the above cutadapt codes as the following step after trimGalore steps)
Please let me know if need more info. Hope that made things clearer?

@marcelm
Copy link
Owner

marcelm commented Dec 9, 2022

Have a look at this section in the documentation:
https://cutadapt.readthedocs.io/en/stable/recipes.html#trimming-amplicon-primers-from-paired-end-reads

If I googled correctly, the V3-V4 primers target a region that is about 460 bp long, so longer than your read length. That means that you can use the somewhat simpler first command suggested in that section. In your case:

cutadapt -j 8 -g ^CCTACGGGNGGCWGCAG -G ^GACTACHVGGGTATCTAATCC --discard-untrimmed -o out.1.fastq.gz -p out.2.fastq.gz in.1.fastq.gz in.2.fastq.gz

I am not 100% sure, but I think you don’t need to trim Nextera adapters at all. The adapters should only be part of the read when the sequenced fragment is shorter than the read length, but your fragments/amplicons should all be longer. You may have had some adapters in your data, but I would guess that these are from fragments that are not amplicons anyway. With the above command, any read pair that does not have one of the primer sequences is discarded, so the output should not contain any more adapters.

@marcelm
Copy link
Owner

marcelm commented Jan 9, 2023

Is it ok to close this issue?

@marcelm marcelm closed this as completed Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants