trimgalore and cutadapt in two steps? #657

marwa38 · 2022-11-21T10:14:40Z

Hi team
I know that trimgalore is a wrapper around cutadapt but I want to check what I ran is considered fine
I ran trimgalore to remove adaptors and primers at 5' and 3' ends but then

trim_galore -q 30 --phred33 --paired --gzip --fastqc --fastqc_args "--nogroup" -o ./trimGalored --length 20 --stringency 1 -e 0 --clip_R1 17 --clip_R2 21 *_S"$i"_L001_R1_001.fastq.gz *_S"$i"_L001_R2_001.fastq.gz

I found that I need to check for primers within the sequences themselves and I found and I removed them
Things are ok that I did my analysis in that way? I don't want to reran the analysis again but I want to make sure that this is fine.

# Place filtered files in filtN/ subdirectory
fnFs.filtN <- file.path(path, "filtN", basename(fnFs)) # Put N-filterd files in filtN/ subdirectory
fnRs.filtN <- file.path(path, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE)

#Check if primers are still inside (N, W and H nucleotides are identified by cutadapt)
FWD <- "CCTACGGGNGGCWGCAG"  
REV <- "GACTACHVGGGTATCTAATCC"

allOrients <- function(primer) {
  # Create all orientations of the input sequence
  require(Biostrings)
  dna <- DNAString(primer)  # The Biostrings works w/ DNAString objects rather than character vectors
  orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna), 
               RevComp = reverseComplement(dna))
  return(sapply(orients, toString))  # Convert back to character vector
}
FWD.orients <- allOrients(FWD)
REV.orients <- allOrients(REV)
FWD.orients

# now ready to count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations. 
#Identifying and counting the primers on one set of paired end FASTQ files is sufficient, assuming all the files were created using the same library preparation, so weâ€™ll just process the first sample.

primerHits <- function(primer, fn) {
  # Counts number of reads in which the primer is found
  nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
  return(sum(nhits > 0))
}
library (ShortRead)
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), 
      FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]), 
      REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]), 
      REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
#                   Forward Complement Reverse RevComp
# FWD.ForwardReads       0          0       0       0
# FWD.ReverseReads       0          0       0     112
# REV.ForwardReads       0          0       0       8
# REV.ReverseReads      99          0       0       0

The text was updated successfully, but these errors were encountered:

marcelm · 2022-11-21T10:46:10Z

Unfortunately, I don’t know what the trim_galore command that you listed does under the hood. Can you perhaps talk to the TrimGalore developers about this?

marwa38 · 2022-12-05T10:41:06Z

@marcelm
I asked as was kindly adviced the TrimGalore team FelixKrueger/TrimGalore#147
Please share you comments regarding cutadapt part if any? that would be helpful.

Thanks in advance
Marwa :)

marcelm · 2022-12-09T08:59:50Z

If you describe what you want to do, I can try to help. You would need to state which type of data you have, which adapters you want to remove and which other processing you think should be done.

marwa38 · 2022-12-09T09:50:57Z

@marcelm Thanks Marcel for your inputs in advance and for the discussion
For my 16S rRNA microbiota dataset
I want to achieve the following
1- remove the adaptors (Nexetra) which were successfully done using trimGalore above code I shared.
2- remove primers (the name of primers is V3-V4) at 5' and 3' ends (I used trimGalore params as shared in the above codes --clip_R1 17 --clip_R2 21 ) as you can see the number of base pairs in Forward primer is 17 and Reverse primer is 21

Forward: CCTACGGGNGGCWGCAG 
Reverse: GACTACHVGGGTATCTAATCC

3- remove the same primers (V3-V4; their specific sequences are attached above) if they are present within the sequenced data (I used the above cutadapt codes as the following step after trimGalore steps)
Please let me know if need more info. Hope that made things clearer?

marcelm · 2022-12-09T11:18:39Z

Have a look at this section in the documentation:
https://cutadapt.readthedocs.io/en/stable/recipes.html#trimming-amplicon-primers-from-paired-end-reads

If I googled correctly, the V3-V4 primers target a region that is about 460 bp long, so longer than your read length. That means that you can use the somewhat simpler first command suggested in that section. In your case:

cutadapt -j 8 -g ^CCTACGGGNGGCWGCAG -G ^GACTACHVGGGTATCTAATCC --discard-untrimmed -o out.1.fastq.gz -p out.2.fastq.gz in.1.fastq.gz in.2.fastq.gz

I am not 100% sure, but I think you don’t need to trim Nextera adapters at all. The adapters should only be part of the read when the sequenced fragment is shorter than the read length, but your fragments/amplicons should all be longer. You may have had some adapters in your data, but I would guess that these are from fragments that are not amplicons anyway. With the above command, any read pair that does not have one of the primer sequences is discarded, so the output should not contain any more adapters.

marcelm · 2023-01-09T13:39:13Z

Is it ok to close this issue?

marcelm closed this as completed Aug 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

trimgalore and cutadapt in two steps? #657

trimgalore and cutadapt in two steps? #657

marwa38 commented Nov 21, 2022

marcelm commented Nov 21, 2022

marwa38 commented Dec 5, 2022 •

edited

Loading

marcelm commented Dec 9, 2022

marwa38 commented Dec 9, 2022 •

edited

Loading

marcelm commented Dec 9, 2022

marcelm commented Jan 9, 2023

trimgalore and cutadapt in two steps? #657

trimgalore and cutadapt in two steps? #657

Comments

marwa38 commented Nov 21, 2022

marcelm commented Nov 21, 2022

marwa38 commented Dec 5, 2022 • edited Loading

marcelm commented Dec 9, 2022

marwa38 commented Dec 9, 2022 • edited Loading

marcelm commented Dec 9, 2022

marcelm commented Jan 9, 2023

marwa38 commented Dec 5, 2022 •

edited

Loading

marwa38 commented Dec 9, 2022 •

edited

Loading