Skip to content

Commit

Permalink
fix: raise an error if too few reads are left after trimming with cut…
Browse files Browse the repository at this point in the history
…adapt (#184)

* fix: throw error if trimmed fastq files are (nearly) empty

* feat: save cutadapt log output to check number of reads that passed

also refactor trim_se to avoid code duplication for --small-rna

* chore: update CHANGELOG.md

* fix: typo in awk & print npassed
  • Loading branch information
kelly-sovacool authored Dec 2, 2024
1 parent 6af726a commit b1bcdd6
Show file tree
Hide file tree
Showing 6 changed files with 63 additions and 67 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
## RENEE development version

- Fix spelling of shared SIF directory on biowulf -- it is `/data/CCBR_Pipeliner/SIFs` with a lowercase "s" at the end. (#182, @kelly-sovacool)
- Raise an error if too few reads are left after trimming with cutadapt. (#184, @kelly-sovacool)

## RENEE 2.6.3

Expand Down
3 changes: 2 additions & 1 deletion config/templates/tools.json
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,8 @@
"Trimsettings": "---------------------------- TRIMMOMATIC parameters ----------------------------------------------------------",
"LEADINGQUALITY": 10,
"TRAILINGQUALITY": 10,
"MINLEN": 35
"MINLEN": 35,
"CUTADAPT_MIN_READS": 100
},
"ADAPTER1": "-",
"ADAPTER2": "",
Expand Down
Binary file added tests/data/fastq/WT_S1.R1.trunc.fastq.gz
Binary file not shown.
Binary file added tests/data/fastq/WT_S1.R2.trunc.fastq.gz
Binary file not shown.
14 changes: 13 additions & 1 deletion workflow/rules/paired-end.smk
Original file line number Diff line number Diff line change
Expand Up @@ -80,13 +80,17 @@ rule trim_pe:
#out2=temp(join(workpath,trim_dir,"{name}.R2.trim.fastq.gz"))
out1=join(workpath,trim_dir,"{name}.R1.trim.fastq.gz"),
out2=join(workpath,trim_dir,"{name}.R2.trim.fastq.gz")
log:
stdout=join(workpath,'logfiles','trim','{name}.cutadapt.out'),
stderr=join(workpath,'logfiles','trim','{name}.cutadapt.err')
params:
rname='pl:trim_pe',
# Exposed Parameters: modify config/templates/tools.json to change defaults
fastawithadaptersetd=config['bin'][pfamily]['tool_parameters']['FASTAWITHADAPTERSETD'],
leadingquality=config['bin'][pfamily]['tool_parameters']['LEADINGQUALITY'],
trailingquality=config['bin'][pfamily]['tool_parameters']['TRAILINGQUALITY'],
minlen=config['bin'][pfamily]['tool_parameters']['MINLEN'],
min_reads=config['bin'][pfamily]['tool_parameters']['CUTADAPT_MIN_READS'],
threads: int(allocated("threads", "trim_pe", cluster)),
envmodules: config['bin'][pfamily]['tool_versions']['CUTADAPTVER']
container: config['images']['cutadapt']
Expand All @@ -95,7 +99,15 @@ rule trim_pe:
-n 5 -O 5 -q {params.leadingquality},{params.trailingquality} \
-m {params.minlen}:{params.minlen} \
-b file:{params.fastawithadaptersetd} -B file:{params.fastawithadaptersetd} \
-j {threads} -o {output.out1} -p {output.out2} {input.file1} {input.file2}
-j {threads} -o {output.out1} -p {output.out2} \
{input.file1} {input.file2} \
> {log.stdout} 2> {log.stderr}
npassed=$(grep "passing filters" {log.stdout} | awk '{{print $5}}' | sed -s 's/,//g')
echo "number of reads passing filters: $npassed"
if [ $npassed -lt {params.min_reads} ]; then
echo "ERROR: too few reads are left after trimming."
exit 1
fi
"""


Expand Down
112 changes: 47 additions & 65 deletions workflow/rules/single-end.smk
Original file line number Diff line number Diff line change
Expand Up @@ -56,71 +56,53 @@ rule rawfastqc:
fastqc {input.R1} -t {threads} -o {params.outdir};
"""

if config['options']['small_rna']:
# Run STAR with ENCODE's recommendations for small RNA sequencing.
# Set the min read length to
rule trim_se:
"""
Data-processing step to remove adapter sequences and perform quality trimming
prior to alignment the reference genome. Adapters are composed of synthetic
sequences and should be removed prior to alignment.
@Input:
Raw FastQ file (scatter)
@Output:
Trimmed FastQ file
"""
input:
infq=join(workpath,"{name}.R1.fastq.gz"),
output:
outfq=temp(join(workpath,trim_dir,"{name}.R1.trim.fastq.gz"))
params:
rname='pl:trim_se',
# Exposed Parameters: modify config/templates/tools.json to change defaults
fastawithadaptersetd=config['bin'][pfamily]['tool_parameters']['FASTAWITHADAPTERSETD'],
leadingquality=config['bin'][pfamily]['tool_parameters']['LEADINGQUALITY'],
trailingquality=config['bin'][pfamily]['tool_parameters']['TRAILINGQUALITY'],
minlen=config['bin'][pfamily]['tool_parameters']['MINLEN'],
threads: int(allocated("threads", "trim_se", cluster)),
envmodules: config['bin'][pfamily]['tool_versions']['CUTADAPTVER']
container: config['images']['cutadapt']
shell: """
cutadapt --nextseq-trim=2 --trim-n \
-n 5 -O 5 -q {params.leadingquality},{params.trailingquality} \
-m 16 -b file:{params.fastawithadaptersetd} -j {threads} \
-o {output.outfq} {input.infq}
"""
else:
# Use default trimming rule for long RNAs
rule trim_se:
"""
Data-processing step to remove adapter sequences and perform quality trimming
prior to alignment the reference genome. Adapters are composed of synthetic
sequences and should be removed prior to alignment.
@Input:
Raw FastQ file (scatter)
@Output:
Trimmed FastQ file
"""
input:
infq=join(workpath,"{name}.R1.fastq.gz"),
output:
outfq=temp(join(workpath,trim_dir,"{name}.R1.trim.fastq.gz"))
params:
rname='pl:trim_se',
# Exposed Parameters: modify config/templates/tools.json to change defaults
fastawithadaptersetd=config['bin'][pfamily]['tool_parameters']['FASTAWITHADAPTERSETD'],
leadingquality=config['bin'][pfamily]['tool_parameters']['LEADINGQUALITY'],
trailingquality=config['bin'][pfamily]['tool_parameters']['TRAILINGQUALITY'],
minlen=config['bin'][pfamily]['tool_parameters']['MINLEN'],
threads: int(allocated("threads", "trim_se", cluster)),
envmodules: config['bin'][pfamily]['tool_versions']['CUTADAPTVER']
container: config['images']['cutadapt']
shell: """
cutadapt --nextseq-trim=2 --trim-n \
-n 5 -O 5 -q {params.leadingquality},{params.trailingquality} \
-m {params.minlen} -b file:{params.fastawithadaptersetd} -j {threads} \
-o {output.outfq} {input.infq}
"""
rule trim_se:
"""
Data-processing step to remove adapter sequences and perform quality trimming
prior to alignment the reference genome. Adapters are composed of synthetic
sequences and should be removed prior to alignment.
The minimum length is set in `config['bin'][pfamily]['tool_parameters']['MINLEN']`.
However, this is overridden if `config['options']['small_rna']` is `true`
(which occurs when the `--small-rna` flag is used with `renee run`),
then the minimum length is set to 16.
@Input:
Raw FastQ file (scatter)
@Output:
Trimmed FastQ file
"""
input:
infq=join(workpath,"{name}.R1.fastq.gz"),
output:
outfq=temp(join(workpath,trim_dir,"{name}.R1.trim.fastq.gz"))
log:
stdout=join(workpath,'logfiles','trim','{name}.cutadapt.out'),
stderr=join(workpath,'logfiles','trim','{name}.cutadapt.err')
params:
rname='pl:trim_se',
# Exposed Parameters: modify config/templates/tools.json to change defaults
fastawithadaptersetd=config['bin'][pfamily]['tool_parameters']['FASTAWITHADAPTERSETD'],
leadingquality=config['bin'][pfamily]['tool_parameters']['LEADINGQUALITY'],
trailingquality=config['bin'][pfamily]['tool_parameters']['TRAILINGQUALITY'],
minlen=16 if config['options']['small_rna'] else config['bin'][pfamily]['tool_parameters']['MINLEN'],
min_reads=config['bin'][pfamily]['tool_parameters']['CUTADAPT_MIN_READS'],
threads: int(allocated("threads", "trim_se", cluster)),
envmodules: config['bin'][pfamily]['tool_versions']['CUTADAPTVER']
container: config['images']['cutadapt']
shell: """
cutadapt --nextseq-trim=2 --trim-n \
-n 5 -O 5 -q {params.leadingquality},{params.trailingquality} \
-m {params.minlen} -b file:{params.fastawithadaptersetd} -j {threads} \
-o {output.outfq} {input.infq} \
> {log.stdout} 2> {log.stderr}
npassed=$(grep "passing filters" {log.stdout} | awk '{{print $5}}' | sed -s 's/,//g')
echo "number of reads passing filters: $npassed"
if [ $npassed -lt {params.min_reads} ]; then
echo "ERROR: too few reads are left after trimming."
exit 1
fi
"""


rule fastqc:
Expand Down

0 comments on commit b1bcdd6

Please sign in to comment.