Add optional trimming with Trim Galore #117

Merged
merged 60 commits into from
Feb 26, 2020
Changes from 53 commits
a733603
add re to scrape Trim Galore version
chelauk Feb 14, 2020
b0e50f2
add Trim galore, process and multiqc
chelauk Feb 14, 2020
93a0697
add Trim Galore to environment.yml
chelauk Feb 14, 2020
3aabf7a
include changes
chelauk Feb 14, 2020
ca3e979
--add trimFastq option
chelauk Feb 19, 2020
34fc93e
add test_local.config for small local refs. nextflow.config altered t…
chelauk Feb 19, 2020
699474b
add options to trimFastq
chelauk Feb 19, 2020
41d08ab
Update main.nf
chelauk Feb 19, 2020
97a6946
Update main.nf
chelauk Feb 19, 2020
1b9211a
Update nextflow.config
chelauk Feb 19, 2020
ad00fd1
Update main.nf
chelauk Feb 19, 2020
38311c4
Update main.nf
chelauk Feb 19, 2020
809aef4
Update main.nf
chelauk Feb 19, 2020
a7ab22c
Update main.nf
chelauk Feb 19, 2020
6af90b6
Update nextflow.config
chelauk Feb 19, 2020
1817d08
Update nextflow.config
chelauk Feb 19, 2020
375be24
Update main.nf
chelauk Feb 19, 2020
cb66e52
Update main.nf
chelauk Feb 19, 2020
31f66dc
Update main.nf
chelauk Feb 19, 2020
4225b10
Update main.nf
chelauk Feb 19, 2020
05f08b9
Update main.nf
chelauk Feb 19, 2020
9cf3746
Update CHANGELOG.md
chelauk Feb 19, 2020
d0cbf88
Update usage.md
chelauk Feb 19, 2020
083d725
Merge branch 'dev' into dev
chelauk Feb 19, 2020
57a01e9
Update nextflow.config
chelauk Feb 20, 2020
7051461
Delete test_local.config
chelauk Feb 20, 2020
0133607
Update main.nf
chelauk Feb 20, 2020
7b0fb2c
Update usage.md
chelauk Feb 20, 2020
9abfe4b
Apply suggestions from code review
chelauk Feb 20, 2020
90f9920
Update docs/usage.md
chelauk Feb 20, 2020
66f1f21
Update usage.md
chelauk Feb 20, 2020
94eca69
Update usage.md
chelauk Feb 20, 2020
2e3ad6b
Update main.nf
chelauk Feb 24, 2020
9538b6c
Update usage.md
chelauk Feb 24, 2020
6acda52
Update usage.md
chelauk Feb 24, 2020
5a6f5ff
Merge branch 'dev' into dev
maxulysse Feb 24, 2020
7cd92b9
Update main.nf
chelauk Feb 24, 2020
5578a01
Update usage.md
chelauk Feb 24, 2020
b43e109
Update main.nf
chelauk Feb 24, 2020
a176c67
Update nextflow.config
chelauk Feb 24, 2020
e6b91a1
Update nextflow.config
chelauk Feb 24, 2020
dcc325f
Update nextflow.config
chelauk Feb 24, 2020
6a44979
Update nextflow.config
chelauk Feb 24, 2020
2d5d7f3
Update nextflow.config
chelauk Feb 24, 2020
253694b
Update nextflow.config
chelauk Feb 24, 2020
08aeb26
Update main.nf
chelauk Feb 24, 2020
595bf35
Update nextflow.config
chelauk Feb 24, 2020
a47b76b
Update main.nf
chelauk Feb 24, 2020
797558e
Update main.nf
chelauk Feb 24, 2020
8afec12
Update main.nf
chelauk Feb 24, 2020
3a96c2f
Update main.nf
chelauk Feb 24, 2020
bbc7aa5
Apply suggestions from code review
chelauk Feb 24, 2020
7a6326a
Merge branch 'dev' into dev
maxulysse Feb 25, 2020
bbd72ab
Update main.nf
chelauk Feb 25, 2020
02d3c36
Update nextflow.config
chelauk Feb 25, 2020
96e3655
Update environment.yml
chelauk Feb 25, 2020
bdbf223
Update main.nf
chelauk Feb 25, 2020
d8774d4
Update main.nf
chelauk Feb 25, 2020
80443d0
Merge branch 'dev' into dev
maxulysse Feb 25, 2020
38a78c8
Update usage.md
chelauk Feb 26, 2020
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -7,7 +7,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
## dev

### `Added`

- [#117](https://github.com/nf-core/sarek/pull/117) - Add `Trim Galore` possibilities to Sarek
- [#76](https://github.com/nf-core/sarek/pull/76) - Add `GATK Spark` possibilities to Sarek
- [#87](https://github.com/nf-core/sarek/pull/87) - Add `GATK BaseRecalibrator` plot to `MultiQC` report
- [#115](https://github.com/nf-core/sarek/pull/115) - Add [@szilvajuhos](https://github.com/szilvajuhos) abstract for ESHG2020
2 changes: 2 additions & 0 deletions bin/scrape_software_versions.py
@@ -22,6 +22,7 @@
'SnpEff': ['v_snpeff.txt', r"version SnpEff (\S+)"],
'Strelka': ['v_strelka.txt', r"([0-9.]+)"],
'TIDDIT': ['v_tiddit.txt', r"TIDDIT-(\S+)"],
'Trim Galore': ['v_trim_galore.txt', r"version (\S+)"],
'vcftools': ['v_vcftools.txt', r"([0-9.]+)"],
'VEP': ['v_vep.txt', r"ensembl-vep : (\S+)"],
}
@@ -44,6 +45,7 @@
results['SnpEff'] = '<span style="color:#999999;\">N/A</span>'
results['Strelka'] = '<span style="color:#999999;\">N/A</span>'
results['TIDDIT'] = '<span style="color:#999999;\">N/A</span>'
results['Trim Galore'] = '<span style="color:#999999;\">N/A</span>'
results['vcftools'] = '<span style="color:#999999;\">N/A</span>'
results['VEP'] = '<span style="color:#999999;\">N/A</span>'
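
As a rough illustration of how the new `'Trim Galore'` entry is meant to work, the sketch below applies the same pattern, `version (\S+)`, to a sample of `trim_galore -v` output. The sample banner text and the helper name `scrape_version` are assumptions made for this example; only the regex comes from the diff.

```python
import re

# Pattern taken from the 'Trim Galore' entry in scrape_software_versions.py.
TRIM_GALORE_REGEX = r"version (\S+)"

# Assumed sample of what `trim_galore -v` writes to v_trim_galore.txt.
SAMPLE_OUTPUT = """
            Quality-/Adapter-/RRBS-/Speciality-Trimming
                    [powered by Cutadapt]
                      version 0.6.5

                   Last update: 11 11 2019
"""


def scrape_version(text, pattern=TRIM_GALORE_REGEX):
    """Return the first captured version string, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(scrape_version(SAMPLE_OUTPUT))
```

When the file is missing or the pattern does not match, a helper like this returns `None`, which is the case the `N/A` placeholder in the second hunk appears to cover.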

53 changes: 27 additions & 26 deletions docs/usage.md
@@ -8,6 +8,13 @@
- [-profile](#-profile)
- [--input](#--input)
- [--split_fastq](#--split_fastq)
- [--trim_fastq](#--trim_fastq)
- [--clip_r1](#--clip_r1)
- [--clip_r2](#--clip_r2)
- [--three_prime_clip_r1](#--three_prime_clip_r1)
- [--three_prime_clip_r2](#--three_prime_clip_r2)
- [--trim_nextseq](#--trim_nextseq)
- [--save_trimmed](#--save_trimmed)
- [--sample](#--sample)
- [--sampleDir](#--sampledir)
- [--annotateVCF](#--annotatevcf)
@@ -215,47 +222,41 @@ For example:
--split_fastq 10000
```

### --sample

> :warning: This params is deprecated -- it will be removed in a future release.
> Please check: [`--input`](#--input)

### --sampleDir
### --trim_fastq
Use this to perform adapter trimming with [Trim Galore](https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md).

> :warning: This params is deprecated -- it will be removed in a future release.
> Please check: [`--input`](#--input)
### --clip_r1
Instructs Trim Galore to remove `<int>` bp from the 5' end of read 1 (or single-end reads). This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end.

### --annotateVCF
### --clip_r2
Instructs Trim Galore to remove `<int>` bp from the 5' end of read 2 (paired-end reads only). This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end.

> :warning: This params is deprecated -- it will be removed in a future release.
> Please check: [`--input`](#--input)
### --three_prime_clip_r1
Instructs Trim Galore to remove `<int>` bp from the 3' end of read 1 (or single-end reads) AFTER adapter/quality trimming has been performed. This may remove some unwanted bias from the 3' end that is not directly related to adapter sequence or basecall quality.

Multiple VCF files can be specified if the path must be enclosed in quotes
### --three_prime_clip_r2
Instructs Trim Galore to remove `<int>` bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed. This may remove some unwanted bias from the 3' end that is not directly related to adapter sequence or basecall quality.

### --no_gvcf
### --trim_nextseq
This enables the option `--nextseq-trim=3'CUTOFF` within Cutadapt, which sets a quality cutoff (normally given with `-q`) but ignores the qualities of G bases. This kind of trimming is common for the NextSeq and NovaSeq platforms, where basecalls without any signal are called as high-quality G bases.

Use this to disable g.vcf from `HaplotypeCaller`.
### --save_trimmed
Use this to keep the trimmed FASTQ files.

### --noGVCF
### --sample

> :warning: This params is deprecated -- it will be removed in a future release.
> Please check: [`--no_gvcf`](#--no_gvcf)

### --skip_qc

Use this to disable specific QC and Reporting tools.
Available: `all`, `bamQC`, `BCFtools`, `FastQC`, `MultiQC`, `samtools`, `vcftools`, `versions`
Default: `None`
> Please check: [`--input`](#--input)

### --skipQC
### --sampleDir

> :warning: This params is deprecated -- it will be removed in a future release.
> Please check: [`--skip_qc`](#--skip_qc)
> Please check: [`--input`](#--input)

### --noReports
### --annotateVCF

> :warning: This params is deprecated -- it will be removed in a future release.
> Please check: [`--skipQC`](#--skipQC)
> Please check: [`--input`](#--input)

### --nucleotides_per_second

3 changes: 2 additions & 1 deletion environment.yml
@@ -24,5 +24,6 @@ dependencies:
- snpeff=4.3.1t
- strelka=2.9.10
- tiddit=2.7.1
- trim-galore=0.6.5
- vcfanno=0.3.1
- vcftools=0.1.16
- vcftools=0.1.16
95 changes: 81 additions & 14 deletions main.nf
@@ -70,6 +70,15 @@ def helpMessage() {
--pon panel-of-normals VCF (bgzipped, indexed). See: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_mutect_CreateSomaticPanelOfNormals.php
--pon_index index of pon panel-of-normals VCF

Trimming:
--trim_fastq [bool] Run Trim Galore
--clip_r1 [int] Instructs Trim Galore to remove bp from the 5' end of read 1 (or single-end reads)
--clip_r2 [int] Instructs Trim Galore to remove bp from the 5' end of read 2 (paired-end reads only)
--three_prime_clip_r1 [int] Instructs Trim Galore to remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed
--three_prime_clip_r2 [int] Instructs Trim Galore to remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed
--trim_nextseq [int] Instructs Trim Galore to apply the --nextseq=X option, to trim based on quality after removing poly-G tails
--save_trimmed [bool] Save trimmed FastQ file intermediates

References If not specified in the configuration file or you wish to overwrite any of the references.
--ac_loci acLoci file
--ac_loci_gc acLoci GC file
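
The trimming options in the help text above are only forwarded to Trim Galore when they are greater than zero, judging by the `TrimGalore` process added further down in `main.nf`. A minimal Python sketch of that mapping follows; the function name `trim_galore_options` is hypothetical, while the flag names are the ones used in the process script.

```python
def trim_galore_options(clip_r1=0, clip_r2=0,
                        three_prime_clip_r1=0, three_prime_clip_r2=0,
                        trim_nextseq=0):
    """Build the optional Trim Galore arguments, skipping unset (0) values."""
    candidates = [
        ("--clip_r1", clip_r1),
        ("--clip_r2", clip_r2),
        ("--three_prime_clip_r1", three_prime_clip_r1),
        ("--three_prime_clip_r2", three_prime_clip_r2),
        # --trim_nextseq on the pipeline side maps to --nextseq on Trim Galore.
        ("--nextseq", trim_nextseq),
    ]
    return [f"{flag} {value}" for flag, value in candidates if value > 0]


# With the defaults from nextflow.config, no extra flags are emitted.
print(trim_galore_options())
# Only the options that were set end up on the command line.
print(" ".join(trim_galore_options(clip_r1=5, trim_nextseq=20)))
```

Treating `0` as "not set" keeps the default command line identical to a plain `trim_galore --paired` call.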
@@ -480,13 +489,21 @@ if (params.target_bed) summary['Target BED'] = params.target_bed
if (step) summary['Step'] = step
if (params.tools) summary['Tools'] = tools.join(', ')
if (params.skip_qc) summary['QC tools skip'] = skipQC.join(', ')

if (params.trim_fastq) {
    summary['Fastq trim'] = "Fastq trim selected"
    summary['Trim R1'] = "$params.clip_r1 bp"
    summary['Trim R2'] = "$params.clip_r2 bp"
    summary["Trim 3' R1"] = "$params.three_prime_clip_r1 bp"
    summary["Trim 3' R2"] = "$params.three_prime_clip_r2 bp"
    // --nextseq takes a quality cutoff, not a number of bases
    summary["NextSeq Trim"] = "$params.trim_nextseq (quality cutoff)"
}
if (params.no_intervals && step != 'annotate') summary['Intervals'] = 'Do not use'
if ('haplotypecaller' in tools) summary['GVCF'] = params.no_gvcf ? 'No' : 'Yes'
if ('strelka' in tools && 'manta' in tools ) summary['Strelka BP'] = params.no_strelka_bp ? 'No' : 'Yes'
if (params.sequencing_center) summary['Sequenced by'] = params.sequencing_center
if (params.pon && 'mutect2' in tools) summary['Panel of normals'] = params.pon

//summary['Saved Trimmed Fastq'] = params.save_trimmed ? 'Yes' : 'No'
summary['Save Reference'] = params.save_reference ? 'Yes' : 'No'
summary['Nucleotides/s'] = params.nucleotides_per_second
summary['Output dir'] = params.outdir
@@ -572,6 +589,7 @@ process GetSoftwareVersions {
R -e "library(ASCAT); help(package='ASCAT')" &> v_ascat.txt
samtools --version &> v_samtools.txt 2>&1 || true
tiddit &> v_tiddit.txt 2>&1 || true
trim_galore -v &> v_trim_galore.txt 2>&1 || true
vcftools --version &> v_vcftools.txt 2>&1 || true
vep --help &> v_vep.txt 2>&1 || true

@@ -887,7 +905,7 @@ if (params.split_fastq){

inputPairReads = inputPairReads.dump(tag:'INPUT')

(inputPairReads, inputPairReadsFastQC) = inputPairReads.into(2)
(inputPairReads, inputPairReadsTrimGalore, inputPairReadsFastQC) = inputPairReads.into(3)

// STEP 0.5: QC ON READS

@@ -909,7 +927,7 @@ process FastQCFQ {
file("*.{html,zip}") into fastQCFQReport

when: !('fastqc' in skipQC)

script:
"""
fastqc -t 2 -q ${idSample}_${idRun}_R1.fastq.gz ${idSample}_${idRun}_R2.fastq.gz
@@ -942,11 +960,59 @@ fastQCReport = fastQCFQReport.mix(fastQCBAMReport)

fastQCReport = fastQCReport.dump(tag:'FastQC')

outputPairReadsTrimGalore = Channel.create()

if (params.trim_fastq) {
    process TrimGalore {
        label 'TrimGalore'

        tag {idPatient + "-" + idRun}

        publishDir "${params.outdir}/Reports/${idSample}/TrimGalore/${idSample}_${idRun}", mode: params.publish_dir_mode,
          saveAs: {filename ->
            if (filename.indexOf("_fastqc") > 0) "FastQC/$filename"
            else if (filename.indexOf("trimming_report.txt") > 0) "logs/$filename"
            else if (params.save_trimmed) filename
            else null
          }

        input:
            set idPatient, idSample, idRun, file("${idSample}_${idRun}_R1.fastq.gz"), file("${idSample}_${idRun}_R2.fastq.gz") from inputPairReadsTrimGalore

        output:
            file("*.{html,zip,txt}") into trimGaloreReport
            set idPatient, idSample, idRun, file("${idSample}_${idRun}_R1_val_1.fq.gz"), file("${idSample}_${idRun}_R2_val_2.fq.gz") into outputPairReadsTrimGalore

        script:
        // Calculate number of --cores for TrimGalore based on value of task.cpus
        // See: https://github.com/FelixKrueger/TrimGalore/blob/master/Changelog.md#version-060-release-on-1-mar-2019
        // See: https://github.com/nf-core/atacseq/pull/65
        def cores = 1
        if (task.cpus) {
            cores = (task.cpus as int) - 4
            if (cores < 1) cores = 1
            if (cores > 4) cores = 4
        }
        c_r1 = params.clip_r1 > 0 ? "--clip_r1 ${params.clip_r1}" : ''
        c_r2 = params.clip_r2 > 0 ? "--clip_r2 ${params.clip_r2}" : ''
        tpc_r1 = params.three_prime_clip_r1 > 0 ? "--three_prime_clip_r1 ${params.three_prime_clip_r1}" : ''
        tpc_r2 = params.three_prime_clip_r2 > 0 ? "--three_prime_clip_r2 ${params.three_prime_clip_r2}" : ''
        nextseq = params.trim_nextseq > 0 ? "--nextseq ${params.trim_nextseq}" : ''
        """
        trim_galore --cores $cores --paired --fastqc --gzip $c_r1 $c_r2 $tpc_r1 $tpc_r2 $nextseq ${idSample}_${idRun}_R1.fastq.gz ${idSample}_${idRun}_R2.fastq.gz
        """
    }
} else {
    inputPairReadsTrimGalore
        .set {outputPairReadsTrimGalore}
    trimGaloreReport = Channel.empty()
}

// STEP 1: MAPPING READS TO REFERENCE GENOME WITH BWA MEM

inputPairReads = inputPairReads.dump(tag:'INPUT')

inputPairReads = inputPairReads.mix(inputBam)
inputPairReads = outputPairReadsTrimGalore.mix(inputBam)
inputPairReads = inputPairReads.dump(tag:'INPUT')

(inputPairReads, inputPairReadsSentieon) = inputPairReads.into(2)
if (params.sentieon) inputPairReads.close()
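
Two small pieces of logic in the `TrimGalore` process above are easy to misread in diff form: the `--cores` value is `task.cpus - 4` clamped to the range 1 to 4, and the `saveAs` closure decides where each output file is published. The Python sketch below mirrors both under that reading; the function names `trim_galore_cores` and `publish_path` and the sample file names are hypothetical and not part of the pipeline.

```python
def trim_galore_cores(task_cpus):
    """Mirror the process logic: task.cpus - 4, clamped to [1, 4]."""
    if not task_cpus:
        return 1
    return max(1, min(4, int(task_cpus) - 4))


def publish_path(filename, save_trimmed=False):
    """Mirror the saveAs closure: return a relative publish path or None.

    Like Groovy's indexOf(...) > 0, str.find(...) > 0 only matches when the
    marker occurs after the first character of the file name.
    """
    if filename.find("_fastqc") > 0:
        return f"FastQC/{filename}"
    if filename.find("trimming_report.txt") > 0:
        return f"logs/{filename}"
    # Trimmed reads are only published when --save_trimmed is set.
    return filename if save_trimmed else None


for cpus in (2, 6, 8, 16):
    print(cpus, trim_galore_cores(cpus))

for name in ("S1_L1_R1_val_1_fastqc.html",
             "S1_L1_R1.fastq.gz_trimming_report.txt",
             "S1_L1_R1_val_1.fq.gz"):
    print(name, publish_path(name))
```

With the default `cpus = 8` from `nextflow.config`, this gives 4 cores; FastQC and trimming reports are always published, while the trimmed reads themselves are dropped unless `save_trimmed` is true.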
@@ -1036,7 +1102,7 @@ process SentieonMapReads {
"""
sentieon bwa mem -K 100000000 -R \"${readGroup}\" ${extra} -t ${task.cpus} -M ${fasta} \
${inputFile1} ${inputFile2} | \
sentieon util sort -r ${fasta} -o ${idSample}_${idRun}.bam -t ${task.cpus} --sam2bam -i -
sentieon util sort -r ${fasta} -o ${idSample}_${idRun}.bam -t ${task.cpus} --sam2bam -i -
"""
}

@@ -1408,7 +1474,7 @@ process SentieonBQSR {
file(knownIndelsIndex) from ch_known_indels_tbi

output:
set idPatient, idSample, file("${idSample}.recal.bam"), file("${idSample}.recal.bam.bai") into bamRecalSentieon
set idPatient, idSample, file("${idSample}.recal.bam"), file("${idSample}.recal.bam.bai") into bamRecalSentieon
set idPatient, idSample into bamRecalSentieonTSV
file("${idSample}_recal_result.csv") into bamRecalSentieonQC

@@ -2082,7 +2148,7 @@ process MergeMutect2Stats {

when: 'mutect2' in tools

script:
script:
stats = statsFiles.collect{ "-stats ${it} " }.join(' ')
"""
gatk --java-options "-Xmx${task.memory.toGiga()}g" \
@@ -2119,11 +2185,11 @@ process ConcatVCF {
when: ('haplotypecaller' in tools || 'mutect2' in tools || 'freebayes' in tools)

script:
if (variantCaller == 'HaplotypeCallerGVCF')
if (variantCaller == 'HaplotypeCallerGVCF')
outputFile = "HaplotypeCaller_${idSample}.g.vcf"
else if (variantCaller == "Mutect2")
else if (variantCaller == "Mutect2")
outputFile = "Mutect2_unfiltered_${idSample}.vcf"
else
else
outputFile = "${variantCaller}_${idSample}.vcf"
options = params.target_bed ? "-t ${targetBED}" : ""
"""
@@ -2213,13 +2279,13 @@ process CalculateContamination {

input:
set idPatient, idSampleNormal, idSampleTumor, file(bamNormal), file(baiNormal), file(bamTumor), file(baiTumor), file(mergedPileup) from pairBamCalculateContamination

output:
set idPatient, val("${idSampleTumor}_vs_${idSampleNormal}"), file("${idSampleTumor}_contamination.table") into contaminationTable

when: 'mutect2' in tools

script:
script:
"""
# calculate contamination
gatk --java-options "-Xmx${task.memory.toGiga()}g" \
@@ -2251,7 +2317,7 @@ process FilterMutect2Calls {
file(germlineResource) from ch_germline_resource
file(germlineResourceIndex) from ch_germline_resource_tbi
file(intervals) from ch_intervals

output:
set val("Mutect2"), idPatient, idSamplePair, file("Mutect2_filtered_${idSamplePair}.vcf.gz"), file("Mutect2_filtered_${idSamplePair}.vcf.gz.tbi"), file("Mutect2_filtered_${idSamplePair}.vcf.gz.filteringStats.tsv") into filteredMutect2Output

@@ -3187,6 +3253,7 @@ process MultiQC {
file ('DuplicateMarked/*.recal.table') from baseRecalibratorReport.collect().ifEmpty([])
file ('SamToolsStats/*') from samtoolsStatsReport.collect().ifEmpty([])
file ('snpEff/*') from snpeffReport.collect().ifEmpty([])
file ('TrimGalore/*') from trimGaloreReport.collect().ifEmpty([])
file ('VCFTools/*') from vcftoolsReport.collect().ifEmpty([])

output:
12 changes: 11 additions & 1 deletion nextflow.config
@@ -34,6 +34,16 @@ params {
save_reference = null // Built Indexes not saved
sequencing_center = null // No sequencing center to be written in BAM header in MapReads process
sentieon = null // Not using Sentieon by default

// options: Trimming
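trim_fastq = false // No trimming by default
clip_r1 = 0 // No 5' clipping of read 1
clip_r2 = 0 // No 5' clipping of read 2
three_prime_clip_r1 = 0 // No 3' clipping of read 1
three_prime_clip_r2 = 0 // No 3' clipping of read 2
trim_nextseq = 0 // No NextSeq-specific quality trimming
skip_trimming = false
save_trimmed = false // Trimmed FastQ files not saved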

// Optional files/directory
cadd_indels = false // No CADD InDels file
@@ -71,7 +81,7 @@ params {
plaintext_email = false // Plaintext email disabled

// Base specifications
cpus = 8
cpus = 8
max_cpus = 16
max_memory = 128.GB
max_time = 240.h