Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dealing with optional and mandatory GATK resource files #592

Merged
merged 22 commits into from
Jun 20, 2022
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
f217e6a
Add error message if dbsnp or known_indels is not supplied for bqsr o…
FriederikeHanssen Jun 16, 2022
2d811ff
multi-line not supported, use \n instead
FriederikeHanssen Jun 16, 2022
dd54c4c
Make PON optional
FriederikeHanssen Jun 16, 2022
a49df17
Make germline_resource optional for mutect2
FriederikeHanssen Jun 16, 2022
30b368c
Update getpileup, deal with optional dbsnp, germline, knwon_indels
FriederikeHanssen Jun 16, 2022
f3796c1
Formatting
FriederikeHanssen Jun 16, 2022
63e8e7d
Formatting
FriederikeHanssen Jun 16, 2022
f0805af
Merge remote-tracking branch 'upstream/dev' into gatk_resource
FriederikeHanssen Jun 17, 2022
37043a0
Add suggestion on value channels to handle optional input
FriederikeHanssen Jun 17, 2022
3d34ae7
make germline_resource work again
FriederikeHanssen Jun 17, 2022
c6210fa
remove test code
FriederikeHanssen Jun 17, 2022
a72ce06
add mutect2 no intervals tests
FriederikeHanssen Jun 17, 2022
1d6e363
some indents
FriederikeHanssen Jun 17, 2022
7fd12ab
more channel :sparkles: magic
FriederikeHanssen Jun 17, 2022
17d3d72
Merge remote-tracking branch 'upstream/dev' into gatk_resource
FriederikeHanssen Jun 17, 2022
ae9eb2f
add config for mutect2 tests
FriederikeHanssen Jun 17, 2022
f8a0e12
not sure what is going with mutect2
FriederikeHanssen Jun 20, 2022
7216964
Fix getpileupsummaries output
FriederikeHanssen Jun 20, 2022
ed7c724
Update conf/modules.config
FriederikeHanssen Jun 20, 2022
975e71c
Try to request more memory
FriederikeHanssen Jun 20, 2022
26d52d4
Revert this, everything is red
FriederikeHanssen Jun 20, 2022
2aa1e2f
try .5
FriederikeHanssen Jun 20, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#587](https://github.com/nf-core/sarek/pull/587) - Fix issue with VEP extra files
- [#581](https://github.com/nf-core/sarek/pull/581) - `TIDDIT` is back
- [#590](https://github.com/nf-core/sarek/pull/590) - Fix empty folders during scatter/gather
- [#592](https://github.com/nf-core/sarek/pull/592) - Fix optional resources for Mutect2, GetPileupSummaries, and HaplotypeCaller: issue [#299](https://github.com/nf-core/sarek/issues/299), [#359](https://github.com/nf-core/sarek/issues/359), [#367](https://github.com/nf-core/sarek/issues/367)

### Deprecated

Expand Down
2 changes: 1 addition & 1 deletion modules.json
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@
"git_sha": "169b2b96c1167f89ab07127b7057c1d90a6996c7"
},
"gatk4/getpileupsummaries": {
"git_sha": "f40cfefc0899fd6bb6adc300142ca6c3a35573ff"
"git_sha": "1ac223ad436c1410e9c16a5966274b7ca1f8d855"
},
"gatk4/haplotypecaller": {
"git_sha": "169b2b96c1167f89ab07127b7057c1d90a6996c7"
Expand Down
2 changes: 1 addition & 1 deletion modules/nf-core/modules/gatk4/getpileupsummaries/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

23 changes: 11 additions & 12 deletions subworkflows/local/prepare_genome.nf
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,6 @@ workflow PREPARE_GENOME {
TABIX_PON(pon.flatten().map{ it -> [[id:it.baseName], it] })

chr_files = chr_dir
//TODO this works, but is not pretty. I will leave this in your hands during refactoring @Maxime
if ( params.chr_dir.endsWith('tar.gz')){
UNTAR_CHR_DIR(chr_dir.map{ it -> [[id:it[0].baseName], it] })
chr_files = UNTAR_CHR_DIR.out.untar.map{ it[1] }
Expand All @@ -71,16 +70,16 @@ workflow PREPARE_GENOME {
ch_versions = ch_versions.mix(TABIX_PON.out.versions)

emit:
bwa = BWAMEM1_INDEX.out.index // path: bwa/*
bwamem2 = BWAMEM2_INDEX.out.index // path: bwamem2/*
hashtable = DRAGMAP_HASHTABLE.out.hashmap // path: dragmap/*
dbsnp_tbi = TABIX_DBSNP.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: dbsnb.vcf.gz.tbi
dict = GATK4_CREATESEQUENCEDICTIONARY.out.dict // path: genome.fasta.dict
fasta_fai = SAMTOOLS_FAIDX.out.fai.map{ meta, fai -> [fai] } // path: genome.fasta.fai
germline_resource_tbi = TABIX_GERMLINE_RESOURCE.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: germline_resource.vcf.gz.tbi
known_indels_tbi = TABIX_KNOWN_INDELS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi
msisensorpro_scan = MSISENSORPRO_SCAN.out.list.map{ meta, list -> [list] } // path: genome_msi.list
pon_tbi = TABIX_PON.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: pon.vcf.gz.tbi
bwa = BWAMEM1_INDEX.out.index // path: bwa/*
bwamem2 = BWAMEM2_INDEX.out.index // path: bwamem2/*
hashtable = DRAGMAP_HASHTABLE.out.hashmap // path: dragmap/*
dbsnp_tbi = TABIX_DBSNP.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: dbsnb.vcf.gz.tbi
dict = GATK4_CREATESEQUENCEDICTIONARY.out.dict // path: genome.fasta.dict
fasta_fai = SAMTOOLS_FAIDX.out.fai.map{ meta, fai -> [fai] } // path: genome.fasta.fai
germline_resource_tbi = TABIX_GERMLINE_RESOURCE.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: germline_resource.vcf.gz.tbi
known_indels_tbi = TABIX_KNOWN_INDELS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi
msisensorpro_scan = MSISENSORPRO_SCAN.out.list.map{ meta, list -> [list] } // path: genome_msi.list
pon_tbi = TABIX_PON.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: pon.vcf.gz.tbi
chr_files = chr_files
versions = ch_versions // channel: [ versions.yml ]
versions = ch_versions // channel: [ versions.yml ]
}

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

33 changes: 25 additions & 8 deletions tests/test_tools.yml
Original file line number Diff line number Diff line change
Expand Up @@ -430,10 +430,18 @@
- no_intervals
- tumor_only
- variant_calling
exit_code: 1
stdout:
contains:
- "--tools mutect2 and --no_intervals cannot be used together."
files:
- path: results/variant_calling/sample2/mutect2/sample2.vcf.gz
- path: results/variant_calling/sample2/mutect2/sample2.vcf.gz.tbi
- path: results/variant_calling/sample2/mutect2/sample2.vcf.gz.stats
- path: results/variant_calling/sample2/mutect2/sample2.contamination.table
- path: results/variant_calling/sample2/mutect2/sample2.segmentation.table
- path: results/variant_calling/sample2/mutect2/sample2.artifactprior.tar.gz
- path: results/variant_calling/sample2/mutect2/sample2.pileupsummaries.table
- path: results/variant_calling/sample2/mutect2/sample2.filtered.vcf.gz
- path: results/variant_calling/sample2/mutect2/sample2.filtered.vcf.gz.tbi
- path: results/variant_calling/sample2/mutect2/sample2.filtered.vcf.gz.filteringStats.tsv
- path: results/csv/variantcalled.csv

- name: Run variant calling on somatic sample with mutect2
command: nextflow run main.nf -profile test,tools_somatic,docker --tools mutect2 -c ./tests/nextflow.config
Expand Down Expand Up @@ -462,10 +470,19 @@
- no_intervals
- somatic
- variant_calling
exit_code: 1
stdout:
contains:
- "--tools mutect2 and --no_intervals cannot be used together."
files:
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.vcf.gz
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.vcf.gz.tbi
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.vcf.gz.stats
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.contamination.table
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.segmentation.table
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample3.pileupsummaries.table
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4.pileupsummaries.table
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.artifactprior.tar.gz
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.filtered.vcf.gz
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.filtered.vcf.gz.tbi
- path: results/variant_calling/sample4_vs_sample3/mutect2/sample4_vs_sample3.filtered.vcf.gz.filteringStats.tsv
- path: results/csv/variantcalled.csv

- name: Run variant calling on somatic sample with msisensor-pro
command: nextflow run main.nf -profile test,tools_somatic,docker --tools msisensorpro
Expand Down
40 changes: 27 additions & 13 deletions workflows/sarek.nf
Original file line number Diff line number Diff line change
Expand Up @@ -59,10 +59,25 @@ if (params.wes) {
}

if(params.tools && params.tools.contains('mutect2')){
if(params.no_intervals){
log.error "--tools mutect2 and --no_intervals cannot be used together.\nOne of the tools within the Mutect2 subworkflow requires intervals. They can be provided to the pipeline with --intervals. If none are provided, they will be generated from the FASTA file.\nFor more information on the Mutect2 workflow, see here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531132--How-to-Call-somatic-mutations-using-GATK4-Mutect2.\nFor more information on GetPileupsummaries, see here: https://gatk.broadinstitute.org/hc/en-us/articles/5358860217115-GetPileupSummaries"
if(!params.pon){
log.warn("No Panel-of-normal was specified for Mutect2.\nIt is highly recommended to use one: https://gatk.broadinstitute.org/hc/en-us/articles/5358911630107-Mutect2\nFor more information on how to create one: https://gatk.broadinstitute.org/hc/en-us/articles/5358921041947-CreateSomaticPanelOfNormals-BETA-")
}
if(!params.germline_resource){
log.warn("If Mutect2 is specified without a germline resource, no filtering will be done.\nIt is recommended to use one: https://gatk.broadinstitute.org/hc/en-us/articles/5358911630107-Mutect2")
}
if(params.pon && params.pon.contains("/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz")){
log.warn("The default Panel-of-Normals provided by GATK is used for Mutect2.\nIt is highly recommended to generate one from normal samples that are technical similar to the tumor ones.\nFor more information: https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON-")
}
}

if(!params.dbsnp && !params.known_indels){
if(!params.skip_tools || params.skip_tools && !params.skip_tools.contains('baserecalibrator')){
log.error "Base quality score recalibration requires at least one resource file. Please provide at least one of `--dbsnp` or `--known_indels`\nYou can skip this step in the workflow by adding `--skip_tools baserecalibrator` to the command."
exit 1
}
if(params.tools && params.tools.contains('haplotypecaller')){
log.warn "If Haplotypecaller is specified, without `--dbsnp` or `--known_indels no filtering will be done. For filtering, please provide at least one of `--dbsnp` or `--known_indels`.\nFor more information see FilterVariantTranches (single-sample, default): https://gatk.broadinstitute.org/hc/en-us/articles/5358928898971-FilterVariantTranches\nFor more information see VariantRecalibration (--joint_germline): https://gatk.broadinstitute.org/hc/en-us/articles/5358906115227-VariantRecalibrator\nFor more information on GATK Best practice germline variant calling: https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-"
}
}

// Save AWS IGenomes file containing annotation version
Expand All @@ -79,15 +94,16 @@ if (anno_readme && file(anno_readme).exists()) {
*/

// Initialize file channels based on params, defined in the params.genomes[params.genome] scope
chr_dir = params.chr_dir ? Channel.fromPath(params.chr_dir).collect() : []
dbsnp = params.dbsnp ? Channel.fromPath(params.dbsnp).collect() : Channel.empty()
chr_dir = params.chr_dir ? Channel.fromPath(params.chr_dir).collect() : Channel.value([])
dbsnp = params.dbsnp ? Channel.fromPath(params.dbsnp).collect() : Channel.value([])
fasta = params.fasta ? Channel.fromPath(params.fasta).collect() : Channel.empty()
fasta_fai = params.fasta_fai ? Channel.fromPath(params.fasta_fai).collect() : Channel.empty()
germline_resource = params.germline_resource ? Channel.fromPath(params.germline_resource).collect() : Channel.empty()
known_indels = params.known_indels ? Channel.fromPath(params.known_indels).collect() : Channel.empty()
loci = params.ac_loci ? Channel.fromPath(params.ac_loci).collect() : []
loci_gc = params.ac_loci_gc ? Channel.fromPath(params.ac_loci_gc).collect() : []
mappability = params.mappability ? Channel.fromPath(params.mappability).collect() : []
germline_resource = params.germline_resource ? Channel.fromPath(params.germline_resource).collect() : Channel.value([]) //Mutec2 does not require a germline resource, so set to optional input
known_indels = params.known_indels ? Channel.fromPath(params.known_indels).collect() : Channel.value([])
loci = params.ac_loci ? Channel.fromPath(params.ac_loci).collect() : Channel.value([])
loci_gc = params.ac_loci_gc ? Channel.fromPath(params.ac_loci_gc).collect() : Channel.value([])
mappability = params.mappability ? Channel.fromPath(params.mappability).collect() : Channel.value([])
pon = params.pon ? Channel.fromPath(params.pon).collect() : Channel.value([]) //PON is optional for Mutect2 (but highly recommended)

// Initialize value channels based on params, defined in the params.genomes[params.genome] scope
snpeff_db = params.snpeff_db ?: Channel.empty()
Expand All @@ -96,7 +112,6 @@ vep_genome = params.vep_genome ?: Channel.empty()
vep_species = params.vep_species ?: Channel.empty()

// Initialize files channels based on params, not defined within the params.genomes[params.genome] scope
pon = params.pon ? Channel.fromPath(params.pon).collect() : Channel.empty()
snpeff_cache = params.snpeff_cache ? Channel.fromPath(params.snpeff_cache).collect() : []
vep_cache = params.vep_cache ? Channel.fromPath(params.vep_cache).collect() : []

Expand Down Expand Up @@ -246,9 +261,9 @@ workflow SAREK {
dragmap = params.fasta ? params.dragmap ? Channel.fromPath(params.dragmap).collect() : PREPARE_GENOME.out.hashtable : []
dict = params.fasta ? params.dict ? Channel.fromPath(params.dict).collect() : PREPARE_GENOME.out.dict : []
fasta_fai = params.fasta ? params.fasta_fai ? Channel.fromPath(params.fasta_fai).collect() : PREPARE_GENOME.out.fasta_fai : []
dbsnp_tbi = params.dbsnp ? params.dbsnp_tbi ? Channel.fromPath(params.dbsnp_tbi).collect() : PREPARE_GENOME.out.dbsnp_tbi : Channel.empty()
dbsnp_tbi = params.dbsnp ? params.dbsnp_tbi ? Channel.fromPath(params.dbsnp_tbi).collect() : PREPARE_GENOME.out.dbsnp_tbi : Channel.value([])
germline_resource_tbi = params.germline_resource ? params.germline_resource_tbi ? Channel.fromPath(params.germline_resource_tbi).collect() : PREPARE_GENOME.out.germline_resource_tbi : []
known_indels_tbi = params.known_indels ? params.known_indels_tbi ? Channel.fromPath(params.known_indels_tbi).collect() : PREPARE_GENOME.out.known_indels_tbi : Channel.empty()
known_indels_tbi = params.known_indels ? params.known_indels_tbi ? Channel.fromPath(params.known_indels_tbi).collect() : PREPARE_GENOME.out.known_indels_tbi : Channel.value([])
pon_tbi = params.pon ? params.pon_tbi ? Channel.fromPath(params.pon_tbi).collect() : PREPARE_GENOME.out.pon_tbi : []
msisensorpro_scan = PREPARE_GENOME.out.msisensorpro_scan

Expand All @@ -259,7 +274,6 @@ workflow SAREK {

// known_sites is made by grouping both the dbsnp and the known indels ressources
// Which can either or both be optional
// Actually BQSR has been throwing erros if no sides were provided so it must be at least one
known_sites = dbsnp.concat(known_indels).collect()
known_sites_tbi = dbsnp_tbi.concat(known_indels_tbi).collect()

Expand Down