Dnascope module and subworkflow in Sarek #1193

Merged
merged 56 commits into from
Oct 4, 2023
56 commits
72bb50b
WIP: Implementing Dnascope module and subworkflow in Sarek.
asp8200 Aug 28, 2023
c81aeb5
WIP: Changing sentieon_dnascope_model from string to file-channel
asp8200 Aug 28, 2023
4dcdc9c
align
asp8200 Aug 28, 2023
cec591b
Changing Sarek-option sentieon_dnascope_pcr_based to sentieon_dnascop…
asp8200 Aug 30, 2023
b2fbd53
Adding meta to some input-channels for Dnascope module
asp8200 Aug 30, 2023
c12ce59
Updated dnascope
asp8200 Sep 6, 2023
47ec913
Adding option dnascope_filter for skip_tools. Will be removed later #…
asp8200 Sep 6, 2023
79bf941
Add variant-filter for DNASCOPE
asp8200 Sep 6, 2023
5aa33fb
Trying to order stuff alphabetically
asp8200 Sep 6, 2023
d8a06bd
calling BAM_JOINT_CALLING_GERMLINE_SENTIEON for joint-germline varian…
asp8200 Sep 6, 2023
56f65d6
Skipping Sentieon VarCal and ApplyVarCal for joint-germline variant-c…
asp8200 Sep 7, 2023
7da3dec
Merge branch 'dev' into sentieon_dnascope
asp8200 Sep 7, 2023
01562e7
Adding sentieon Dnascope to some error msgs and warnings
asp8200 Sep 7, 2023
f30048d
resolving conflicts from merging dev
asp8200 Sep 13, 2023
0f4080c
params.joint_germline was unintentionally out-commented
asp8200 Sep 13, 2023
cc72786
Continued implementing Sentieon Dnascope
asp8200 Sep 14, 2023
5cf6520
Fixing name of pytest. Replacing problematic ampersand character
asp8200 Sep 14, 2023
ccdf4e5
Bug fix: Sent tbi to vcf-channel and vcf to tbi-channel
asp8200 Sep 14, 2023
d74ec55
Adding a bit of error-handling for sentieon-based joint-germline vari…
asp8200 Sep 14, 2023
a367e37
Separate configs for sentieon_dnascope_joint_germline and sentieon_ha…
asp8200 Sep 14, 2023
f55c22e
Adding tests for dnascope (excel joint-germline)
asp8200 Sep 14, 2023
a10ad13
Adding meta tags patient and variantcaller to channels genotype_vcf a…
asp8200 Sep 14, 2023
6c78764
joint germline with dnascope
asp8200 Sep 14, 2023
c48f68c
test_sentieon_joint_germline.yml -> test_sentieon_haplotyper_joint_ge…
asp8200 Sep 14, 2023
af59cd3
Adding test for joint-germline with dnascope
asp8200 Sep 14, 2023
d0bb50d
test_sentieon_joint_germline.yml -> test_sentieon_haplotyper_joint_g…
asp8200 Sep 14, 2023
a063f3f
initializing output channel gvcf_sentieon_dnascope
asp8200 Sep 14, 2023
2902b5b
secrets not secret
asp8200 Sep 14, 2023
8d13939
include configs in alphabetical order
asp8200 Sep 15, 2023
c2cd588
Adding tag for sentieon/dnascope (excl joint-germline)
asp8200 Sep 15, 2023
de09b05
update tags
asp8200 Sep 15, 2023
704fd80
Adding tag sentieon_dnascope_joint_germline
asp8200 Sep 17, 2023
e46f6d9
TABIX_KNOWN_INDELS and TABIX_KNOWN_SNPS is probably not needed in the…
asp8200 Sep 17, 2023
57011af
Resolve conflicts
asp8200 Sep 17, 2023
1e424a6
Re-installing config for TABIX_KNOWN_INDELS for dnascope
asp8200 Sep 17, 2023
2738c8d
Briefly mentioning DnaScope
asp8200 Sep 18, 2023
0479416
Publishing from SENTIEON_DNAMODELAPPLY
asp8200 Sep 18, 2023
ba06cac
Replacing VCF_VARIANT_FILTERING_GATK with SENTIEON_DNAMODELAPPLY in D…
asp8200 Sep 18, 2023
89591c7
New sentieon module dnamodelapply
asp8200 Sep 18, 2023
1f51e30
Adding tests of dnascope skipping DnaModelApply
asp8200 Sep 19, 2023
801b8eb
Adding sentieon_dnascope_skip_filter
asp8200 Sep 20, 2023
9ccad29
Just triggering tests
asp8200 Sep 20, 2023
a26cb2d
Merge branch 'dev' into sentieon_dnascope
asp8200 Sep 20, 2023
eba764c
Adding profile software_license and option sentieon_extension to pyte…
asp8200 Sep 20, 2023
ef3997d
Fixing prefix for SENTIEON_DNAMODELAPPLY
asp8200 Sep 20, 2023
a4d7dd2
Updating md5sums
asp8200 Sep 20, 2023
a7436c5
Removing comment
asp8200 Sep 20, 2023
235d784
Updating changelog with PR for Sentieon DnaScope
asp8200 Sep 20, 2023
5cb89c1
Adding test of Sentieon VQSR not running for Sentieon DnaScope
asp8200 Sep 20, 2023
45e961a
prettier
asp8200 Sep 20, 2023
c1c7c6e
Adding info concerning DNAscope. Also other minor improvements to the…
asp8200 Sep 20, 2023
cfeb762
Removing redundant test
asp8200 Sep 20, 2023
9579af4
Resolving merge conflict and renaming some sentieon subsections
asp8200 Oct 1, 2023
31fc80a
Trying to sort out section and subsection for Sentieon/haplotyper and…
asp8200 Oct 2, 2023
46f0788
Merge branch 'dev' into sentieon_dnascope
asp8200 Oct 4, 2023
d6f610c
Moving PR1193 to dev-section since the PR was not merged for Sarek 3.3.2
asp8200 Oct 4, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- [#1193](https://github.com/nf-core/sarek/pull/1193) - Adding support for Sentieon's DnaScope for germline variant-calling including joint-germline.
- [#1271](https://github.com/nf-core/sarek/pull/1271) - Back to dev

### Changed
6 changes: 3 additions & 3 deletions conf/modules/prepare_genome.config
@@ -76,7 +76,7 @@ process {
}

withName: 'TABIX_DBSNP' {
ext.when = { !params.dbsnp_tbi && params.dbsnp && ((params.step == "mapping" || params.step == "markduplicates" || params.step == "prepare_recalibration") || params.tools && (params.tools.split(',').contains('controlfreec') || params.tools.split(',').contains('haplotypecaller') || params.tools.split(',').contains('sentieon_haplotyper') || params.tools.split(',').contains('mutect2'))) }
ext.when = { !params.dbsnp_tbi && params.dbsnp && ((params.step == "mapping" || params.step == "markduplicates" || params.step == "prepare_recalibration") || params.tools && (params.tools.split(',').contains('controlfreec') || params.tools.split(',').contains('haplotypecaller') || params.tools.split(',').contains('sentieon_haplotyper') || params.tools.split(',').contains('sentieon_dnascope') || params.tools.split(',').contains('mutect2'))) }
publishDir = [
enabled: (params.save_reference || params.build_only_index),
mode: params.publish_dir_mode,
@@ -96,7 +96,7 @@ process {
}

withName: 'TABIX_KNOWN_INDELS' {
ext.when = { !params.known_indels_tbi && params.known_indels && (params.step == 'mapping' || params.step == "markduplicates" || params.step == 'prepare_recalibration' || (params.tools && (params.tools.split(',').contains('haplotypecaller') || params.tools.split(',').contains('sentieon_haplotyper'))) ) }
ext.when = { !params.known_indels_tbi && params.known_indels && (params.step == 'mapping' || params.step == "markduplicates" || params.step == 'prepare_recalibration' || (params.tools && (params.tools.split(',').contains('haplotypecaller') || params.tools.split(',').contains('sentieon_haplotyper') || params.tools.split(',').contains('sentieon_dnascope'))) ) }
publishDir = [
enabled: (params.save_reference || params.build_only_index),
mode: params.publish_dir_mode,
@@ -106,7 +106,7 @@ process {
}

withName: 'TABIX_KNOWN_SNPS' {
ext.when = { !params.known_snps_tbi && params.known_snps && (params.step == 'mapping' || params.step == "markduplicates" || params.step == 'prepare_recalibration' || (params.tools && (params.tools.split(',').contains('haplotypecaller') || params.tools.split(',').contains('sentieon_haplotyper'))) ) }
ext.when = { !params.known_snps_tbi && params.known_snps && (params.step == 'mapping' || params.step == "markduplicates" || params.step == 'prepare_recalibration' || (params.tools && (params.tools.split(',').contains('haplotypecaller') || params.tools.split(',').contains('sentieon_haplotyper') )) ) }
publishDir = [
enabled: (params.save_reference || params.build_only_index),
mode: params.publish_dir_mode,
68 changes: 68 additions & 0 deletions conf/modules/sentieon_dnascope.config
@@ -0,0 +1,68 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
ext.when = When to run the module.
----------------------------------------------------------------------------------------
*/

// SENTIEON DNASCOPE

process {

    withName: 'SENTIEON_DNASCOPE' {
        ext.prefix = { meta.num_intervals <= 1 ? "${meta.id}.dnascope" : "${meta.id}.dnascope.${intervals.simpleName}" }
        ext.when = { params.tools && params.tools.split(',').contains('sentieon_dnascope') }
        publishDir = [
            mode: params.publish_dir_mode,
            path: { "${params.outdir}/variant_calling/"},
            pattern: "*{vcf.gz,vcf.gz.tbi}",
            saveAs: { meta.num_intervals > 1 ? null : "sentieon_dnascope/${meta.id}/${it}" }
        ]
    }

    withName: 'MERGE_SENTIEON_DNASCOPE_VCFS' {
        ext.prefix = { params.joint_germline ? "${meta.id}.dnascope.g" : "${meta.id}.dnascope.unfiltered" }
        publishDir = [
            mode: params.publish_dir_mode,
            path: { "${params.outdir}/variant_calling/sentieon_dnascope/${meta.id}/" },
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'MERGE_SENTIEON_DNASCOPE_GVCFS' {
        ext.prefix = { "${meta.id}.dnascope.g" }
        publishDir = [
            mode: params.publish_dir_mode,
            path: { "${params.outdir}/variant_calling/sentieon_dnascope/${meta.id}/" },
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    if (params.tools && params.tools.contains('sentieon_dnascope')) {
        withName: '.*FILTERVARIANTTRANCHES' {
            ext.prefix = {"${meta.id}.dnascope"}
            ext.args = { "--info-key CNN_1D" }
            publishDir = [
                mode: params.publish_dir_mode,
                path: { "${params.outdir}/variant_calling/sentieon_dnascope/${meta.id}/"},
                pattern: "*{vcf.gz,vcf.gz.tbi}"
            ]
        }
    }

    withName: 'SENTIEON_DNAMODELAPPLY' {
        ext.prefix = {"${meta.id}.dnascope.filtered"}
        publishDir = [
            mode: params.publish_dir_mode,
            path: { "${params.outdir}/variant_calling/sentieon_dnascope/${meta.id}/"},
            pattern: "*{vcf.gz,vcf.gz.tbi}"
        ]
    }

}
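
As a rough illustration of the override keys listed in the header comment of this file, a user-supplied config might adjust one of these modules. This is only a sketch under stated assumptions: the file name `custom.config` and the prefix suffix `ml_filtered` are hypothetical, and it assumes the file is passed to Nextflow with `-c` so that it takes precedence over the pipeline's own module config.

```groovy
// custom.config (hypothetical file name), supplied with: nextflow run ... -c custom.config
process {

    // Same selector as in the pipeline config above
    withName: 'SENTIEON_DNAMODELAPPLY' {
        // Hypothetical prefix, chosen only for illustration.
        // The closure is evaluated per task, so meta.id resolves to the sample id.
        ext.prefix = { "${meta.id}.dnascope.ml_filtered" }
    }

}
```

With such an override, the filtered VCF would presumably be published as `<sample>.dnascope.ml_filtered.vcf.gz` instead of `<sample>.dnascope.filtered.vcf.gz`; the `publishDir` settings from the pipeline config are left untouched because only `ext.prefix` is redefined.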
45 changes: 45 additions & 0 deletions conf/modules/sentieon_dnascope_joint_germline.config
@@ -0,0 +1,45 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
ext.when = When to run the module.
----------------------------------------------------------------------------------------
*/

// SENTIEON DNASCOPE JOINT_GERMLINE

process {

// TO-DO: duplicate!!
Review comment (Contributor): do we need an issue for this todo?

    withName: 'SENTIEON_GVCFTYPER' {
        ext.args = { "--allow-old-rms-mapping-quality-annotation-data" }
        ext.prefix = { meta.intervals_name }
        publishDir = [
            enabled: false
        ]
    }

    if (params.tools && params.tools.contains('sentieon_dnascope') && params.joint_germline) {
        withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON:BCFTOOLS_SORT' {
            ext.prefix = { vcf.baseName - ".vcf" + ".sort" }
            publishDir = [
                enabled: false
            ]
        }

        withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON:MERGE_GENOTYPEGVCFS' {
            ext.prefix = "joint_germline"
            publishDir = [
                mode: params.publish_dir_mode,
                path: { "${params.outdir}/variant_calling/sentieon_dnascope/joint_variant_calling/" },
                saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
                pattern: "*{vcf.gz,vcf.gz.tbi}"
            ]
        }
    }
}
59 changes: 54 additions & 5 deletions docs/output.md
@@ -37,8 +37,10 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [GATK Germline Single Sample Variant Calling](#gatk-germline-single-sample-variant-calling)
- [GATK Joint Germline Variant Calling](#gatk-joint-germline-variant-calling)
- [GATK Mutect2](#gatk-mutect2)
- [Sentieon DNAscope](#sentieon-dnascope)
- [Sentieon DNAscope joint germline variant calling](#sentieon-dnascope-joint-germline-variant-calling)
- [Sentieon Haplotyper](#sentieon-haplotyper)
- [Sentieon Joint Germline Variant Calling](#sentieon-joint-germline-variant-calling)
- [Sentieon Haplotyper joint germline variant calling](#sentieon-haplotyper-joint-germline-variant-calling)
- [Strelka2](#strelka2)
- [Structural Variants](#structural-variants)
- [Manta](#manta)
@@ -442,6 +444,53 @@ Files created:

</details>

#### Sentieon DNAscope

[Sentieon DNAscope](https://support.sentieon.com/appnotes/dnascope_ml/#dnascope-germline-variant-calling-with-a-machine-learning-model) is a variant caller that aims to outperform GATK's HaplotypeCaller in both speed and accuracy. DNAscope can use a machine-learning model to improve candidate detection and filtering, which increases variant-calling accuracy.

<details markdown="1">
<summary>Unfiltered VCF-files for normal samples</summary>

**Output directory: `{outdir}/variant_calling/sentieon_dnascope/<sample>/`**

- `<sample>.dnascope.unfiltered.vcf.gz` and `<sample>.dnascope.unfiltered.vcf.gz.tbi`
- VCF with tabix index

</details>

The output from Sentieon's DNAscope can be controlled through the option `--sentieon_dnascope_emit_mode` for Sarek, see [Basic usage of Sentieon functions](#basic-usage-of-sentieon-functions).

Unless `dnascope_filter` is listed under `--skip_tools` in the Nextflow command, Sentieon's [DNAModelApply](https://support.sentieon.com/manual/usages/general/#dnamodelapply-algorithm) is applied to the unfiltered VCF-files in order to obtain filtered VCF-files.

<details markdown="1">
<summary>Filtered VCF-files for normal samples</summary>

**Output directory: `{outdir}/variant_calling/sentieon_dnascope/<sample>/`**

- `<sample>.dnascope.filtered.vcf.gz` and `<sample>.dnascope.filtered.vcf.gz.tbi`
- VCF with tabix index

</details>
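
The two result sets described above could be selected with a parameter file along these lines. This is a hedged sketch rather than a tested configuration: it assumes the options are set through a Nextflow `params` scope in a hypothetical file `dnascope.config`, and it uses only the option names mentioned in this section (`tools` and `skip_tools`).

```groovy
// dnascope.config (hypothetical file name)
params {
    // Run Sentieon DNAscope as the germline variant caller
    tools = 'sentieon_dnascope'

    // Uncomment to skip DNAModelApply, so that only the unfiltered VCFs are produced
    // skip_tools = 'dnascope_filter'
}
```

Leaving `skip_tools` commented out corresponds to the default behaviour described above, in which both the unfiltered and the filtered VCF-files are written.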

##### Sentieon DNAscope joint germline variant calling

In Sentieon's package DNAscope, joint germline variant calling is done by first running Sentieon's DNAscope in emit-mode `gvcf` for each sample and then running Sentieon's [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) on the set of gVCF-files. See [Basic usage of Sentieon functions](#basic-usage-of-sentieon-functions) for information on how joint germline variant calling can be done in Sarek using Sentieon's DNAscope.

<details markdown="1">
<summary>Output files from joint germline variant calling</summary>

**Output directory: `{outdir}/variant_calling/sentieon_dnascope/<sample>/`**

- `<sample>.dnascope.g.vcf.gz` and `<sample>.dnascope.g.vcf.gz.tbi`
- VCF with tabix index

**Output directory: `{outdir}/variant_calling/sentieon_dnascope/joint_variant_calling/`**

- `joint_germline.vcf.gz` and `joint_germline.vcf.gz.tbi`
- VCF with tabix index

</details>
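
The joint-calling flow described above (per-sample gVCFs from DNAscope, then GVCFtyper) might be requested with parameters like the following. This is a sketch under assumptions, not a verified recipe: the file name `dnascope_joint.config` is hypothetical, and whether the emit mode must be set explicitly to `gvcf` is an assumption based on the description in this section.

```groovy
// dnascope_joint.config (hypothetical file name)
params {
    // Use Sentieon DNAscope for germline variant calling
    tools = 'sentieon_dnascope'

    // Enable joint germline variant calling across samples
    joint_germline = true

    // Assumption: DNAscope needs to emit per-sample gVCFs for GVCFtyper to consume
    sentieon_dnascope_emit_mode = 'gvcf'
}
```

These settings line up with the conditions in `conf/modules/sentieon_dnascope_joint_germline.config`, which checks both `params.tools` for `sentieon_dnascope` and `params.joint_germline` before configuring the joint-calling modules.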

#### Sentieon Haplotyper

[Sentieon Haplotyper](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm) is Sentieon's accelerated version of GATK's HaplotypeCaller (see above).
@@ -456,7 +505,7 @@ Files created:

</details>

The output from Sentieon's Haplotyper can be controlled through the option `--sentieon_haplotyper_emit_mode` for Sarek, see [Basic usage of Sentieon functions in Sarek](#basic-usage-of-sentieon-functions-in-sarek).
The output from Sentieon's Haplotyper can be controlled through the option `--sentieon_haplotyper_emit_mode` for Sarek, see [Basic usage of Sentieon functions](#basic-usage-of-sentieon-functions).

Unless `haplotyper_filter` is listed under `--skip_tools` in the Nextflow command, GATK's CNNScoreVariants and FilterVariantTranches (see above) are applied to the unfiltered VCF-files in order to obtain filtered VCF-files.

@@ -470,16 +519,16 @@ Unless `haplotyper_filter` is listed under `--skip_tools` in the nextflow comman

</details>

##### Sentieon Joint Germline Variant Calling
##### Sentieon Haplotyper joint germline variant calling

In Sentieon's package DNAseq, joint germline variant calling is done by first running Sentieon's Haplotyper in emit-mode `gvcf` for each sample and then running Sentieon's [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) on the set of gVCF-files. See [Basic usage of Sentieon functions in Sarek](#basic-usage-of-sentieon-functions-in-sarek) for information on how joint germline variant calling can be done in Sarek using Sentieon's DNAseq. After joint genotyping, Sentieon's version of VQSR ([VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm) and [ApplyVarCal](https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm)) is applied for filtering to produce the final multisample callset with the desired balance of precision and sensitivity.
In Sentieon's package DNAseq, joint germline variant calling is done by first running Sentieon's Haplotyper in emit-mode `gvcf` for each sample and then running Sentieon's [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) on the set of gVCF-files. See [Basic usage of Sentieon functions](#basic-usage-of-sentieon-functions) for information on how joint germline variant calling can be done in Sarek using Sentieon's DNAseq. After joint genotyping, Sentieon's version of VQSR ([VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm) and [ApplyVarCal](https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm)) is applied for filtering to produce the final multisample callset with the desired balance of precision and sensitivity.

<details markdown="1">
<summary>Output files from joint germline variant calling</summary>

**Output directory: `{outdir}/variantcalling/sentieon_haplotyper/<sample>/`**

- `<sample>.haplotypecaller.g.vcf.gz` and `<sample>.haplotypecaller.g.vcf.gz.tbi`
- `<sample>.haplotyper.g.vcf.gz` and `<sample>.haplotyper.g.vcf.gz.tbi`
- VCF with tabix index

**Output directory: `{outdir}/variantcalling/sentieon_haplotyper/joint_variant_calling/`**