Gatkcnvcaller #362

Merged
merged 48 commits into from
Jul 7, 2023
Changes from 45 commits
Commits
e9cfd84
Add Gatk4 GermlineCNVCaller module
ryanjameskennedy Nov 17, 2022
fe10e4e
Merge branch 'dev' into add-gatk4-germlinecnvcaller
ryanjameskennedy Mar 8, 2023
120a674
GATK nightly bug fix
ryanjameskennedy Mar 8, 2023
edca36c
Merge dev changes into add-gatk4-germlinecnvcaller
ryanjameskennedy Mar 8, 2023
d8c7be8
Merge branch 'nf-core:dev' into add-gatk4-germlinecnvcaller
ryanjameskennedy Apr 14, 2023
58d65ba
Install nf-core modules
ryanjameskennedy Apr 12, 2023
02eaa84
Update and add configs
ryanjameskennedy Apr 12, 2023
349fc83
Update and add subworkflows
ryanjameskennedy Apr 12, 2023
85b6825
Update workflow
ryanjameskennedy Apr 12, 2023
d1a8b8e
Install & update modules
ryanjameskennedy Apr 14, 2023
a5d6de2
Fix formatting and structure
ryanjameskennedy Apr 14, 2023
5be1daa
Fix linting error
ryanjameskennedy Apr 14, 2023
3e20057
Update test_one_sample.config
ryanjameskennedy Apr 17, 2023
d18d4cd
Add gatk params to rd main.nf
ryanjameskennedy Apr 17, 2023
6083b1b
Add meta to blacklist bed
ryanjameskennedy Apr 17, 2023
d0e8820
Change germlinecnvcaller input channel
ryanjameskennedy Apr 17, 2023
343f62f
Update nextflow_schema.json
ryanjameskennedy Apr 17, 2023
34f1a25
Fix prettier linting error
ryanjameskennedy Apr 17, 2023
9a24dda
Update configs re hg38.blacklist_interval.bed
ryanjameskennedy Apr 17, 2023
f8e9910
Fix channel errors
ryanjameskennedy Apr 17, 2023
57a0dbe
Remove cohort modules
ryanjameskennedy Apr 17, 2023
710d910
Fix pipeline test errors
ryanjameskennedy Apr 17, 2023
77f0e9f
Minor test changes
ryanjameskennedy Apr 17, 2023
53bd239
Merge branch 'dev' of github.com:nf-core/raredisease into gatkcnvcaller
ramprasadn Jun 2, 2023
cf979be
Merge branch 'master' of github.com:nf-core/raredisease into gatkcnvc…
ramprasadn Jun 2, 2023
95bb1b3
fix changes
ramprasadn Jun 2, 2023
4b398cb
Merge branch 'devplaceholder' of github.com:nf-core/raredisease into …
ramprasadn Jun 2, 2023
a9d1423
Merge branch 'dev' of github.com:nf-core/raredisease into gatkcnvcaller
ramprasadn Jun 6, 2023
3ece075
Merge branch 'dev' of github.com:nf-core/raredisease into gatkcnvcaller
ramprasadn Jun 7, 2023
3bff114
update preprocessintervals
ramprasadn Jun 7, 2023
b91c958
update modules
ramprasadn Jun 28, 2023
f5a263d
Merge branch 'gatkcnvcaller' of github.com:genomic-medicine-sweden/ra…
ramprasadn Jun 28, 2023
552eb0c
Merge branch 'dev' of github.com:nf-core/raredisease into gatkcnvcaller
ramprasadn Jun 28, 2023
22135b0
rename params
ramprasadn Jun 29, 2023
494cfea
rename channels
ramprasadn Jun 29, 2023
d93c45b
update subworkflow
ramprasadn Jul 4, 2023
12d6302
prettier
ramprasadn Jul 4, 2023
bafc512
remove modules [skip ci]
ramprasadn Jul 4, 2023
20c4f0d
Merge branch 'gatkcnvcaller' of github.com:genomic-medicine-sweden/ra…
ramprasadn Jul 5, 2023
d2d019c
Merge branch 'dev' of github.com:nf-core/raredisease into gatkcnvcaller
ramprasadn Jul 5, 2023
3665ff3
update docs
ramprasadn Jul 5, 2023
6f589bb
update usage
ramprasadn Jul 5, 2023
e39caf5
update sample
ramprasadn Jul 6, 2023
d08a95f
format usage
ramprasadn Jul 6, 2023
14f8869
remove hidden
ramprasadn Jul 6, 2023
66efabc
review suggestions
ramprasadn Jul 7, 2023
2f83bbb
Merge branch 'dev' of github.com:nf-core/raredisease into gatkcnvcaller
ramprasadn Jul 7, 2023
007c5ee
Merge branch 'dev' into gatkcnvcaller
ramprasadn Jul 7, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- Add GATK's cnv calling pipeline [#362](https://github.com/nf-core/raredisease/pull/362)
- Add `public_aws_ecr` profile for using AWS ECR public gallery images [#360](https://github.com/nf-core/raredisease/pull/360)
- GATK's ShiftFasta to generate all the files required for mitochondrial analysis [#354](https://github.com/nf-core/raredisease/pull/354)
- Feature to calculate CADD scores for indels [#325](https://github.com/nf-core/raredisease/pull/325)
4 changes: 2 additions & 2 deletions README.md
@@ -54,6 +54,8 @@ On release, automated continuous integration tests run the pipeline on a full-si

- [Manta](https://github.com/Illumina/manta)
- [TIDDIT's sv](https://github.com/SciLifeLab/TIDDIT)
- Copy number variant calling:
- [GATK GermlineCNVCaller](https://github.com/broadinstitute/gatk)

**5. Annotation - SNV:**

@@ -153,8 +155,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

## Citations

<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->

If you use nf-core/raredisease for your analysis, please cite it using the following doi: [10.5281/zenodo.7995798](https://doi.org/10.5281/zenodo.7995798)

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
39 changes: 39 additions & 0 deletions conf/modules/call_sv_germlinecnvcaller.config
@@ -0,0 +1,39 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
ext.when = Conditional clause
----------------------------------------------------------------------------------------
*/

//
// gcnvcaller calling options
//

process {

withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER.*" {
publishDir = [
enabled: false
]
ext.when = !params.skip_cnv_calling
}

withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER:GATK4_COLLECTREADCOUNTS" {
ext.args = "--format TSV --interval-merging-rule OVERLAPPING_ONLY"
}

withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER:GATK4_DETERMINEGERMLINECONTIGPLOIDY" {
ext.prefix = { "${meta.id}_ploidy" }
}

withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER:GATK4_GERMLINECNVCALLER" {
ext.args = "--run-mode CASE"
ext.prefix = { "${meta.id}_${model.simpleName}" }
}
}
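
A minimal sketch of how a user-supplied config (passed with `-c`) might re-enable publishing for one of these processes, given that the block above disables `publishDir` for the whole subworkflow. The `withName` selector is copied from this file; the file name `custom.config`, the use of `params.outdir`, and the `gcnvcaller` subdirectory are assumptions for illustration. Whether the override takes effect depends on the order in which the configs are loaded.

```groovy
// custom.config (hypothetical file name)
process {
    // Selector taken from conf/modules/call_sv_germlinecnvcaller.config
    withName: ".*CALL_STRUCTURAL_VARIANTS:CALL_SV_GERMLINECNVCALLER:GATK4_GERMLINECNVCALLER" {
        publishDir = [
            // Assumed output location; adjust to the local layout
            path: { "${params.outdir}/gcnvcaller" },
            mode: 'copy'
        ]
    }
}
```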
11 changes: 11 additions & 0 deletions conf/modules/prepare_references.config
@@ -117,4 +117,15 @@ process {
enabled: false
]
}

withName: '.*PREPARE_REFERENCES:GATK_PREPROCESS_WGS' {
ext.args = { "--padding 0 --interval-merging-rule OVERLAPPING_ONLY --exclude-intervals ${params.mito_name}" }
Review comment (Collaborator). Suggested change:
ext.args = { "--padding 0 --interval-merging-rule OVERLAPPING_ONLY --exclude-intervals ${params.mito_name}" }
ext.args = { "--padding 0 --interval-merging-rule OVERLAPPING_ONLY --exclude-intervals ${params.mito_name}" }

ext.when = { params.analysis_type.equals("wgs") && !params.readcount_intervals }
}

withName: '.*PREPARE_REFERENCES:GATK_PREPROCESS_WES' {
ext.args = { "--bin-length 0 --interval-merging-rule OVERLAPPING_ONLY --exclude-intervals ${params.mito_name}" }
Review comment (Collaborator). Suggested change:
ext.args = { "--bin-length 0 --interval-merging-rule OVERLAPPING_ONLY --exclude-intervals ${params.mito_name}" }
ext.args = { "--bin-length 0 --interval-merging-rule OVERLAPPING_ONLY --exclude-intervals ${params.mito_name}" }

ext.when = { params.analysis_type.equals("wes") && !params.readcount_intervals }
}

}
3 changes: 3 additions & 0 deletions conf/test.config
@@ -23,6 +23,9 @@ params {
igenomes_ignore = true
mito_name = 'MT'

// analysis params
skip_cnv_calling = true

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/testdata/samplesheet_trio.csv'

3 changes: 3 additions & 0 deletions conf/test_one_sample.config
@@ -23,6 +23,9 @@ params {
igenomes_ignore = true
mito_name = 'MT'

// analysis params
skip_cnv_calling = true

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/testdata/samplesheet_single.csv'

9 changes: 8 additions & 1 deletion docs/output.md
@@ -10,6 +10,8 @@ The directories listed below will be created in the results directory after the

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

#### GATK GermlineCNVCaller (CNV calling)

Review comment (Collaborator). Suggested change:
#### GATK GermlineCNVCaller (CNV calling)

- [Alignment](#alignment)
- [Mapping](#mapping)
- [Bwa-mem2](#bwa-mem2)
@@ -33,6 +35,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Variant calling - SV](#variant-calling---sv)
- [Manta](#manta)
- [TIDDIT sv](#tiddit-sv)
- [GATK GermlineCNVCaller - CNV calling](#gatk-germlinecnvcaller---cnv-calling)
- [SVDB merge](#svdb-merge)
- [Variant calling - repeat expansions](#variant-calling---repeat-expansions)
- [Expansion Hunter](#expansion-hunter)
@@ -252,9 +255,13 @@ The pipeline performs variant calling using [Sentieon DNAscope](https://support.

[TIDDIT's sv](https://github.com/SciLifeLab/TIDDIT) is used to identify chromosomal rearrangements using sequencing data. TIDDIT identifies intra and inter-chromosomal translocations, deletions, tandem-duplications and inversions, using supplementary alignments as well as discordant pairs. TIDDIT searches for discordant reads and split reads (supplementary alignments). Output vcf files are treated as intermediates and are not placed in the output folder by default.

#### GATK GermlineCNVCaller - CNV calling

[GATK GermlineCNVCaller](https://github.com/broadinstitute/gatk) is used to identify copy number variants in germline samples given their read counts and a model describing a sample's ploidy. Output vcf files are treated as intermediates and are not placed in the output folder by default.

#### SVDB merge

[SVDB merge](https://github.com/J35P312/SVDB#merge) is used to merge the variant calls from both Manta and TIDDIT. Output files are published in the output folder.
[SVDB merge](https://github.com/J35P312/SVDB#merge) is used to merge the variant calls from GATK's GermlineCNVCaller (only if skip_cnv_calling is set to false), Manta, and TIDDIT. Output files are published in the output folder.

<details markdown="1">
<summary>Output files</summary>
24 changes: 18 additions & 6 deletions docs/usage.md
@@ -17,9 +17,10 @@ Table of contents:
- [3. Repeat expansions](#3-repeat-expansions)
- [4. Variant calling - SNV](#4-variant-calling---snv)
- [5. Variant calling - Structural variants](#5-variant-calling---structural-variants)
- [6. SNV annotation & Ranking](#6-snv-annotation--ranking)
- [7. SV annotation & Ranking](#7-sv-annotation--ranking)
- [8. Mitochondrial analysis](#8-mitochondrial-analysis)
- [6. Copy number variant calling](#6-copy-number-variant-calling)
- [7. SNV annotation & Ranking](#7-snv-annotation--ranking)
- [8. SV annotation & Ranking](#8-sv-annotation--ranking)
- [9. Mitochondrial analysis](#9-mitochondrial-analysis)
- [Run the pipeline](#run-the-pipeline)
- [Direct input in CLI](#direct-input-in-cli)
- [Import from a config file (recommended)](#import-from-a-config-file-recommended)
@@ -188,7 +189,18 @@ The mandatory and optional parameters for each category are tabulated below.
| | target_bed |
| | bwa |

##### 6. SNV annotation & Ranking
##### 6. Copy number variant calling

| Mandatory | Optional |
| ------------------------------ | ------------------------------- |
| ploidy_model<sup>1</sup> | readcount_intervals<sup>3</sup> |
| gcnvcaller_model<sup>1,2</sup> | |

<sup>1</sup> Output from steps 3 & 4 of GATK's CNV calling pipeline run in cohort mode as described [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants).<br />
<sup>2</sup> Sample file can be found [here](https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/gcnvmodels.tsv) (Note the header 'models' in the sample file).<br />
<sup>3</sup> Output from step 1 of GATK's CNV calling pipeline as described [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants).<br />
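
A minimal sketch of how these parameters might be supplied in a Nextflow config, assuming the cohort-mode models have already been generated. The parameter names come from the table above and from `main.nf`; every file path is a placeholder, and the comments only restate what the footnotes and `conf/modules/prepare_references.config` say.

```groovy
// Hypothetical params block; all paths are placeholders.
params {
    // CNV calling runs only when this is false (ext.when = !params.skip_cnv_calling)
    skip_cnv_calling    = false

    // Mandatory: outputs of GATK's cohort-mode CNV calling pipeline
    ploidy_model        = '/path/to/ploidy-model'
    gcnvcaller_model    = '/path/to/gcnvmodels.tsv'   // file with the header 'models'

    // Optional: when set, GATK_PREPROCESS_WGS / GATK_PREPROCESS_WES are skipped,
    // because their ext.when conditions require !params.readcount_intervals
    readcount_intervals = '/path/to/readcount.interval_list'
}
```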

##### 7. SNV annotation & Ranking

| Mandatory | Optional |
| ----------------------------- | ------------------------------ |
@@ -215,7 +227,7 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl

> NB: We use CADD only to annotate small indels. To annotate SNVs with precomputed CADD scores, pass the file containing CADD scores as a resource to vcfanno instead. Files containing the precomputed CADD scores for SNVs can be downloaded from [here](https://cadd.gs.washington.edu/download) (description: "All possible SNVs of GRCh3<7/8>/hg3<7/8>")

##### 7. SV annotation & Ranking
##### 8. SV annotation & Ranking

| Mandatory | Optional |
| -------------------------- | ------------------ |
@@ -227,7 +239,7 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl

<sup>1</sup> A CSV file that describes the databases (VCFs) used by SVDB for annotating structural variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/svdb_querydb_files.csv). Information about the column headers can be found [here](https://github.com/J35P312/SVDB#Query).

##### 8. Mitochondrial analysis
##### 9. Mitochondrial analysis

| Mandatory | Optional |
| ----------------- | -------- |
11 changes: 7 additions & 4 deletions main.nf
@@ -23,6 +23,10 @@ params.bwa = WorkflowMain.getGenomeAttribute(params,
params.bwamem2 = WorkflowMain.getGenomeAttribute(params, 'bwamem2')
params.call_interval = WorkflowMain.getGenomeAttribute(params, 'call_interval')
params.cadd_resources = WorkflowMain.getGenomeAttribute(params, 'cadd_resources')
params.gcnvcaller_model = WorkflowMain.getGenomeAttribute(params, 'gcnvcaller_model')
params.gens_interval_list = WorkflowMain.getGenomeAttribute(params, 'gens_interval_list')
params.gens_pon = WorkflowMain.getGenomeAttribute(params, 'gens_pon')
params.gens_gnomad_pos = WorkflowMain.getGenomeAttribute(params, 'gens_gnomad_pos')
params.gnomad_af = WorkflowMain.getGenomeAttribute(params, 'gnomad_af')
params.gnomad_af_idx = WorkflowMain.getGenomeAttribute(params, 'gnomad_af_idx')
params.intervals_wgs = WorkflowMain.getGenomeAttribute(params, 'intervals_wgs')
@@ -33,22 +37,21 @@ params.known_indels = WorkflowMain.getGenomeAttribute(params,
params.known_mills = WorkflowMain.getGenomeAttribute(params, 'known_mills')
params.ml_model = WorkflowMain.getGenomeAttribute(params, 'ml_model')
params.mt_fasta = WorkflowMain.getGenomeAttribute(params, 'mt_fasta')
params.ploidy_model = WorkflowMain.getGenomeAttribute(params, 'ploidy_model')
params.reduced_penetrance = WorkflowMain.getGenomeAttribute(params, 'reduced_penetrance')
params.readcount_intervals = WorkflowMain.getGenomeAttribute(params, 'readcount_intervals')
params.sequence_dictionary = WorkflowMain.getGenomeAttribute(params, 'sequence_dictionary')
params.score_config_snv = WorkflowMain.getGenomeAttribute(params, 'score_config_snv')
params.score_config_sv = WorkflowMain.getGenomeAttribute(params, 'score_config_sv')
params.target_bed = WorkflowMain.getGenomeAttribute(params, 'target_bed')
params.svdb_query_dbs = WorkflowMain.getGenomeAttribute(params, 'svdb_query_dbs')
params.target_bed = WorkflowMain.getGenomeAttribute(params, 'target_bed')
params.variant_catalog = WorkflowMain.getGenomeAttribute(params, 'variant_catalog')
params.vep_filters = WorkflowMain.getGenomeAttribute(params, 'vep_filters')
params.vcfanno_resources = WorkflowMain.getGenomeAttribute(params, 'vcfanno_resources')
params.vcfanno_toml = WorkflowMain.getGenomeAttribute(params, 'vcfanno_toml')
params.vcfanno_lua = WorkflowMain.getGenomeAttribute(params, 'vcfanno_lua')
params.vep_cache = WorkflowMain.getGenomeAttribute(params, 'vep_cache')
params.vep_cache_version = WorkflowMain.getGenomeAttribute(params, 'vep_cache_version')
params.gens_interval_list = WorkflowMain.getGenomeAttribute(params, 'gens_interval_list')
params.gens_pon = WorkflowMain.getGenomeAttribute(params, 'gens_pon')
params.gens_gnomad_pos = WorkflowMain.getGenomeAttribute(params, 'gens_gnomad_pos')

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
25 changes: 25 additions & 0 deletions modules.json
@@ -105,16 +105,31 @@
"git_sha": "2df2a11d5b12f2a73bca74f103691bc35d83c5fd",
"installed_by": ["modules"]
},
"gatk4/collectreadcounts": {
"branch": "master",
"git_sha": "d25bf48327e86a7f737047a57ec264b90e22ce3d",
"installed_by": ["modules"]
},
"gatk4/createsequencedictionary": {
"branch": "master",
"git_sha": "541811d779026c5d395925895fa5ed35e7216cc0",
"installed_by": ["modules"]
},
"gatk4/determinegermlinecontigploidy": {
"branch": "master",
"git_sha": "d25bf48327e86a7f737047a57ec264b90e22ce3d",
"installed_by": ["modules"]
},
"gatk4/filtermutectcalls": {
"branch": "master",
"git_sha": "2df2a11d5b12f2a73bca74f103691bc35d83c5fd",
"installed_by": ["modules"]
},
"gatk4/germlinecnvcaller": {
"branch": "master",
"git_sha": "f6b848c6e1af9a9ecf4975aa8c8edad05e75e784",
"installed_by": ["modules"]
},
"gatk4/intervallisttools": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
@@ -135,6 +150,16 @@
"git_sha": "2df2a11d5b12f2a73bca74f103691bc35d83c5fd",
"installed_by": ["modules"]
},
"gatk4/postprocessgermlinecnvcalls": {
"branch": "master",
"git_sha": "39ca55cc30514169f8420162bafe4ecf673f4b9a",
"installed_by": ["modules"]
},
"gatk4/preprocessintervals": {
"branch": "master",
"git_sha": "1226419498a14d17f98d12d6488d333b0dbd0418",
"installed_by": ["modules"]
},
"gatk4/printreads": {
"branch": "master",
"git_sha": "541811d779026c5d395925895fa5ed35e7216cc0",
68 changes: 68 additions & 0 deletions modules/nf-core/gatk4/collectreadcounts/main.nf
