Merge pull request #510 from genomic-medicine-sweden/feat/twovarconseq
Add parameter to supply variant consequence files
ramprasadn authored Feb 5, 2024
2 parents 561bdcc + c7fe536 commit bc1c856
Showing 8 changed files with 61 additions and 74 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- New workflow for annotating mobile elements [#483](https://github.com/nf-core/raredisease/pull/483)
- Added a functionality to subsample mitochondrial alignment, and a new parameter `skip_mt_subsample` to skip the subworkflow [#508](https://github.com/nf-core/raredisease/pull/508).
- Chromograph to plot coverage across chromosomes [#507](https://github.com/nf-core/raredisease/pull/507)
- Added two new parameters `variant_consequences_snv` and `variant_consequences_sv` to supply variant consequence files for annotating SNVs and SVs. [#509](https://github.com/nf-core/raredisease/pull/509)

### `Changed`

41 changes: 0 additions & 41 deletions assets/variant_consequences_v2.txt

This file was deleted.

4 changes: 3 additions & 1 deletion conf/test.config
Expand Up @@ -43,7 +43,7 @@ params {
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
mobile_element_svdb_annotations = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/svdb_querydb_files.csv"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_mt = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
Expand All @@ -55,6 +55,8 @@ params {
vcfanno_lua = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_functions.lua"
vcfanno_resources = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_resources.txt"
vcfanno_toml = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_config.toml"
variant_consequences_snv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/variant_consequences_v2.txt"
variant_consequences_sv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/variant_consequences_v2.txt"
vep_cache = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vep_cache_and_plugins.tar.gz"
vep_filters = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/hgnc.txt"
vep_cache_version = 107
4 changes: 3 additions & 1 deletion conf/test_one_sample.config
Expand Up @@ -43,7 +43,7 @@ params {
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
mobile_element_svdb_annotations = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/svdb_querydb_files.csv"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_mt = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
Expand All @@ -55,6 +55,8 @@ params {
vcfanno_lua = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_functions.lua"
vcfanno_resources = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_resources.txt"
vcfanno_toml = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vcfanno_config.toml"
variant_consequences_snv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/variant_consequences_v2.txt"
variant_consequences_sv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/variant_consequences_v2.txt"
vep_cache = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vep_cache_and_plugins.tar.gz"
vep_filters = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/hgnc.txt"
vep_cache_version = 107
50 changes: 28 additions & 22 deletions docs/usage.md
Expand Up @@ -221,15 +221,16 @@ The mandatory and optional parameters for each category are tabulated below.

##### 7. SNV annotation & Ranking

| Mandatory | Optional |
| ----------------------------- | ------------------------------ |
| genome<sup>1</sup> | reduced_penetrance<sup>7</sup> |
| vcfanno_resources<sup>2</sup> | vcfanno_lua |
| vcfanno_toml<sup>3</sup> | vep_filters<sup>8</sup> |
| vep_cache_version | cadd_resources<sup>9</sup> |
| vep_cache<sup>4</sup> | vep_plugin_files<sup>10</sup> |
| gnomad_af<sup>5</sup> | |
| score_config_snv<sup>6</sup> | |
| Mandatory | Optional |
| ------------------------------------ | ------------------------------ |
| genome<sup>1</sup> | reduced_penetrance<sup>8</sup> |
| vcfanno_resources<sup>2</sup> | vcfanno_lua |
| vcfanno_toml<sup>3</sup> | vep_filters<sup>9</sup> |
| vep_cache_version | cadd_resources<sup>10</sup> |
| vep_cache<sup>4</sup> | vep_plugin_files<sup>11</sup> |
| gnomad_af<sup>5</sup> | |
| score_config_snv<sup>6</sup> | |
| variant_consequences_snv<sup>7</sup> | |

<sup>1</sup>Genome version is used by VEP. You have the option to choose between GRCh37 and GRCh38.<br />
<sup>2</sup>Path to VCF files and their indices used by vcfanno. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/vcfanno_resources.txt).<br />
Expand All @@ -240,10 +241,11 @@ See example cache [here](https://raw.githubusercontent.com/nf-core/test-datasets
<sup>5</sup> GnomAD VCF files can be downloaded from [here](https://gnomad.broadinstitute.org/downloads). The option `gnomad_af` expects a tab-delimited file with
no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/gnomad_reformated.tab.gz).<br />
<sup>6</sup>Used by GENMOD for ranking the variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/rank_model_snv.ini).<br />
<sup>7</sup>Used by GENMOD while modeling the variants. Contains a list of loci that show [reduced penetrance](https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/) in people. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/reduced_penetrance.tsv).<br />
<sup>8</sup> This file contains a list of candidate genes (with [HGNC](https://www.genenames.org/) IDs) that is used to split the variants into candidate variants and research variants. Research variants contain all the variants, while candidate variants are a subset of research variants and are associated with candidate genes. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/hgnc.txt). Not required if --skip_vep_filter is set to true.<br />
<sup>9</sup>Path to a folder containing cadd annotations. Equivalent of the data/annotations/ folder described [here](https://github.com/kircherlab/CADD-scripts/#manual-installation), and it is used to calculate CADD scores for small indels. <br />
<sup>10</sup>A CSV file that describes the files used by VEP's named and custom plugins. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/vep_files.csv). <br />
<sup>7</sup>File containing a list of SO terms ordered by severity, from most severe to least severe, used for annotating genomic and mitochondrial SNVs. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/variant_consequences_v2.txt). You can learn more about these terms [here](https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html).<br />
<sup>8</sup>Used by GENMOD while modeling the variants. Contains a list of loci that show [reduced penetrance](https://medlineplus.gov/genetics/understanding/inheritance/penetranceexpressivity/) in people. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/reduced_penetrance.tsv).<br />
<sup>9</sup> This file contains a list of candidate genes (with [HGNC](https://www.genenames.org/) IDs) that is used to split the variants into candidate variants and research variants. Research variants contain all the variants, while candidate variants are a subset of research variants and are associated with candidate genes. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/hgnc.txt). Not required if --skip_vep_filter is set to true.<br />
<sup>10</sup>Path to a folder containing cadd annotations. Equivalent of the data/annotations/ folder described [here](https://github.com/kircherlab/CADD-scripts/#manual-installation), and it is used to calculate CADD scores for small indels. <br />
<sup>11</sup>A CSV file that describes the files used by VEP's named and custom plugins. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/vep_files.csv). <br />

> NB: We use CADD only to annotate small indels. To annotate SNVs with precomputed CADD scores, pass the file containing CADD scores as a resource to vcfanno instead. Files containing the precomputed CADD scores for SNVs can be downloaded from [here](https://cadd.gs.washington.edu/download) (description: "All possible SNVs of GRCh3<7/8>/hg3<7/8>")
Expand All @@ -256,20 +258,23 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl
| vep_cache_version | vep_filters |
| vep_cache | vep_plugin_files |
| score_config_sv | |
| variant_consequences_sv<sup>2</sup> | |

<sup>1</sup> A CSV file that describes the databases (VCFs or BEDPEs) used by SVDB for annotating structural variants. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/svdb_querydb_files.csv). Information about the column headers can be found [here](https://github.com/J35P312/SVDB#Query).
<sup>2</sup> File containing a list of SO terms ordered by severity, from most severe to least severe, used for annotating genomic SVs. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/variant_consequences_v2.txt). You can learn more about these terms [here](https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html).
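
To illustrate why the order of terms in a variant consequence file matters, here is a minimal Python sketch of how a severity-ordered list could be used to pick the most severe consequence for a variant. It is not the pipeline's own code: the function names are hypothetical, the one-term-per-line layout is an assumption, and the example terms are an illustrative excerpt rather than the contents of `variant_consequences_v2.txt`.

```python
def load_severity_order(lines):
    """Return SO terms in file order (most severe first), skipping blank lines.

    Assumes one SO term per line; this layout is an assumption of the sketch.
    """
    return [line.strip() for line in lines if line.strip()]


def most_severe(consequences, severity_order):
    """Return the consequence that appears earliest in `severity_order`.

    Terms missing from the list are ignored; None is returned if none match.
    """
    rank = {term: index for index, term in enumerate(severity_order)}
    known = [term for term in consequences if term in rank]
    if not known:
        return None
    return min(known, key=rank.__getitem__)


# Illustrative excerpt only, listed from most severe to least severe.
example_lines = [
    "transcript_ablation\n",
    "stop_gained\n",
    "missense_variant\n",
    "\n",
    "intron_variant\n",
]
order = load_severity_order(example_lines)
print(most_severe(["intron_variant", "missense_variant"], order))  # missense_variant
```

Because the ranking comes entirely from line order, supplying separate files through `variant_consequences_snv` and `variant_consequences_sv` would let SNVs and SVs be ranked against different lists.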

##### 9. Mitochondrial annotation

| Mandatory | Optional |
| ----------------- | ---------------- |
| genome | vep_filters |
| mito_name | vep_plugin_files |
| vcfanno_resources | |
| vcfanno_toml | |
| vep_cache_version | |
| vep_cache | |
| score_config_mt | |
| Mandatory | Optional |
| ------------------------ | ---------------- |
| genome | vep_filters |
| mito_name | vep_plugin_files |
| vcfanno_resources | |
| vcfanno_toml | |
| vep_cache_version | |
| vep_cache | |
| score_config_mt | |
| variant_consequences_snv | |

##### 10. Mobile element annotation

Expand All @@ -279,6 +284,7 @@ no header and the following columns: `CHROM POS REF_ALLELE ALT_ALLELE AF`. Sampl
| mobile_element_svdb_annotations<sup>1</sup> | |
| vep_cache_version | |
| vep_cache | |
| variant_consequences_sv | |

<sup>1</sup> A CSV file that describes the databases (VCFs) used by SVDB for annotating mobile elements with allele frequencies. Sample file [here](https://github.com/nf-core/test-datasets/blob/raredisease/reference/svdb_querydb_files.csv).

2 changes: 2 additions & 0 deletions main.nf
Expand Up @@ -47,6 +47,8 @@ params.sdf = WorkflowMain.getGenomeAttribute(params,
params.svdb_query_dbs = WorkflowMain.getGenomeAttribute(params, 'svdb_query_dbs')
params.target_bed = WorkflowMain.getGenomeAttribute(params, 'target_bed')
params.variant_catalog = WorkflowMain.getGenomeAttribute(params, 'variant_catalog')
params.variant_consequences_snv = WorkflowMain.getGenomeAttribute(params, 'variant_consequences_snv')
params.variant_consequences_sv = WorkflowMain.getGenomeAttribute(params, 'variant_consequences_sv')
params.vep_filters = WorkflowMain.getGenomeAttribute(params, 'vep_filters')
params.vcf2cytosure_blacklist = WorkflowMain.getGenomeAttribute(params, 'vcf2cytosure_blacklist')
params.vcfanno_resources = WorkflowMain.getGenomeAttribute(params, 'vcfanno_resources')
12 changes: 12 additions & 0 deletions nextflow_schema.json
Expand Up @@ -625,6 +625,18 @@
"fa_icon": "fas fa-user-cog",
"description": "Options used to facilitate the annotation of the variants.",
"properties": {
"variant_consequences_snv": {
"type": "string",
"description": "File containing a list of SO terms ordered by severity, from most severe to least severe, used for annotating genomic and mitochondrial SNVs.",
"help_text": "For more information check https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html",
"fa_icon": "fas fa-file-csv"
},
"variant_consequences_sv": {
"type": "string",
"description": "File containing a list of SO terms ordered by severity, from most severe to least severe, used for annotating genomic SVs.",
"help_text": "For more information check https://grch37.ensembl.org/info/genome/variation/prediction/predicted_data.html",
"fa_icon": "fas fa-file-csv"
},
"vep_cache_version": {
"type": "integer",
"default": 110,
21 changes: 12 additions & 9 deletions workflows/raredisease.nf
Expand Up @@ -40,19 +40,19 @@ if (params.run_rtgvcfeval) {

if (!params.skip_snv_annotation) {
mandatoryParams += ["genome", "vcfanno_resources", "vcfanno_toml", "vep_cache", "vep_cache_version",
"gnomad_af", "score_config_snv"]
"gnomad_af", "score_config_snv", "variant_consequences_snv"]
}

if (!params.skip_sv_annotation) {
mandatoryParams += ["genome", "vep_cache", "vep_cache_version", "score_config_sv"]
mandatoryParams += ["genome", "vep_cache", "vep_cache_version", "score_config_sv", "variant_consequences_sv"]
if (!params.svdb_query_bedpedbs && !params.svdb_query_dbs) {
println("params.svdb_query_bedpedbs or params.svdb_query_dbs should be set.")
missingParamsCount += 1
}
}

if (!params.skip_mt_annotation) {
mandatoryParams += ["genome", "mito_name", "vcfanno_resources", "vcfanno_toml", "vep_cache_version", "vep_cache"]
mandatoryParams += ["genome", "mito_name", "vcfanno_resources", "vcfanno_toml", "vep_cache_version", "vep_cache", "variant_consequences_snv"]
}

if (params.analysis_type.equals("wes")) {
Expand All @@ -72,7 +72,7 @@ if (!params.skip_vep_filter) {
}

if (!params.skip_me_annotation) {
mandatoryParams += ["mobile_element_svdb_annotations"]
mandatoryParams += ["mobile_element_svdb_annotations", "variant_consequences_sv"]
}

for (param in mandatoryParams.unique()) {
Expand Down Expand Up @@ -288,7 +288,10 @@ workflow RAREDISEASE {
ch_target_intervals = ch_references.target_intervals
ch_variant_catalog = params.variant_catalog ? Channel.fromPath(params.variant_catalog).map { it -> [[id:it[0].simpleName],it]}.collect()
: Channel.value([[],[]])
ch_variant_consequences = Channel.fromPath("$projectDir/assets/variant_consequences_v2.txt", checkIfExists: true).collect()
ch_variant_consequences_snv = params.variant_consequences_snv ? Channel.fromPath(params.variant_consequences_snv).collect()
: Channel.value([])
ch_variant_consequences_sv = params.variant_consequences_sv ? Channel.fromPath(params.variant_consequences_sv).collect()
: Channel.value([])
ch_vcfanno_resources = params.vcfanno_resources ? Channel.fromPath(params.vcfanno_resources).splitText().map{it -> it.trim()}.collect()
: Channel.value([])
ch_vcf2cytosure_blacklist = params.vcf2cytosure_blacklist ? Channel.fromPath(params.vcf2cytosure_blacklist).collect()
Expand Down Expand Up @@ -490,7 +493,7 @@ workflow RAREDISEASE {

ANN_CSQ_PLI_SV (
GENERATE_CLINICAL_SET_SV.out.vcf,
ch_variant_consequences
ch_variant_consequences_sv
)
ch_versions = ch_versions.mix(ANN_CSQ_PLI_SV.out.versions)

Expand Down Expand Up @@ -535,7 +538,7 @@ workflow RAREDISEASE {

ANN_CSQ_PLI_SNV (
GENERATE_CLINICAL_SET_SNV.out.vcf,
ch_variant_consequences
ch_variant_consequences_snv
)
ch_versions = ch_versions.mix(ANN_CSQ_PLI_SNV.out.versions)

Expand Down Expand Up @@ -577,7 +580,7 @@ workflow RAREDISEASE {

ANN_CSQ_PLI_MT(
GENERATE_CLINICAL_SET_MT.out.vcf,
ch_variant_consequences
ch_variant_consequences_snv
)
ch_versions = ch_versions.mix(ANN_CSQ_PLI_MT.out.versions)

Expand Down Expand Up @@ -663,7 +666,7 @@ workflow RAREDISEASE {
ch_genome_fasta,
ch_genome_dictionary,
ch_vep_cache,
ch_variant_consequences,
ch_variant_consequences_sv,
ch_vep_filters,
params.genome,
params.vep_cache_version,
