Add mobile element calling to raredisease #440

Merged
27 commits
efddf0b
initial commit to log progress
peterpru Nov 6, 2023
46041f3
add modules retroseq call and discover
peterpru Nov 7, 2023
0bcf05c
fix linting
peterpru Nov 7, 2023
d803ed8
add subworkflow
peterpru Nov 7, 2023
f0bc78f
Merge branch 'dev' into add-retroseq-to-pipeline
jemten Dec 22, 2023
c387623
retroseq call subworkflow
jemten Jan 4, 2024
cad46c1
updating temporary ref_me paths
jemten Jan 4, 2024
0f6edbd
fixing path to retroseq docker
jemten Jan 4, 2024
bddd2fc
propagating versions
jemten Jan 4, 2024
0c0fd91
joining channels to keep them in sync
jemten Jan 4, 2024
ad90518
fix input name
jemten Jan 4, 2024
caa0029
concatenate and merge vcfs
jemten Jan 5, 2024
c6c0890
looking into joining error
jemten Jan 8, 2024
45cc753
adding missing test file
jemten Jan 9, 2024
08b7203
fix config indentation
jemten Jan 9, 2024
3be4642
fixing around with test data
jemten Jan 9, 2024
4bbb604
update readme
jemten Jan 9, 2024
e50756f
uncommenting lines
jemten Jan 9, 2024
8cacd8f
update changelog
jemten Jan 9, 2024
7fc6df5
Merge branch 'dev' into add-retroseq-to-pipeline
jemten Jan 10, 2024
9c178a1
bring back a few lines that was commented out
jemten Jan 10, 2024
9564c64
add meta2 and meta3 to retroseq_call meta.yml
jemten Jan 11, 2024
01a8313
Document order of chr in refrence genome
jemten Jan 11, 2024
d99a8d9
Merge branch 'dev' into add-retroseq-to-pipeline
jemten Jan 12, 2024
425863d
updating usage
jemten Jan 15, 2024
71a25b2
Merge branch 'dev' into add-retroseq-to-pipeline
jemten Jan 15, 2024
967b7de
Merge branch 'dev' into add-retroseq-to-pipeline
jemten Jan 15, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- ngsbits samplegender to check sex [#453](https://github.com/nf-core/raredisease/pull/453)
- New workflow for generating cgh files from SV vcfs for interpretation in the CytoSure interpretation software. Turned off by default [#456](https://github.com/nf-core/raredisease/pull/456/)
- Fastp to do adapter trimming. It can be skipped using `--skip_fastp` [#457](https://github.com/nf-core/raredisease/pull/457)
- New workflow for calling insertion of mobile elements [#440](https://github.com/nf-core/raredisease/pull/440)
- GATK CNVCaller uses segments instead of intervals, filters out "reference" segments between the calls, and fixes a bug with how `ch_readcount_intervals` was handled [#472](https://github.com/nf-core/raredisease/pull/472)
- bwa aligner [#474](https://github.com/nf-core/raredisease/pull/474)
- Add FOUND_IN tag, which mentions the variant caller that found the mutation, in the INFO column of the vcf files [#471](https://github.com/nf-core/raredisease/pull/471)
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -100,6 +100,10 @@

> Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32(2):292-294. doi:10.1093/bioinformatics/btv566

- [RetroSeq](https://academic.oup.com/bioinformatics/article/29/3/389/257479)

> Keane TM, Wong K, Adams DJ. RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics. 2013;29(3):389-390. doi:10.1093/bioinformatics/bts697

- [rhocall](https://github.com/dnil/rhocall)

- [Sentieon DNAscope](https://www.biorxiv.org/content/10.1101/2022.05.20.492556v1.abstract)
6 changes: 5 additions & 1 deletion README.md
@@ -94,7 +94,11 @@ On release, automated continuous integration tests run the pipeline on a full-si
 - [Expansion Hunter](https://github.com/Illumina/ExpansionHunter)
 - [Stranger](https://github.com/Clinical-Genomics/stranger)

-**9. Rank variants - SV and SNV:**
+**9. Variant calling - mobile elements:**
+
+- [RetroSeq](https://github.com/tk2/RetroSeq)
+
+**10. Rank variants - SV and SNV:**

 - [GENMOD](https://github.com/Clinical-Genomics/genmod)

26 changes: 26 additions & 0 deletions assets/mobile_element_references_schema.json
@@ -0,0 +1,26 @@
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/raredisease/master/assets/mobile_element_references_schema.json",
    "title": "Schema for mobile_element_references",
    "description": "Schema for the file provided with params.mobile_element_references",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "type": {
                "type": "string",
                "exists": true,
                "pattern": "^\\S+$",
                "errorMessage": "Mobile element type must be provided and cannot contain spaces"
            },
            "path": {
                "type": "string",
                "format": "file-path",
                "exists": true,
                "pattern": "^\\S+\\.bed$",
                "errorMessage": "BED file path must be provided, cannot contain spaces and must have the extension '.bed'"
            }
        },
        "required": ["type", "path"]
    }
}
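The schema above checks two rules per row of the file given with `params.mobile_element_references`: the element `type` must be a non-empty string without spaces, and `path` must point to a `.bed` file without spaces. As a minimal sketch of the same pattern checks outside the pipeline, assuming the file is tab-separated with a `type`/`path` header row (the PR does not show the file itself) and leaving out the `exists` check, the rows could be validated like this:

```python
import csv
import io
import re

# Patterns copied from assets/mobile_element_references_schema.json
TYPE_PATTERN = re.compile(r"^\S+$")
PATH_PATTERN = re.compile(r"^\S+\.bed$")


def validate_rows(handle):
    """Return error messages for a mobile element reference sheet.

    Assumes a tab-separated file whose header row names the 'type' and
    'path' columns. File existence is not checked here.
    """
    errors = []
    reader = csv.DictReader(handle, delimiter="\t")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        me_type = row.get("type") or ""
        path = row.get("path") or ""
        if not TYPE_PATTERN.match(me_type):
            errors.append(f"line {line_no}: mobile element type must be provided and cannot contain spaces")
        if not PATH_PATTERN.match(path):
            errors.append(f"line {line_no}: path must be a .bed file without spaces")
    return errors


if __name__ == "__main__":
    # Hypothetical sheet; element names and paths are illustrative only
    sheet = io.StringIO("type\tpath\nALU\talu.bed\nLINE1\tline 1.bed\n")
    for message in validate_rows(sheet):
        print(message)
```

In the pipeline the same rules are enforced by the schema-based validation, so a sketch like this is only useful for checking a sheet before launching a run.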
73 changes: 73 additions & 0 deletions conf/modules/call_mobile_elements.config
@@ -0,0 +1,73 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Available keys to override module options:
        ext.args   = Additional arguments appended to command in module.
        ext.args2  = Second set of arguments appended to command in module (multi-tool modules).
        ext.args3  = Third set of arguments appended to command in module (multi-tool modules).
        ext.prefix = File name prefix for output files.
        ext.when   = Conditional clause
----------------------------------------------------------------------------------------
*/

process {

    withName: '.*CALL_MOBILE_ELEMENTS:.*' {
        publishDir = [
            enabled: false
        ]
    }

    withName: '.*CALL_MOBILE_ELEMENTS:ME_SPLIT_ALIGNMENT' {
        ext.args = { [
                '--output-fmt bam',
                '--fetch-pairs'
            ].join(' ') }
        ext.args2 = { "${meta.interval}" }
        ext.prefix = { "${meta.id}_${meta.interval}" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:RETROSEQ_DISCOVER' {
        ext.prefix = { "${meta.id}_${meta.interval}_retroseq_discover" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:RETROSEQ_CALL' {
        ext.args = { '--soft' }
        ext.prefix = { "${meta.id}_${meta.interval}_retroseq_call" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:BCFTOOLS_REHEADER_ME' {
        ext.args2 = { '--output-type v' }
        ext.prefix = { "${meta.id}_${meta.interval}_retroseq_reheader" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:BCFTOOLS_SORT_ME' {
        ext.args = { '--output-type z' }
        ext.prefix = { "${meta.id}_${meta.interval}_retroseq_sort" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:BCFTOOLS_CONCAT_ME' {
        ext.args = { '--output-type z --allow-overlaps' }
        ext.prefix = { "${meta.id}_mobile_elements" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:SVDB_MERGE_ME' {
        ext.args = { '--bnd_distance 150 --overlap 0.5' }
        ext.prefix = { "${meta.id}_mobile_elements" }
        publishDir = [
            path: { "${params.outdir}/call_mobile_elements" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: '.*CALL_MOBILE_ELEMENTS:TABIX_ME' {
        publishDir = [
            path: { "${params.outdir}/call_mobile_elements" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

}
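The `ext.prefix` closures in this config determine the intermediate file names: the per-interval steps combine the sample id with the interval, and the gather steps (concat and SVDB merge) use only the sample id. The Python sketch below reproduces those prefixes so the scatter/gather naming is easier to follow. The mapping is transcribed by hand from this config rather than read from Nextflow, and the sample id and interval strings in the example are hypothetical placeholders.

```python
def mobile_element_prefixes(sample_id, interval):
    """Prefixes set by ext.prefix in conf/modules/call_mobile_elements.config.

    Per-interval steps include both the sample id and the interval;
    the gather steps (concat, merge) use only the sample id.
    """
    per_interval = f"{sample_id}_{interval}"
    return {
        "ME_SPLIT_ALIGNMENT": per_interval,
        "RETROSEQ_DISCOVER": f"{per_interval}_retroseq_discover",
        "RETROSEQ_CALL": f"{per_interval}_retroseq_call",
        "BCFTOOLS_REHEADER_ME": f"{per_interval}_retroseq_reheader",
        "BCFTOOLS_SORT_ME": f"{per_interval}_retroseq_sort",
        "BCFTOOLS_CONCAT_ME": f"{sample_id}_mobile_elements",
        "SVDB_MERGE_ME": f"{sample_id}_mobile_elements",
    }


if __name__ == "__main__":
    # "sample1" and "chr1" are placeholders for meta.id and meta.interval
    for step, prefix in mobile_element_prefixes("sample1", "chr1").items():
        print(f"{step:<22} {prefix}")
```

Note that the config first disables publishing for every process in the subworkflow and then sets a `publishDir` only for `SVDB_MERGE_ME` and `TABIX_ME`, so only the merged, sample-level files end up in `${params.outdir}/call_mobile_elements`.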
2 changes: 2 additions & 0 deletions conf/test.config
@@ -36,11 +36,13 @@ params {

// Genome references
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta"
fai = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta.fai"
genome = 'GRCh37'
gnomad_af = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/gnomad_reformated.tab.gz"
intervals_wgs = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/target_wgs.interval_list"
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_mt = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
2 changes: 2 additions & 0 deletions conf/test_one_sample.config
@@ -36,11 +36,13 @@ params {

// Genome references
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta"
fai = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta.fai"
genome = 'GRCh37'
gnomad_af = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/gnomad_reformated.tab.gz"
intervals_wgs = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/target_wgs.interval_list"
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_mt = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
2 changes: 2 additions & 0 deletions conf/test_sentieon.config
@@ -31,11 +31,13 @@ params {

// Genome references
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta"
fai = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta.fai"
genome = 'GRCh37'
gnomad_af = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/gnomad_reformated.tab.gz"
intervals_wgs = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/target_wgs.interval_list"
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_snv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
25 changes: 13 additions & 12 deletions docs/usage.md
@@ -153,18 +153,19 @@ The mandatory and optional parameters for each category are tabulated below.

 | Mandatory                      | Optional                       |
 | ------------------------------ | ------------------------------ |
-| aligner<sup>1</sup> | fasta_fai<sup>3</sup> |
-| fasta | bwamem2<sup>3</sup> |
-| platform | bwa<sup>3</sup> |
-| mito_name/mt_fasta<sup>2</sup> | known_dbsnp<sup>4</sup> |
-| | known_dbsnp_tbi<sup>4</sup> |
-| | min_trimmed_length<sup>5</sup> |
-
-<sup>1</sup>Default value is bwamem2, but if you have a valid license for Sentieon, you have the option to use Sentieon as well.<br />
-<sup>2</sup>If mito_name is provided, mt_fasta can be generated by the pipeline.<br />
-<sup>3</sup>fasta_fai, bwa, and bwamem2, if not provided by the user, will be generated by the pipeline when necessary.<br />
-<sup>4</sup>Used only by Sentieon.<br />
-<sup>5</sup>Default value is 40. Used only by fastp.<br />
+| aligner<sup>1</sup> | fasta_fai<sup>4</sup> |
+| fasta<sup>2</sup> | bwamem2<sup>4</sup> |
+| platform | bwa<sup>4</sup> |
+| mito_name/mt_fasta<sup>3</sup> | known_dbsnp<sup>5</sup> |
+| | known_dbsnp_tbi<sup>5</sup> |
+| | min_trimmed_length<sup>6</sup> |
+
+<sup>1</sup>Default value is bwamem2. Other alternatives are bwa and sentieon (requires a valid Sentieon license).<br />
+<sup>2</sup>Analysis-set reference genome in FASTA format. The first 25 contigs must be chromosomes 1-22, X, Y and the mitochondrial chromosome.<br />
+<sup>3</sup>If mito_name is provided, mt_fasta can be generated by the pipeline.<br />
+<sup>4</sup>fasta_fai, bwa and bwamem2, if not provided by the user, will be generated by the pipeline when necessary.<br />
+<sup>5</sup>Used only by Sentieon.<br />
+<sup>6</sup>Default value is 40. Used only by fastp.<br />

 ##### 2. QC stats from the alignment files

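The updated usage table requires the first 25 contigs of the reference FASTA to be chromosomes 1-22, X, Y and the mitochondrial chromosome. One way to check a reference before a run is to read the first column of its FASTA index (`.fai`). The Python sketch below is not part of the pipeline. It assumes the contigs must appear in the order listed in the usage notes, and it accepts names with or without a `chr` prefix and `MT`, `M`, `chrM` or `chrMT` for the mitochondrion; those naming choices are assumptions about common references, not something the pipeline documents.

```python
def first_contigs(fai_lines, n=25):
    """Return the first n contig names from the lines of a .fai file."""
    names = []
    for line in fai_lines:
        if line.strip():
            names.append(line.split("\t", 1)[0])  # column 1 is the contig name
        if len(names) == n:
            break
    return names


def check_contig_order(fai_lines):
    """Return a list of problems; an empty list means the check passed.

    Assumes the order 1-22, X, Y, mitochondrion, with or without a 'chr' prefix.
    """
    expected = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]
    observed = first_contigs(fai_lines)
    problems = []
    if len(observed) < len(expected):
        problems.append(f"only {len(observed)} contigs found, expected at least {len(expected)}")
    for position, (want, got) in enumerate(zip(expected, observed), start=1):
        name = got[3:] if got.startswith("chr") else got
        if name == "M":
            name = "MT"  # treat chrM / M as the mitochondrial contig
        if name != want:
            problems.append(f"contig {position} is {got!r}, expected chromosome {want}")
    return problems


if __name__ == "__main__":
    # Synthetic .fai lines: name, length, offset, line bases, line width
    demo = [f"chr{c}\t1000\t0\t60\t61\n" for c in [*range(1, 23), "X", "Y", "M"]]
    print(check_contig_order(demo))  # an empty list means the order is as expected
```

To check a real reference, pass the open `.fai` file (for example `reference.fasta.fai`) to `check_contig_order`; file objects iterate line by line, so no extra parsing is needed.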
2 changes: 1 addition & 1 deletion main.nf
@@ -16,7 +16,6 @@ nextflow.enable.dsl = 2
GENOME PARAMETER VALUES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

params.fasta = WorkflowMain.getGenomeAttribute(params, 'fasta')
params.fai = WorkflowMain.getGenomeAttribute(params, 'fai')
params.bwa = WorkflowMain.getGenomeAttribute(params, 'bwa')
@@ -33,6 +32,7 @@ params.intervals_wgs = WorkflowMain.getGenomeAttribute(params,
params.intervals_y = WorkflowMain.getGenomeAttribute(params, 'intervals_y')
params.known_dbsnp = WorkflowMain.getGenomeAttribute(params, 'known_dbsnp')
params.known_dbsnp_tbi = WorkflowMain.getGenomeAttribute(params, 'known_dbsnp_tbi')
params.mobile_element_references = WorkflowMain.getGenomeAttribute(params, 'mobile_element_references')
params.ml_model = WorkflowMain.getGenomeAttribute(params, 'ml_model')
params.mt_fasta = WorkflowMain.getGenomeAttribute(params, 'mt_fasta')
params.ngsbits_samplegender_method = WorkflowMain.getGenomeAttribute(params, 'ngsbits_samplegender_method')
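The new `params.mobile_element_references` line follows the same pattern as the other genome attributes, so the reference sheet can also come from a genome entry in `params.genomes`. The helper `WorkflowMain.getGenomeAttribute` is not part of this diff; assuming it follows the nf-core pipeline template, it returns `params.genomes[params.genome][attribute]` when both the genome and the attribute exist, and `null` otherwise. The Python sketch below mirrors that lookup; the `genomes` dictionary and path in the example are made up for illustration.

```python
def get_genome_attribute(params, attribute):
    """Return params['genomes'][params['genome']][attribute], or None if any level is missing.

    Mirrors the lookup in the nf-core template's WorkflowMain.getGenomeAttribute (assumed).
    """
    genomes = params.get("genomes") or {}
    genome = params.get("genome")
    if genome and genome in genomes:
        return genomes[genome].get(attribute)
    return None


if __name__ == "__main__":
    # Hypothetical genomes entry; the path is a placeholder
    params = {
        "genome": "GRCh37",
        "genomes": {"GRCh37": {"mobile_element_references": "/refs/me_references.tsv"}},
    }
    print(get_genome_attribute(params, "mobile_element_references"))
    print(get_genome_attribute(params, "ml_model"))  # not defined for this genome
```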
54 changes: 54 additions & 0 deletions modules/local/retroseq/call/main.nf
@@ -0,0 +1,54 @@
process RETROSEQ_CALL {
    tag "$meta.id"
    label 'process_low'

    conda "bioconda::perl-retroseq=1.5=pl5321hdfd78af_1"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'docker.io/clinicalgenomics/retroseq:1.5_9d4f3b5-1' : 'docker.io/clinicalgenomics/retroseq:1.5_9d4f3b5-1' }"


    input:
    tuple val(meta), path(tab), path(bam), path(bai)
    tuple val(meta2), path(fasta)
    tuple val(meta3), path(fai)

    output:
    tuple val(meta), path("*.vcf"), emit: vcf
    path "versions.yml"            , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def VERSION = "1.5"

    """
    retroseq.pl \\
        -call \\
        $args \\
        -bam $bam \\
        -input $tab \\
        -ref $fasta \\
        -output ${prefix}.vcf

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        retroseq_call: $VERSION
    END_VERSIONS
    """

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def VERSION = "1.5"
    """
    touch ${prefix}.vcf

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        retroseq_call: $VERSION
    END_VERSIONS
    """
}
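When debugging a single interval outside Nextflow, it can help to see the exact command that the `script:` block renders. The Python sketch below assembles the same argument list. The function is hypothetical and not part of the module, the file names in the example are placeholders, and `--soft` is the `ext.args` value set for `RETROSEQ_CALL` in `conf/modules/call_mobile_elements.config`.

```python
import shlex


def retroseq_call_cmd(tab, bam, fasta, prefix, args=""):
    """Build the argument list rendered by RETROSEQ_CALL's script block.

    The .bai and .fai inputs are not passed on the command line; the module
    only stages them in the work directory next to the BAM and FASTA files.
    """
    return [
        "retroseq.pl",
        "-call",
        *shlex.split(args),  # task.ext.args, e.g. '--soft'
        "-bam", bam,
        "-input", tab,  # .tab file produced by the discover step
        "-ref", fasta,
        "-output", f"{prefix}.vcf",
    ]


if __name__ == "__main__":
    cmd = retroseq_call_cmd(
        tab="sample1_chr1_retroseq_discover.tab",
        bam="sample1_chr1.bam",
        fasta="reference.fasta",
        prefix="sample1_chr1_retroseq_call",
        args="--soft",
    )
    print(shlex.join(cmd))
```

Keeping the arguments as a list rather than one string avoids shell-quoting issues if the list is later handed to `subprocess.run`.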
69 changes: 69 additions & 0 deletions modules/local/retroseq/call/meta.yml
@@ -0,0 +1,69 @@
name: "retroseq_call"
description: RetroSeq is a tool for discovery and genotyping of transposable element variants (TEVs) from next-gen sequencing reads aligned to a reference genome in BAM format.
keywords:
  - retroseq
  - transposable elements
  - genomics
tools:
  - "retroseq":
      description: "RetroSeq: discovery and genotyping of TEVs from reads in BAM format."
      homepage: "https://github.com/tk2/RetroSeq"
      documentation: "https://github.com/tk2/RetroSeq"
      tool_dev_url: "https://github.com/tk2/RetroSeq"
      doi: "10.1093/bioinformatics/bts697"
      licence: "['GPL']"

input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. `[ id:'test', single_end:false ]`
  - tab:
      type: file
      description: Output file from running retroseq -discover
      pattern: "*.tab"
  - bam:
      type: file
      description: Sorted BAM file
      pattern: "*.bam"
  - bai:
      type: file
      description: Index of the sorted BAM file
      pattern: "*.bai"
  - meta2:
      type: map
      description: |
        Groovy Map containing reference information
        e.g. `[ id:'genome' ]`
  - fasta:
      type: file
      description: Reference genome in FASTA format
      pattern: "*.fasta"
  - meta3:
      type: map
      description: |
        Groovy Map containing reference information
        e.g. `[ id:'genome' ]`
  - fai:
      type: file
      description: Reference FASTA index
      pattern: "*.fai"

output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. `[ id:'test', single_end:false ]`
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
  - vcf:
      type: file
      description: Output file containing TEVs and their location in the genome.
      pattern: "*.vcf"

authors:
  - "@peterpru"