-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Joint Germline subworkflow haplotypecaller -> Vqsr #595
Conversation
Duplicate of #546, but more advanced, so taking over |
conf/igenomes.config
Outdated
@@ -18,6 +18,7 @@ params { | |||
chr_dir = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/Chromosomes" | |||
dbsnp = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf" | |||
dbsnp_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf.idx" | |||
dbsnp_vqsr = 'dbsnp,known=false,training=true,truth=false,prior=2 dbsnp_138.b37.vcf' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer having args in the modules.config, and avoiding adding extra files in igenomes.config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that doesn't fit with the nf-core/module styling as this is expected to be an inputted value
conf/igenomes.config
Outdated
// resources for GATK joint germline variant recalibration | ||
RESOURCE_SNP = [ | ||
[ res_1000g, dbsnp ], | ||
[ res_1000g, dbsnp_tbi ], | ||
[ res_1000g_vqsr, dbsnp_vqsr ] | ||
] | ||
resource_INDEL = [ | ||
[ known_indels, dbsnp ], | ||
[ known_indels_tbi, dbsnp_tbi ], | ||
[ known_indels_mills_vqsr, known_indels_1000g_vqsr, dbsnp_vqsr ] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that, but I feel like it should be done in the sarek script or in the joint germline variant calling workflow instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would then have to use less descriptive names as for hg19 and hg38 the files are slightly different. So the naming convention has to match regardless of the genome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about something like known_snps
(dbsnp should stay separate because tools like haplotypecaller explicetly want that file)
@@ -34,11 +34,10 @@ workflow RUN_HAPLOTYPECALLER { | |||
// group by interval | |||
genotype_gvcf_to_call = HAPLOTYPECALLER.out.vcf.join(HAPLOTYPECALLER.out.tbi).map{ | |||
meta, gvcf, tbi -> | |||
interval_name = meta.num_intervals > 1 ? (gvcf.simpleName - "${meta.id}_").replaceFirst("_",":") : meta.id | |||
new_meta = [id: "joint_germline", interval_name: interval_name, num_intervals: meta.num_intervals] | |||
interval_name = meta.num_intervals > 1 ? (gvcf.simpleName - "${meta.id}_").replaceFirst("_",":") : file(params.intervals).simpleName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be very careful here. I am afraid this may lead to the weird resume errors/unmatching meta mpas we had before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, would it be smarter to keep meta the same, and group py another (temporary) key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem is more the reqirval of the file name. I still can't explain this, but sometimes, when retrieving something from the file name like here, the name is incorrectly resolved later on (even though the matched file in the channel element is the correct one). In this case here this would lead to some very wrong results. (in the rest of sarek, since we group on patient ID it only lead to file name clashes, a) easy to find the bug, b) the actual output results were not impacted)
conf/modules.config
Outdated
ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline} | ||
withName: 'GATK4_GENOMICSDBIMPORT' { | ||
ext.prefix = { meta.num_intervals > 1 ? meta.intervals_name : "joint_interval" } | ||
ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals} | |
ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals} |
@@ -118,17 +119,23 @@ workflow GERMLINE_VARIANT_CALLING { | |||
if (tools.split(',').contains('haplotypecaller')){ | |||
cram_recalibrated_intervals_haplotypecaller = cram_recalibrated_intervals | |||
.map{ meta, cram, crai, intervals -> | |||
[meta, cram, crai, intervals, []] | |||
|
|||
intervals_name = meta.num_intervals == 0 ? "no_interval" : intervals.simpleName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we also need a conditional for joint germline. This addition of meta can only happen for haplotypecaller + joint germline or we need to rewrite a bunch of logic for the single sample case
@@ -47,7 +48,7 @@ workflow GERMLINE_VARIANT_CALLING { | |||
.map{ meta, cram, crai, intervals, num_intervals -> | |||
|
|||
//If no interval file provided (0) then add empty list | |||
intervals_new = num_intervals == 0 ? [] : intervals | |||
intervals_new = num_intervals == 0 ? [] : intervals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intervals_new = num_intervals == 0 ? [] : intervals | |
intervals_new = num_intervals == 0 ? [] : intervals |
//Merge scatter/gather vcfs & index | ||
//Rework meta for variantscalled.csv and annotation tools | ||
MERGE_GENOTYPEGVCFS(vcfs_sorted_input.intervals.map{meta, vcf -> | ||
[[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do the same nice formatting here as you did in line 116 ff?
[meta, cram, crai, intervals, []] | ||
|
||
intervals_name = meta.num_intervals == 0 ? "no_interval" : intervals.simpleName | ||
new_meta = [patient:meta.patient, sample:meta.sample, sex:meta.sex, status:meta.status, id:meta.sample, data_type:meta.data_type, num_intervals:meta.num_intervals, intervals_name:intervals_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_meta = [patient:meta.patient, sample:meta.sample, sex:meta.sex, status:meta.status, id:meta.sample, data_type:meta.data_type, num_intervals:meta.num_intervals, intervals_name:intervals_name] | |
new_meta = [ | |
data_type:meta.data_type, | |
id:meta.sample, | |
intervals_name:intervals_name, | |
num_intervals:meta.num_intervals, | |
patient:meta.patient, | |
sample:meta.sample, | |
sex:meta.sex, | |
status:meta.status | |
] |
subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf
Outdated
Show resolved
Hide resolved
|
||
versions = ch_versions // channel: [ versions.yml ] | ||
versions = ch_versions // channel: [ versions.yml ] | ||
genotype_vcf = Channel.empty().mix(vcfs_sorted_input.no_intervals, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the vcfs_sorted_input.no_intervals also needs the variantcaller: "haplotypecaller" in it smeta map here to make sure annotation is placing it in the proper folder
//Merge scatter/gather vcfs & index | ||
//Rework meta for variantscalled.csv and annotation tools | ||
MERGE_GENOTYPEGVCFS(vcfs_sorted_input.intervals.map{meta, vcf -> | ||
[[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf] | |
[[ | |
id: "joint_variant_calling", | |
num_intervals: meta.num_intervals, | |
patient: "all_samples", | |
variantcaller: "haplotypecaller" | |
], vcf] |
Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>
Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>
Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>
Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>
Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>
PR checklist
scrape_software_versions.py
nf-core lint .
).nextflow run . -profile test,docker
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).