Joint Germline subworkflow haplotypecaller -> Vqsr #595

Merged
merged 90 commits into from
Jul 21, 2022
Changes from 7 commits
Commits
19e5e0e
add vqsr from Atholl, very early WIP
May 13, 2022
cd8d767
added meta.yml for vqsr subworkflow
GCJMackenzie Jun 1, 2022
03909c9
Merge pull request #9 from GCJMackenzie/add_vqsr_subworkflow
nickhsmith Jun 9, 2022
c22a64f
joint variant calling updates to gatk best practices
Jun 17, 2022
028ad1a
group by interval and exclude sample info
Jun 17, 2022
57c2dff
interval and no_interval grouping
Jun 20, 2022
ea8dd93
add params and vqsr process
Jun 22, 2022
d6cc403
print statements
Jun 23, 2022
b78ad1c
joint variantcalling
Jun 27, 2022
37aa902
update
Jun 28, 2022
d49a5ac
add interval_names to meta
Jun 29, 2022
d574b0c
Merge remote-tracking branch 'NF-core/dev' into vqsr
Jun 29, 2022
dd7445f
prepare vqsr
Jul 4, 2022
06c66ad
remove inclusion of local config
Jul 4, 2022
8686dfb
lint
Jul 4, 2022
f9fcb65
force bcftools sort
Jul 6, 2022
6e5e8dd
fix typo and clearer naming
Jul 7, 2022
9a29759
include variant_recalibration param
Jul 7, 2022
aa110ab
apply vqsr and merge
Jul 8, 2022
fa41fa0
snp and indels must be present for recalibration
Jul 8, 2022
17924d0
fix typo
Jul 11, 2022
50de0de
typo
Jul 11, 2022
7947ce0
improve resource channels
Jul 12, 2022
3463c56
pre-merge
Jul 12, 2022
1c48729
merge with nf-core dev
Jul 12, 2022
b835c37
improve haplotypecaller merging
Jul 12, 2022
d45423f
fix naming
Jul 12, 2022
ed5ccea
update known_sites
Jul 13, 2022
3de8f7f
Merge remote-tracking branch 'NF-core/dev' into vqsr
Jul 13, 2022
b2fd565
update
Jul 14, 2022
bfaf48d
fix typo
Jul 15, 2022
cb25481
publishDir
Jul 15, 2022
8fd7ff9
flatten known_sites
Jul 18, 2022
4eac972
merge with dev
Jul 18, 2022
ced2398
add joint_germline
Jul 18, 2022
66a7f00
update modules
Jul 18, 2022
0ea9747
fix no_intervals and tests
Jul 18, 2022
0717286
lint
Jul 18, 2022
37e5e77
update to dev
Jul 18, 2022
1db8bc8
lint
Jul 18, 2022
ff6b530
fix empty module
Jul 18, 2022
aa7127a
correct prepare_genome input
Jul 18, 2022
5be916a
merge with dev
Jul 19, 2022
21182a3
change name to match tests
nickhsmith Jul 19, 2022
9eebd71
update test
nickhsmith Jul 19, 2022
69feebd
Update workflows/sarek.nf
nickhsmith Jul 19, 2022
20a7e2e
Update workflows/sarek.nf
nickhsmith Jul 19, 2022
3cbe260
Apply suggestions from code review
nickhsmith Jul 19, 2022
e9fb076
Update conf/modules.config
nickhsmith Jul 19, 2022
6ea35cc
Update subworkflows/nf-core/variantcalling/haplotypecaller/main.nf
nickhsmith Jul 19, 2022
8d7235d
Update conf/modules.config
nickhsmith Jul 19, 2022
13f9bd7
Update nextflow_schema.json
nickhsmith Jul 19, 2022
e75a7f3
remove unneeded when statements
Jul 19, 2022
fb7c594
update paths
nickhsmith Jul 19, 2022
04b3191
Merge branch 'vqsr' of github.com:nickhsmith/sarek into vqsr
nickhsmith Jul 19, 2022
9e31397
fix warnings
nickhsmith Jul 19, 2022
fc0c251
fix warnings
nickhsmith Jul 19, 2022
2e9a724
update path
nickhsmith Jul 19, 2022
748947d
remove lane from csv
nickhsmith Jul 19, 2022
7e3a11e
add step_variant calling
nickhsmith Jul 19, 2022
77eb3d6
Update conf/modules.config
nickhsmith Jul 19, 2022
3fe7927
Update conf/modules.config
nickhsmith Jul 19, 2022
acb9fce
Update modules.config
nickhsmith Jul 19, 2022
c793d5f
Merge remote-tracking branch 'NF-core/dev' into vqsr
Jul 19, 2022
87f3405
Apply suggestions from code review
nickhsmith Jul 20, 2022
ef8f526
Apply suggestions from code review
nickhsmith Jul 20, 2022
72b4bc8
hg19 sample names
Jul 20, 2022
e5b1703
change tool inputs
Jul 20, 2022
d6c006e
Merge branch 'vqsr' of github.com:nickhsmith/sarek into vqsr
Jul 20, 2022
c94e00b
Update subworkflows/nf-core/variantcalling/haplotypecaller/main.nf
nickhsmith Jul 20, 2022
0dd0976
Update after code review
Jul 20, 2022
717e271
groupTuple
Jul 20, 2022
da17db9
merge with dev
Jul 20, 2022
4b9bd9f
improve meta and tupleGrouping
Jul 20, 2022
9bae10f
Update conf/igenomes.config
nickhsmith Jul 20, 2022
b77a116
Update conf/modules.config
nickhsmith Jul 20, 2022
468dedb
Update nextflow_schema.json
nickhsmith Jul 20, 2022
e919d09
Update subworkflows/local/germline_variant_calling.nf
nickhsmith Jul 20, 2022
46bc423
Update subworkflows/local/germline_variant_calling.nf
nickhsmith Jul 20, 2022
f057709
Update germline_variant_calling.nf
nickhsmith Jul 20, 2022
4d50caf
Update subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf
nickhsmith Jul 20, 2022
c707f7c
meta format
Jul 20, 2022
ac0a1f5
undo publishDir change
Jul 20, 2022
a6d6e9a
hardcode joint_variant_calling publish path
Jul 20, 2022
7f5a916
fix typo
Jul 20, 2022
1fc9e2a
Merge remote-tracking branch 'NF-core/dev' into vqsr
Jul 20, 2022
d842fe0
Merge branch 'dev' into vqsr
nickhsmith Jul 20, 2022
73e84db
fix haplotypecaller cram input
nickhsmith Jul 20, 2022
9fa1a5c
fix indents, commas etc
FriederikeHanssen Jul 21, 2022
b0c32c7
remove test file
nickhsmith Jul 21, 2022
37 changes: 37 additions & 0 deletions conf/igenomes.config
@@ -18,6 +18,7 @@ params {
chr_dir = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/Chromosomes"
dbsnp = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf"
dbsnp_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf.idx"
dbsnp_vqsr = 'dbsnp,known=false,training=true,truth=false,prior=2 dbsnp_138.b37.vcf'
Member

I'd prefer having args in the modules.config, and avoiding adding extra files in igenomes.config

Contributor Author

That doesn't fit the nf-core/modules style, where this is expected to be passed in as an input value.

dict = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.dict"
fasta = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta"
fasta_fai = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta.fai"
@@ -26,12 +27,30 @@ params {
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/intervals/wgs_calling_regions_Sarek.list"
known_indels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
known_indels_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
known_indels_mills_vqsr = 'mills,known=false,training=true,truth=true,prior=12 Mills_and_1000G_gold_standard.indels.b37.vcf'
known_indels_1000g_vqsr = '1000G,known=false,training=true,truth=true,prior=10 1000G_phase1.indels.b37.vcf'
mappability = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/Control-FREEC/out100m2_hg19.gem"
res_1000g = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/1000G_phase1.snps.high_confidence.b37.vcf.gz"
res_1000g_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/1000G_phase1.snps.high_confidence.b37.vcf.idx.gz"
res_1000g_vqsr = '1000G,known=false,training=true,truth=true,prior=10 1000G_phase1.snps.high_confidence.b37.vcf.gz'
snpeff_db = 'GRCh37.75'
snpeff_genome = 'GRCh37'
vep_cache_version = 104
vep_genome = 'GRCh37'
vep_species = 'homo_sapiens'

// resources for GATK joint germline variant recalibration
resource_SNP = [
[ res_1000g, dbsnp ],
[ res_1000g, dbsnp_tbi ],
[ res_1000g_vqsr, dbsnp_vqsr ]
]
resource_INDEL = [
[ known_indels, dbsnp ],
[ known_indels_tbi, dbsnp_tbi ],
[ known_indels_mills_vqsr, known_indels_1000g_vqsr, dbsnp_vqsr ]
]
Member

I like that, but I feel like it should be done in the sarek script or in the joint germline variant calling workflow instead

Contributor Author

I would then have to use less descriptive names, because the files differ slightly between hg19 and hg38, so the naming convention has to match regardless of the genome.

Contributor

What about something like known_snps? (dbsnp should stay separate because tools like haplotypecaller explicitly want that file.)


}
'GATK.GRCh38' {
ac_loci = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/ASCAT/1000G_phase3_GRCh38_maf0.3.loci"
@@ -42,6 +61,7 @@ params {
chr_dir = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/Chromosomes"
dbsnp = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz"
dbsnp_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz.tbi"
dbsnp_vqsr = 'dbsnp,known=false,training=true,truth=false,prior=2 dbsnp_146.hg38.vcf.gz'
dict = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.dict"
fasta = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta"
fasta_fai = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta.fai"
@@ -50,14 +70,31 @@ params {
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/intervals/wgs_calling_regions.hg38.bed"
known_indels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
known_indels_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
known_indels_mills_vqsr = 'mills,known=false,training=true,truth=true,prior=12 Mills_and_1000G_gold_standard.indels.hg38.vcf.gz'
known_indels_gatk_vqsr = 'gatk,known=false,training=true,truth=true,prior=10 Homo_sapiens_assembly38.known_indels.vcf.gz'
mappability = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/Control-FREEC/out100m2_hg38.gem"
pon = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz"
pon_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz.tbi"
res_1000g_omni = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000G_omni2.5.hg38.vcf.gz"
res_1000g_omni_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000G_omni2.5.hg38.vcf.gz.tbi"
res_1000g_omni_vqsr = 'omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg38.vcf.gz'
snpeff_db = 'GRCh38.99'
snpeff_genome = 'GRCh38'
vep_cache_version = 104
vep_genome = 'GRCh38'
vep_species = 'homo_sapiens'

// resources for GATK joint germline variant recalibration
resource_SNP = [
[ res_1000g_omni, dbsnp ],
[ res_1000g_omni, dbsnp_tbi ],
[ res_1000g_omni_vqsr, dbsnp_vqsr ]
]
resource_INDEL = [
[ known_indels, dbsnp ],
[ known_indels_tbi, dbsnp_tbi ],
[ known_indels_mills_vqsr, known_indels_gatk_vqsr, dbsnp_vqsr ]
]
}
'Ensembl.GRCh37' {
bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/"
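For context on the `*_vqsr` strings added above: each has the shape that GATK's VariantRecalibrator expects after `--resource:` (a label with `known`/`training`/`truth`/`prior` settings, a space, then the file name). A minimal Groovy sketch of how such strings could be turned into command-line arguments is shown here. The helper name `toResourceArgs` is hypothetical and is not taken from this PR or from the nf-core module.

```groovy
// Hypothetical helper (an assumption for illustration, not code from this PR):
// prefix each VQSR label string with GATK's --resource: flag and join them.
def toResourceArgs(List<String> labels) {
    labels.collect { "--resource:${it}" }.join(' ')
}

// Label strings copied from the GRCh37 block of conf/igenomes.config in this diff.
def snpLabels = [
    '1000G,known=false,training=true,truth=true,prior=10 1000G_phase1.snps.high_confidence.b37.vcf.gz',
    'dbsnp,known=false,training=true,truth=false,prior=2 dbsnp_138.b37.vcf'
]

println toResourceArgs(snpLabels)
```

Keeping the label and the file name in one string means the file name must match the staged file exactly, which may be one reason the reviewers discuss where these values should live.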
23 changes: 22 additions & 1 deletion conf/modules.config
@@ -570,15 +570,36 @@ process{
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
}
withName: 'GENOTYPEGVCFS' {
withName: 'GATK4_GENOMICSDBIMPORT' {
ext.prefix = { "$meta.id" }
ext.when = { params.tools && params.tools.contains('haplotypecaller') && params.joint_germline}
publishDir = [
enabled: params.generate_gvcf,
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/${meta.id}/haplotypecaller"},
pattern: { "${meta.id}/" }
]
}
withName: 'GATK4_GENOTYPEGVCFS' {
ext.when = { params.tools && params.tools.contains('haplotypecaller') && params.joint_germline}
ext.prefix = { "${meta.id}" }
publishDir = [
enabled: params.no_intervals,
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/${meta.id}/haplotypecaller"},
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
}
withName: 'MERGE_GENOTYPEGVCFS' {
ext.when = { meta.num_intervals > 1}
ext.prefix = "joint_germline"
publishDir = [
enabled: !params.no_intervals,
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/${meta.id}/haplotypecaller" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

// MANTA
withName: 'MERGE_MANTA.*' {
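The `ext.when` closures above gate the joint germline modules on `params.tools` and `params.joint_germline`. As an illustration only, the same condition can be written as a plain Groovy function. `jointGermlineEnabled` is a hypothetical name, and the sketch assumes `params.tools` is a comma-separated string. It tokenizes that string so only whole tool names match, which is a choice made for this sketch and differs from the `contains` check used in the diff.

```groovy
// Hypothetical stand-in for the ext.when condition (not code from this PR).
// Assumes tools is a comma-separated string such as 'haplotypecaller,manta'.
boolean jointGermlineEnabled(Map params) {
    // tokenize(',') yields a List, so contains() matches whole tool names only
    params.tools && params.tools.tokenize(',').contains('haplotypecaller') && params.joint_germline
}

println jointGermlineEnabled([tools: 'haplotypecaller,manta', joint_germline: true])
println jointGermlineEnabled([tools: 'manta', joint_germline: true])
println jointGermlineEnabled([tools: null, joint_germline: true])
```

With a null or empty `tools` value the first operand short-circuits, so the function returns false without calling `tokenize`.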
5 changes: 3 additions & 2 deletions modules/nf-core/modules/gatk4/genotypegvcfs/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions modules/nf-core/modules/gatk4/variantrecalibrator/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions nextflow.config
@@ -188,6 +188,7 @@ profiles {
// Load igenomes.config if required
if (!params.igenomes_ignore) {
includeConfig 'conf/igenomes.config'
includeConfig '/data/cephrbg/project/ghga/benchmark/smith_work/other/test_full/local_genomes.config'
} else {
params.genomes = [:]
}
129 changes: 55 additions & 74 deletions subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
