Description of the bug
NFCORE_SAREK:SAREK:GERMLINE_VARIANT_CALLING:RUN_HAPLOTYPECALLER:JOINT_GERMLINE:GATK4_GENOMICSDBIMPORT processes consistently run slowly on my cluster. My initial test runs (joint germline calling with five WGS samples) crashed after hitting walltime on these processes, after the standard retry with doubled resources. Quadrupling the default walltime got them through, but even then the occasional job timed out and needed a retry.
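For reference, the walltime workaround can also be applied from a custom config passed with `-c`, rather than by changing pipeline files. This is a minimal sketch: the file name `custom.config` and the `16.h` value are placeholders I chose for illustration, not the actual default or the value I used.

```groovy
// custom.config (hypothetical file name), used as: nextflow run ... -c custom.config
process {
    // Matches the import step by its simple process name
    withName: 'GATK4_GENOMICSDBIMPORT' {
        // Placeholder walltime; scales with the retry attempt number
        time = { 16.h * task.attempt }
    }
}
```

This only raises the limit for the one process, so other jobs keep their default resource requests.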
I found some relevant info here: https://gatk.broadinstitute.org/hc/en-us/articles/360056138571-GenomicsDBImport-usage-and-performance-guidelines
I edited my local copy of modules/nf-core/modules/gatk4/genomicsdbimport/main.nf to add the following options in the gatk command:

This resulted in a ~60-fold speed-up of the GATK4_GENOMICSDBIMPORT jobs, with the mean duration reduced from 3h21m to 3m08s.
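An alternative to editing the module file is to pass extra GATK arguments through `ext.args` in a custom config. The sketch below is hedged on two counts. First, the exact options are not preserved in this text, so the two flags shown are an assumption: they are real GenomicsDBImport flags that the linked GATK guidelines discuss for shared POSIX filesystems, not a confirmed record of what produced the speed-up above. Second, it assumes the module reads `task.ext.args`, as nf-core DSL2 modules normally do.

```groovy
// Hypothetical custom config; the flags are an assumption, see the note above
process {
    withName: 'GATK4_GENOMICSDBIMPORT' {
        // Setting ext.args here replaces any value the pipeline already defines
        // for this process, so existing arguments would need to be copied in too
        ext.args = '--genomicsdb-shared-posixfs-optimizations true --bypass-feature-reader'
    }
}
```

Whether these flags help will likely depend on the filesystem, so it is worth timing a single job before and after.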
I'm not sure whether these options should be on by default, or whether that might cause issues on other systems.
I'm running a locally installed Sarek 3.0, with PBS Pro as the scheduler. The cluster is a mix of nodes with 28 Intel CPUs and 128 GB RAM, and nodes with 128 AMD CPUs and 1 TB RAM, all connected to scratch storage by 100 Gb/s Ethernet or InfiniBand.
Command used and terminal output
No response
Relevant files
No response
System information
No response