Describe the bug
The {mapper}.gcnv_contig_ploidy.wgs module in the sv_calling_wgs step (and probably similarly in sv_calling_wes) creates a subfolder for each sample that is run: sv_calling_wgs/work/{mapper}.gcnv_contig_ploidy.wgs/out/{mappers}.gcnv_contig_ploidy.wgs/ploidy-calls/SAMPLE_*. However, these folders are never cleaned up or removed, even if the samplesheet is updated, and specifically not when samples are removed.
This means that the gcnv_contig_ploidy wrapper script will fail if it reads all of these SAMPLE_* folders with a glob while some of them belong to samples that are no longer defined in the samplesheet.
To Reproduce
Steps to reproduce the behavior:
Run the sv_calling_wgs step with any set of samples.
Change the samplesheet so that overall fewer samples are present than in the previous run.
Try to rerun the sv_calling_wgs / {mapper}.gcnv_contig_ploidy.wgs step
See error (sv_calling_wgs/slurm_log/{id}/snakejob.sv_calling_wgs_gcnv_contig_ploidy.{n}.sh-{id}.log) :
File "/data/cephfs-1/work/projects/medgen_genomes/2023-01-23_Limb_Study_Reboot/GRCh37/sv_calling_wgs/.snakemake/scripts/tmp9ot4x6ul.wrapper.py", line 58, in <module>
sample_sex = sex_map[sample_name]
KeyError: '{previous_sample}-N1-DNA1-WGS1'
Expected behavior
There are several relatively easy options to fix the behaviour of the wrapper script:
Read all SAMPLE_* folders but ignore all that have samples not defined in the samplesheet
Only read the first N SAMPLE_* folders (N = number of samples from the samplesheet)
Delete all SAMPLE_* folders at the start of the wrapper script to ensure that the output folder always contains only the most recent output data.
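As a rough illustration of the first option, the sketch below keeps only those SAMPLE_* folders whose sample is known to the samplesheet. It is not the actual wrapper code: the helper name `known_sample_dirs` is hypothetical, `sex_map` is assumed to be a dict keyed by sample name (as the traceback suggests), and each SAMPLE_* folder is assumed to contain a `sample_name.txt` file holding the sample name.

```python
from pathlib import Path


def known_sample_dirs(ploidy_calls_dir, sex_map):
    """Map sample name -> SAMPLE_* folder, skipping samples missing from sex_map.

    Hypothetical helper; assumes every SAMPLE_* folder holds a sample_name.txt.
    """
    result = {}
    for sample_dir in sorted(Path(ploidy_calls_dir).glob("SAMPLE_*")):
        name_file = sample_dir / "sample_name.txt"
        if not name_file.is_file():
            # Incomplete or foreign folder: skip it instead of failing.
            continue
        sample_name = name_file.read_text().strip()
        if sample_name in sex_map:
            result[sample_name] = sample_dir
        # Folders left over from removed samples are silently ignored.
    return result
```

The wrapper could then iterate over the returned mapping instead of the raw glob, so a stale folder from an earlier run would no longer trigger the KeyError.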
Additional context
It also seems that gatk DetermineGermlineContigPloidy will always overwrite the SAMPLE_N folders starting with N=0 up to the number of samples in any given run.
Addendum: the subsequent {mapper}.gcnv_call_cnvs.wgs.XXXX_of_YYYY rules will also fail if their output files already exist but were created by a different user (since gatk tries to copy the file ownership via Python's shutil). Potentially the wrapper script should remove the whole output folder as a first step.
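A minimal sketch of such a cleanup step, assuming the wrapper knows its output folder up front; the helper name `reset_output_dir` is hypothetical and not part of the existing wrapper. Note that removing files created by another user only works if the current user has write permission on the containing directories, so this may not cover every ownership situation.

```python
import shutil
from pathlib import Path


def reset_output_dir(out_dir):
    """Remove out_dir with all its contents, then recreate it empty.

    Hypothetical helper; intended to run before gatk writes any output.
    """
    out_path = Path(out_dir)
    if out_path.exists():
        # Drops stale SAMPLE_* folders and files left by earlier runs.
        shutil.rmtree(out_path)
    out_path.mkdir(parents=True)
    return out_path
```

Called at the top of the wrapper, this would also cover the third option listed under "Expected behavior", since no SAMPLE_* folder from a previous run would survive.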