When running the atac-seq pipeline on our SLURM cluster, it keeps failing at seemingly arbitrary points, with an error message saying that the process was "terminated for an unknown reason -- Likely it has been terminated by the external system" (see full error below).
When I resume the pipeline without changing any parameters, it usually gets past the previously terminated process and then fails again at a later step with the same error message. If I keep resuming the pipeline, it eventually reaches the end.
When a process fails, the working directory contains only two files:
.command.sh
.command.run
There is no .out, .trace, .exitcode, ... and no symlinks to the input data have been created either. If I manually submit the .command.run script to the cluster, without making any changes, it succeeds without any problem and all the files are there.
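For anyone trying to reproduce this check, here is a rough sketch of how a failed task's working directory could be inspected. The function name `check_task_dir` is made up for illustration, and the file list is my assumption about what Nextflow normally writes into a task directory (only .command.sh, .command.run, .out, .trace and .exitcode are mentioned above), so it may not match every Nextflow version or configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the usual Nextflow task files exist
# in a task work directory, and how many symlinks (staged inputs) it holds.
# The file list below is an assumption, not taken from the pipeline itself.
check_task_dir() {
    local dir="$1"
    local f
    for f in .command.sh .command.run .command.begin .command.out \
             .command.err .command.log .command.trace .exitcode; do
        if [ -e "$dir/$f" ]; then
            echo "present: $f"
        else
            echo "missing: $f"
        fi
    done
    # Input files are normally staged as symlinks at the top level of the directory.
    local n_links
    n_links=$(find "$dir" -maxdepth 1 -type l | wc -l | tr -d ' ')
    echo "symlinks: $n_links"
}
```

Calling `check_task_dir` with the path of the failed task's work directory should, for the failures described here, list only .command.sh and .command.run as present and report zero symlinks.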
I have been in touch with our IT support in charge of managing the cluster but they also have no clue what is happening. We used to have a Sun Grid Engine cluster, on which the pipeline ran without problems. The issue started to appear when the cluster was migrated to SLURM.
Command used and terminal output
#!/bin/bash
#
#SBATCH -p all                 # partition (queue)
#SBATCH -c 1                   # number of cores
#SBATCH --mem 16G              # memory pool for all cores
#SBATCH -o slurm.%N.%j.out     # STDOUT
#SBATCH -e slurm.%N.%j.err     # STDERR
module load java/x86_64/16.0.1+9
module load nextflow/x86_64/23.04.1
nextflow -c atac-seq-slurm.config run nf-core/atacseq \
    -profile singularity \
    -params-file atac-seq.yaml \
    --save_align_intermeds \
    -resume

ERROR ~ Error executing process > 'NFCORE_ATACSEQ:ATACSEQ:MERGED_LIBRARY_MARKDUPLICATES_PICARD:SAMTOOLS_INDEX (CONTROL_REP1)'

Caused by:
  Process `NFCORE_ATACSEQ:ATACSEQ:MERGED_LIBRARY_MARKDUPLICATES_PICARD:SAMTOOLS_INDEX (CONTROL_REP1)` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  samtools \
      index \
      -@ 1 \
      \
      CONTROL_REP1.mLb.mkD.sorted.bam

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_ATACSEQ:ATACSEQ:MERGED_LIBRARY_MARKDUPLICATES_PICARD:SAMTOOLS_INDEX":
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)
Relevant files
The config file only sets the working directory and the SLURM executor:
The parameter file contains these settings:
System information