Suddenly stopped working with slurm #8
Comments
It is very difficult to test what is going on here. However, judging from the error message, it looks like the computing node does not have this module. Can you find the job ID of the failed job, find the node on which the task was executed, ssh to that node, and check whether module load runs there?
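For anyone following along, here is a rough sketch of how the node lookup could be scripted. It assumes Slurm accounting is enabled so that sacct can report on finished jobs; the helper names, the job ID 12345, and the node name node042 are made up for illustration and do not come from this thread.

```python
import subprocess


def parse_sacct_nodes(sacct_output):
    """Map each job/step ID to (node list, state) from pipe-delimited sacct output."""
    nodes = {}
    for line in sacct_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 3:
            # Skip blank or unexpected lines instead of failing.
            continue
        job_id, node_list, state = fields
        nodes[job_id] = (node_list, state)
    return nodes


def nodes_for_job(job_id):
    """Ask sacct where a job ran. Requires Slurm accounting on the cluster."""
    result = subprocess.run(
        ["sacct", "-j", str(job_id),
         "--format=JobID,NodeList,State",
         "--parsable2", "--noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_nodes(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output, only to show the parsing step offline.
    sample = "12345|node042|FAILED\n12345.batch|node042|FAILED\n"
    for job, (node, state) in parse_sacct_nodes(sample).items():
        print(job, node, state)
```

Once the node name is known, the remaining steps (ssh to the node, try module load by hand) are manual.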
Yes, when I ssh into the node, lmod loads miniconda without issue. I've tried other sos scripts that previously worked and use common modules like GATK, trimmomatic, BWA, etc. and in every case the job fails with lmod not finding the module.
ok, on the computing node can you run the script directly?
The module load command is supposed to be executed from here. I need to run an errand now, but the next thing you can try is to print the env inside the task script. Maybe slurm needs an option to pass the whole env into the script (which for PBS is ...). Just my guess.
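One possible way to compare the environment seen by the task with the one on the login shell is sketched below. The choice of variables is an assumption: MODULEPATH and LMOD_CMD are the ones Lmod normally relies on, but the variable that matters on this cluster may be a different one. The function names are hypothetical.

```python
import os

# Variables that Lmod typically depends on (assumed to be the relevant ones here).
MODULE_VARS = ("MODULEPATH", "LMOD_CMD", "PATH")


def snapshot(env=None, names=MODULE_VARS):
    """Record the selected variables; missing ones are stored as None."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in names}


def diff_snapshots(login, task):
    """Return {name: (login_value, task_value)} for variables that differ."""
    return {
        name: (login.get(name), task.get(name))
        for name in sorted(set(login) | set(task))
        if login.get(name) != task.get(name)
    }


if __name__ == "__main__":
    # Inside a task this prints what the job actually sees.
    for name, value in snapshot().items():
        print(f"{name}={value}")
```

Running snapshot() once in a login shell and once inside the task, then passing both results to diff_snapshots, should show whether something like MODULEPATH is missing in the job.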
It seems to be running fine when started this way. When I use another terminal to log onto the cluster I see
And I see output files starting to appear where they should.
So this means some env was not passed to the script correctly. The next step would be to create a shell script that calls sos execute and submit it with sbatch. If the job fails, maybe add the option --export=ALL. From the sbatch documentation:
--export=ALL: Default mode if --export is not specified. All of the user's environment will be loaded (either from the caller's environment or from a clean environment if --get-user-env is specified).
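A minimal sketch of such a test script, generated from Python so it is easy to vary the --export value between runs. The helper name, the job name, and the command passed in are placeholders; the real sos execute invocation from this thread is not shown here and would need to be filled in by hand.

```python
def make_sbatch_script(command, job_name="sos_env_test", export="ALL"):
    """Build the text of a small batch script that logs MODULEPATH, then runs `command`."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --export={export}",
        "",
        "# Log the module search path so it appears in the job output",
        'echo "MODULEPATH=$MODULEPATH"',
        command,
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder command; substitute the command you actually want to test.
    with open("env_test.sh", "w") as handle:
        handle.write(make_sbatch_script("module load miniconda"))
    print("Wrote env_test.sh; submit it with: sbatch env_test.sh")
```

If the job succeeds with --export=ALL but fails with another value, that would point at environment propagation rather than at Lmod itself.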
That works too!?
And after
In case this helps
I added
The notebook simply runs sbatch to submit jobs, so you will need to check whether you can manually submit jobs with that module load command.
To try and further debug this, I saw that the original error message said ... So I added ... Best,
I've been using sos a lot in the past week on my university cluster, but today it suddenly stopped working with a strange environment issue. None of my previously working steps run anymore. For example, here's my step to run gubbins on a multi-fasta alignment.
But it fails with this error
The important bit is
Lmod has detected the following error: The following module(s) are unknown: "miniconda"
But when I submit the job manually (using sbatch), the following script runs fine.
This way lmod finds the miniconda module just fine and runs. My sos step was working on the cluster a couple of days ago. Now all of my sos scripts are failing with lmod not finding any modules, so it doesn't seem to be a module-specific problem but a problem with lmod itself. What could cause this?