slurm validation playbook fails - Pyxis/Enroot failing to run jobs on CentOS #784
Comments
Implementing the requirements from https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md fixed this.
Do you have a list of the requirements you needed to implement to get this working? I just started running into this as well and am going to push a fix into the pyxis role.
In addition to the link I posted earlier, I changed some options on the srun command: I removed '-G 2' and '-g 2' and replaced them with '--mpi=pmi2' and '--gres=gpu:2'.
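A rough sketch of what the adjusted invocation might look like. The original validation command is not shown in this thread, so the container image and the test command here are placeholders; only '--mpi=pmi2' and '--gres=gpu:2' come from the comment above. '--container-image' is the Pyxis option for selecting an image.

```shell
#!/usr/bin/env bash
# Sketch only: the real validation command is not shown in this thread.

# Options that replace '-G 2' and '-g 2': request two GPUs through --gres
# and select the PMI2 plugin explicitly.
SRUN_ARGS=(--mpi=pmi2 --gres=gpu:2)

# Placeholder image and test command (assumptions, for illustration only).
CONTAINER_IMAGE="${CONTAINER_IMAGE:-centos}"
TEST_CMD=(grep PRETTY /etc/os-release)

# Print the command instead of running it, so it can be reviewed first.
echo srun "${SRUN_ARGS[@]}" --container-image="$CONTAINER_IMAGE" "${TEST_CMD[@]}"
```

Dropping the leading `echo` would submit the job; keeping it makes it easy to compare the final option list against what the validation playbook actually runs.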
When we resolve this issue we should undo #902. @miketice22, did you need to add all the kernel parameter changes or just a few of them? I'm having some trouble identifying the gaps in a vanilla CentOS 7/8 install.
Had to add all of them.
This has been addressed here: NVIDIA/ansible-role-enroot#12. It will be making its way back into DeepOps shortly.
This issue is stale because it has been open for 60 days with no activity. Please update the issue or it will be closed in 7 days.
After installing/deploying Slurm with 'ansible-playbook -l slurm-cluster playbooks/slurm-cluster.yml', the validation playbook fails.