Have same slurm.conf among nodes and controller #1182
Share a single slurm.conf so that the controller and all nodes use the same file when nodes and/or partitions are configured. Place slurm.conf at /sw/.slurm/slurm.conf on the controller, where /sw is an NFS mount point available on all nodes. Make /etc/slurm/slurm.conf a symbolic link to it on the nodes and the controller.
Signed-off-by: Seyong Um <seyong.um@hyundai.com>
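For readers unfamiliar with the pattern, the linking step described above can be sketched as follows. This is an illustration in Python, not the change made in this PR; `link_slurm_conf` is a hypothetical helper, and the demo uses a scratch directory in place of /sw/.slurm/slurm.conf and /etc/slurm/slurm.conf.

```python
import os
import tempfile
from pathlib import Path


def link_slurm_conf(shared_conf: Path, local_conf: Path) -> None:
    """Point local_conf at shared_conf, replacing any existing file or link.

    Hypothetical helper: the name and signature are assumptions made for
    this sketch, not part of the PR.
    """
    local_conf.parent.mkdir(parents=True, exist_ok=True)
    # Create the link under a temporary name first, then rename it over the
    # old path so that slurm.conf is never missing in between.
    tmp_link = local_conf.with_name(local_conf.name + ".tmp")
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(shared_conf)
    os.replace(tmp_link, local_conf)


if __name__ == "__main__":
    # Scratch directory standing in for the real filesystem layout.
    root = Path(tempfile.mkdtemp())
    shared = root / "sw" / ".slurm" / "slurm.conf"
    shared.parent.mkdir(parents=True)
    shared.write_text("ClusterName=demo\n")
    local = root / "etc" / "slurm" / "slurm.conf"
    link_slurm_conf(shared, local)
    print(local, "->", os.readlink(local))
```

Running the same call again leaves the link in place, so the step can be repeated safely on every node and on the controller.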
@seyong-um : This functionality looks useful to support, and I know this is a common pattern in a lot of Slurm clusters.
However, I don't think we want to enable this by default, as we have several known use cases where the Slurm controller and the compute nodes don't share an NFS filesystem.
Would you be willing to put this behind a flag variable? I.e., add a boolean variable like slurm_conf_symlink which defaults to false, and only set up the symlink if it's set to true. Otherwise, use the existing logic.
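A rough sketch of the gating requested here, written in Python purely for illustration. `install_slurm_conf` and its parameters are hypothetical, and the "existing logic" is assumed to be writing a per-host copy of the file; the actual change would live in the project's own provisioning code.

```python
import tempfile
from pathlib import Path


def install_slurm_conf(rendered: str, local_conf: Path, shared_conf: Path,
                       slurm_conf_symlink: bool = False) -> str:
    """Install slurm.conf and return the mode used ("copy" or "symlink")."""
    local_conf.parent.mkdir(parents=True, exist_ok=True)
    if not slurm_conf_symlink:
        # Default path: assumed stand-in for the existing logic, which
        # keeps an independent copy on each host.
        if local_conf.is_symlink():
            # Drop a leftover link so the write does not land on the share.
            local_conf.unlink()
        local_conf.write_text(rendered)
        return "copy"
    # Opt-in path: one copy on the shared mount, with a link pointing at it.
    shared_conf.parent.mkdir(parents=True, exist_ok=True)
    shared_conf.write_text(rendered)
    if local_conf.is_symlink() or local_conf.exists():
        local_conf.unlink()
    local_conf.symlink_to(shared_conf)
    return "symlink"


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    local = root / "etc" / "slurm" / "slurm.conf"
    shared = root / "sw" / ".slurm" / "slurm.conf"
    print(install_slurm_conf("ClusterName=demo\n", local, shared))
    print(install_slurm_conf("ClusterName=demo\n", local, shared,
                             slurm_conf_symlink=True))
```

Keeping the default at false means clusters without a shared filesystem between controller and compute nodes see no change in behavior.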
The flag slurm_conf_symlink allows nodes to share slurm.conf via NFS.
Signed-off-by: Seyong Um <seyong.um@hyundai.com>
I added the share-slurm-conf flag as you suggested. The default value is false, which keeps the existing logic; once it is set to true, the new logic runs.
Our CI tests are currently having issues, but manual testing of the cluster setup process looks good from my end.
Added a couple of inline comments with additional changes. Once those are done I can merge.
Signed-off-by: Seyong Um <seyong.um@hyundai.com>
Thank you! All code review comments are addressed, all CI tests are passing, and manual tests work as well. Looks good to me!
Share a single slurm.conf so that the controller and all nodes use the same file when nodes and/or partitions are configured.
Place slurm.conf at /sw/.slurm/slurm.conf on the controller, where /sw is an NFS mount point available on all nodes.
Make /etc/slurm/slurm.conf a symbolic link to it on the nodes and the controller.
By Hyundai Motor Group / AIRS Company
Signed-off-by: Seyong Um <seyong.um@hyundai.com>