Clarification on job process location #148
When SLURM jobs are launched, where do they execute? Are all users' job processes on a given node executing inside that node's slurm-controller container from the StatefulSet? Or is there an extra layer of indirection I'm not spotting, where they get spawned independently, sandboxed through k0s in their own per-job, per-node pod? I suppose the former must be the case, since you are using SLURM to manage cgroups for CPU and GPU affinity. ClusterFactory looks like it checks a lot of boxes for me. Thank you for putting this together and documenting it so well.
Replies: 1 comment 1 reply
Hello, when a SLURM job is launched from the login node, the SLURM controller receives the job allocation request and allocates resources to a compute node where a SLURM daemon is running.
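To make that flow concrete, here is a minimal sketch from the login node (the job itself is hypothetical; it only assumes the standard SLURM client tools):

```bash
# Submitted from the login node: sbatch only talks to slurmctld (the controller).
# The controller allocates a compute node running slurmd, which spawns the job.
sbatch --nodes=1 --ntasks=1 --wrap="hostname"

# Check where it landed; the job's output file will contain the compute node's
# hostname, not the login node's, confirming where the processes execute.
squeue -u "$USER"
```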
This compute node runs the actual job. Compute nodes are bare-metal servers provisioned using the xCAT containerized service. They can also be VMs provisioned via Terraform (we are working on this). For both VMs and bare-metal servers, we use Packer to build our images.
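As a rough sketch of that image pipeline (the template and image names are illustrative, not ClusterFactory's actual layout):

```bash
# Bake the SLURM daemon, drivers, etc. into a node image with Packer.
packer build compute-node.pkr.hcl   # hypothetical template name

# Bare metal: hand the resulting image to xCAT for node provisioning.
# VMs: point a Terraform configuration at the same image (support in progress).
```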
SLURM jobs are resource-constrained by cgroups, but the user running the job can still access the "host" filesystem. If you want the same filesystem isolation as a container, you can use the Pyxis plugin or add an OCI runtime to SLURM. Both of these solutions use unprivileged containers to run the job. If we were to run jobs on Kubernetes, we would simply use Kubernetes. I hope this answers your question.
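As a quick illustration of the Pyxis route (the image is arbitrary; `--container-image` is the flag Pyxis adds to srun):

```bash
# With the Pyxis SPANK plugin installed, the job runs inside an unprivileged
# container, so it no longer sees the host filesystem directly.
srun --container-image=ubuntu:22.04 cat /etc/os-release
```

On the cgroup side, it is SLURM's cgroup.conf settings (e.g. ConstrainCores=yes and ConstrainDevices=yes) that pin a job to its allocated CPUs and GPUs.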