Clarification on job process location #148
When SLURM jobs are launched, where do they execute? Are all users' job processes on a given node executing inside that node's slurm-controller container from the StatefulSet? Or is there an extra layer of indirection I'm not spotting, where they get spawned independently, sandboxed through k0s in their own per-job, per-node pod? I suppose the former must be the case, since you are using SLURM to manage cgroups for CPU and GPU affinity. ClusterFactory looks like it checks a lot of boxes for me. Thank you for putting this together and documenting it so well.
Replies: 1 comment 1 reply
Hello, when a SLURM job is launched from the login node, the SLURM controller receives the job allocation request and allocates resources to a compute node where a SLURM daemon is running.
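To make that flow concrete, here is a minimal sketch from the login node (the job itself is hypothetical; it only assumes the standard SLURM client tools):

```bash
# Submitted from the login node: sbatch only talks to slurmctld (the controller).
# The controller allocates a compute node running slurmd, which spawns the job.
sbatch --nodes=1 --ntasks=1 --wrap="hostname"

# Check where it landed; the job's output file will contain the compute node's
# hostname, not the login node's, confirming where the processes execute.
squeue -u "$USER"
```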
This compute node runs the actual job. Compute nodes are bare-metal servers provisioned using the xCAT containerized service. They can also be VMs provisioned via Terraform (we are working on this). For both VMs and bare-metal servers, we use Packer to build our images.
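As a rough sketch of that image pipeline (the template and image names are illustrative, not ClusterFactory's actual layout):

```bash
# Bake the SLURM daemon, drivers, etc. into a node image with Packer.
packer build compute-node.pkr.hcl   # hypothetical template name

# Bare metal: hand the resulting image to xCAT for node provisioning.
# VMs: point a Terraform configuration at the same image (support in progress).
```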
SLURM jobs are resource-constrained by cgroups, but the user running the job can still access the "host" filesystem. If you want the same filesystem isolation as a container, you can use the Pyxis plugin or add an OCI runtime to SLURM. Both of these solutions use unprivileged containers to run the job. If we were to run jobs on Kubernetes, we would simply use Kubernetes. I hope this answers your question.
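As a quick illustration of the Pyxis route (the image is arbitrary; `--container-image` is the flag Pyxis adds to srun):

```bash
# With the Pyxis SPANK plugin installed, the job runs inside an unprivileged
# container, so it no longer sees the host filesystem directly.
srun --container-image=ubuntu:22.04 cat /etc/os-release
```

On the cgroup side, it is SLURM's cgroup.conf settings (e.g. ConstrainCores=yes and ConstrainDevices=yes) that pin a job to its allocated CPUs and GPUs.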