Slurm is an open-source workload manager designed for Linux clusters of all sizes. It’s a great system for queuing jobs for your HPC applications. I’m going to show you how to install Slurm on a CentOS 7 cluster.
- Delete failed installation of Slurm
- Create the global users
- Install Munge
- Install Slurm
- Use Slurm
I configured our nodes with the following hostnames using these steps. Our server is:
buhpc3
The clients are:
buhpc1 buhpc2 buhpc3 buhpc4 buhpc5 buhpc6
I leave this optional step in case you tried to install Slurm, and it didn’t work. We want to uninstall the parts related to Slurm unless you’re using the dependencies for something else.
First, I remove the database where I kept Slurm’s accounting.
yum remove mariadb-server mariadb-devel -y
Next, I remove Slurm and Munge. Munge is an authentication tool used to identify messaging from the Slurm machines.
yum remove slurm munge munge-libs munge-devel -y
I check if the slurm and munge users exist.
cat /etc/passwd | grep slurm
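The check above only looks for slurm. Here is a minimal sketch that reports both accounts at once. The function name find_cluster_users and its optional file argument are my own additions for illustration, so the check can also be pointed at a copy of the passwd file.

```shell
# Print the passwd entries for the slurm and munge accounts, if any.
# The optional argument (default /etc/passwd) is an illustrative addition.
# The exit status is non-zero when neither account exists.
find_cluster_users() {
    grep -E '^(slurm|munge):' "${1:-/etc/passwd}"
}

# Usage:
#   find_cluster_users || echo "no slurm or munge user found"
```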
Then, I delete the users and corresponding folders.
userdel -r slurm
userdel -r munge
userdel: user munge is currently used by process 26278
kill 26278
userdel -r munge
Slurm, Munge, and MariaDB should now be fully removed. Now, we can start a fresh installation that actually works.
Slurm and Munge require consistent UID and GID across every node in the cluster.
If your cluster is already configured and you are only adding new nodes, copy /etc/munge/munge.key from one of the configured nodes to each new node.
scp /etc/munge/munge.key buhpc02:/etc/munge/munge.key
If you are creating a new cluster, run the following on every node before you install Slurm or Munge:
export MUNGEUSER=1001
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
export SLURMUSER=1002
groupadd -g $SLURMUSER slurm
useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
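Because the UID and GID have to match on every node, it can be worth checking them after the accounts are created. This is only a sketch: the helper name ids_match is hypothetical, and it compares a single passwd-format line against the IDs used above.

```shell
# Compare one passwd-format line against the expected UID and GID.
# Returns 0 when both match, 1 otherwise. Runs in a subshell so the
# helper variables do not leak into the calling shell.
ids_match() (
    line="$1"; want_uid="$2"; want_gid="$3"
    uid="$(printf '%s\n' "$line" | cut -d: -f3)"
    gid="$(printf '%s\n' "$line" | cut -d: -f4)"
    [ "$uid" = "$want_uid" ] && [ "$gid" = "$want_gid" ]
)

# Example on a live node:
#   ids_match "$(getent passwd munge)" 1001 1001 && echo "munge OK"
#   ids_match "$(getent passwd slurm)" 1002 1002 && echo "slurm OK"
```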
Since I’m using CentOS 7, I need to get the latest EPEL repository.
yum install epel-release -y
Now, I can install Munge.
yum install munge munge-libs munge-devel -y
After installing Munge, I need to create a secret key on the Server. My server is on the node with hostname, buhpc3. Choose one of your nodes to be the server node.
First, we install rng-tools to properly create the key.
yum install rng-tools -y
rngd -r /dev/urandom
Now, we create the secret key. You only have to do the creation of the secret key on the server.
/usr/sbin/create-munge-key -r
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
After the secret key is created, you will need to send this key to all of the compute nodes.
scp /etc/munge/munge.key root@1.buhpc.com:/etc/munge
scp /etc/munge/munge.key root@2.buhpc.com:/etc/munge
scp /etc/munge/munge.key root@4.buhpc.com:/etc/munge
scp /etc/munge/munge.key root@5.buhpc.com:/etc/munge
scp /etc/munge/munge.key root@6.buhpc.com:/etc/munge
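With more than a handful of nodes, typing one scp per host gets tedious. Here is a small sketch of the same copy as a loop. The function name print_scp_cmds is made up for this example, and it only prints the commands so you can review them before running anything.

```shell
# Print (not run) one scp command per compute node.
# Arguments: the file to copy, the destination directory, then the hosts.
print_scp_cmds() (
    src="$1"; dest="$2"; shift 2
    for host in "$@"; do
        echo "scp $src root@$host:$dest"
    done
)

print_scp_cmds /etc/munge/munge.key /etc/munge \
    1.buhpc.com 2.buhpc.com 4.buhpc.com 5.buhpc.com 6.buhpc.com
```

Once the printed commands look right, piping the output into sh would execute them.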
Now, we SSH into every node and correct the permissions as well as start the Munge service.
chown -R munge: /etc/munge/ /var/log/munge/
chmod 0700 /etc/munge/ /var/log/munge/
systemctl enable munge
systemctl start munge
To test Munge, we can try to access another node with Munge from our server node, buhpc3.
munge -n
munge -n | unmunge
munge -n | ssh 3.buhpc.com unmunge
remunge
If you encounter no errors, then Munge is working as expected.
Slurm has a few dependencies that we need to install before proceeding.
yum install openssl openssl-devel pam-devel mariadb-server mariadb-devel numactl numactl-devel hwloc hwloc-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel man2html libibmad libibumad perl-Switch http-parser-devel json-c-devel lua-json -y
Now, we download the latest version of Slurm preferably in our shared folder. The latest version of Slurm may be different from our version.
Download the latest stable version of Slurm. When I wrote this, it was slurm-17.11.2.tar.bz2.
cd /nfs
If you don’t have rpmbuild yet:
yum install rpm-build python3 cpanm* gcc gcc-c++ -y
rpmbuild -ta --with lua --with hwloc slurm-17.11.2.tar.bz2
We will check the rpms created by rpmbuild.
cd /root/rpmbuild/RPMS/x86_64
Now, we will move the Slurm rpms for installation for the server and compute nodes.
mkdir /nfs/slurm-rpms
cp /root/rpmbuild/BUILD/slurm-17.11.2/src/plugins/auth/munge/.libs/auth_munge.so /usr/lib64/slurm/
cp /root/rpmbuild/BUILD/slurm-17.11.2/src/plugins/cred/munge/.libs/cred_munge.so /usr/lib64/slurm/
On every node that you want to be a server and compute node, we install those rpms. In our case, I want every node to be a compute node.
yum --nogpgcheck localinstall slurm-17.11.2-1.el7.centos.x86_64.rpm slurm-devel-17.11.2-1.el7.centos.x86_64.rpm slurm-munge-17.11.2-1.el7.centos.x86_64.rpm slurm-perlapi-17.11.2-1.el7.centos.x86_64.rpm slurm-plugins-17.11.2-1.el7.centos.x86_64.rpm slurm-sjobexit-17.11.2-1.el7.centos.x86_64.rpm slurm-sjstat-17.11.2-1.el7.centos.x86_64.rpm slurm-torque-17.11.2-1.el7.centos.x86_64.rpm
After we have installed Slurm on every machine, we will configure Slurm properly.
I generate the configuration file with Slurm's configurator.easy.html web form, where I leave everything default except:
ControlMachine: buhpc3
ControlAddr: 128.197.115.176
NodeName: buhpc[1-6]
CPUs: 4
StateSaveLocation: /var/spool/slurmctld
SlurmctldLogFile: /var/log/slurm/slurmctld.log
SlurmdLogFile: /var/log/slurm/slurmd.log
ClusterName: buhpc
After you hit Submit on the form, you will be given the full Slurm configuration file to copy.
On the server node, which is buhpc3:
cd /etc/slurm
vim slurm.conf
Copy the form’s Slurm configuration file that was created from the website and paste it into slurm.conf. We still need to change something in that file.
Underneath "# COMPUTE NODES" in slurm.conf, we see that Slurm tries to determine the IP addresses automatically with this one line.
NodeName=buhpc[1-6] CPUs=4 State=UNKNOWN
My nodes' IP addresses are not sequential, so I delete this one line and replace it with:
NodeName=buhpc1 NodeAddr=128.197.115.158 CPUs=4 State=UNKNOWN
NodeName=buhpc2 NodeAddr=128.197.115.7 CPUs=4 State=UNKNOWN
NodeName=buhpc3 NodeAddr=128.197.115.176 CPUs=4 State=UNKNOWN
NodeName=buhpc4 NodeAddr=128.197.115.17 CPUs=4 State=UNKNOWN
NodeName=buhpc5 NodeAddr=128.197.115.9 CPUs=4 State=UNKNOWN
NodeName=buhpc6 NodeAddr=128.197.115.15 CPUs=4 State=UNKNOWN
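If you keep a list of hostnames and addresses somewhere, you can generate these lines instead of typing them. The following is just a sketch under that assumption. The function name node_lines and the "hostname ip" input format are my own choices, not anything Slurm provides.

```shell
# Read "hostname ip" pairs on stdin and print one NodeName line per pair.
# The CPU count is passed as the first argument (4 in this cluster).
# Blank or incomplete input lines are skipped.
node_lines() {
    awk -v cpus="$1" 'NF >= 2 {
        printf "NodeName=%s NodeAddr=%s CPUs=%s State=UNKNOWN\n", $1, $2, cpus
    }'
}

printf '%s\n' 'buhpc1 128.197.115.158' 'buhpc2 128.197.115.7' | node_lines 4
```

The output can be pasted under "# COMPUTE NODES" in slurm.conf.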
After you explicitly put in the NodeAddr IP addresses, you can save and quit. Here is my full slurm.conf and what it looks like:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=buhpc3
ControlAddr=128.197.115.176
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=buhpc
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
#
#
# COMPUTE NODES
NodeName=buhpc1 NodeAddr=128.197.115.158 CPUs=4 State=UNKNOWN
NodeName=buhpc2 NodeAddr=128.197.115.7 CPUs=4 State=UNKNOWN
NodeName=buhpc3 NodeAddr=128.197.115.176 CPUs=4 State=UNKNOWN
NodeName=buhpc4 NodeAddr=128.197.115.17 CPUs=4 State=UNKNOWN
NodeName=buhpc5 NodeAddr=128.197.115.9 CPUs=4 State=UNKNOWN
NodeName=buhpc6 NodeAddr=128.197.115.15 CPUs=4 State=UNKNOWN
PartitionName=debug Nodes=buhpc[1-6] Default=YES MaxTime=INFINITE State=UP
Now that the server node has the slurm.conf correctly, we need to send this file to the other compute nodes.
scp slurm.conf root@1.buhpc.com:/etc/slurm/slurm.conf
scp slurm.conf root@2.buhpc.com:/etc/slurm/slurm.conf
scp slurm.conf root@4.buhpc.com:/etc/slurm/slurm.conf
scp slurm.conf root@5.buhpc.com:/etc/slurm/slurm.conf
scp slurm.conf root@6.buhpc.com:/etc/slurm/slurm.conf
Or, if your cluster is managed with xCAT, you can run xdcp on the management node to send the file to all nodes in the cluster.
xdcp all /etc/slurm/slurm.conf /etc/slurm/slurm.conf
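Since the generated file itself says to put the same slurm.conf on all nodes, I like to confirm the copies really are identical. A minimal sketch, assuming sha256sum from coreutils is available. The helper name conf_digest is hypothetical.

```shell
# Print only the checksum of a file, so that the values from
# different nodes can be compared directly.
conf_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Example, run from the server node:
#   conf_digest /etc/slurm/slurm.conf
#   ssh root@1.buhpc.com sha256sum /etc/slurm/slurm.conf
```

If the checksum printed for a compute node differs from the server's, copy the file to that node again.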
Now, we will configure the server node, buhpc3. We need to make sure that the server has all the right configurations and files.
mkdir /var/spool/slurmctld
chown slurm: /var/spool/slurmctld
chmod 755 /var/spool/slurmctld
mkdir /var/log/slurm
touch /var/log/slurm/slurmctld.log
touch /var/log/slurm/slurm_jobacct.log /var/log/slurm/slurm_jobcomp.log
chown -R slurm:slurm /var/log/slurm
Now, we will configure all the compute nodes, buhpc[1-6]. We need to make sure that all the compute nodes have the right configurations and files.
mkdir /var/spool/slurmd
chown slurm: /var/spool/slurmd
chmod 755 /var/spool/slurmd
mkdir /var/log/slurm
touch /var/log/slurm/slurmd.log
chown -R slurm:slurm /var/log/slurm
Use the following command to make sure that slurmd is configured properly.
slurmd -C
You should get something like this:
ClusterName=(null) NodeName=buhpc3 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=7822 TmpDisk=45753 UpTime=13-14:27:52
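The CPUs value reported here should agree with the CPUs= setting on the matching NodeName line in slurm.conf. As far as I know, a node that reports fewer CPUs than slurm.conf claims will not come up cleanly. Here is a small sketch for pulling one value out of this output. The function name kv_get is my own.

```shell
# Print the value of KEY from the "KEY=value" tokens read on stdin.
# Tokens are split on spaces. Only the first match is printed.
kv_get() {
    tr ' ' '\n' | awk -v key="$1" 'index($0, key "=") == 1 {
        print substr($0, length(key) + 2)
        exit
    }'
}

# Example on a compute node:
#   slurmd -C | kv_get CPUs
#   slurmd -C | kv_get RealMemory
```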
The firewall will block connections between nodes, so I normally disable the firewall on the compute nodes except for buhpc3.
systemctl stop firewalld
systemctl disable firewalld
On the server node, buhpc3, I usually open the default ports that Slurm uses:
firewall-cmd --permanent --zone=public --add-port=6817/udp
firewall-cmd --permanent --zone=public --add-port=6817/tcp
firewall-cmd --permanent --zone=public --add-port=6818/tcp
firewall-cmd --permanent --zone=public --add-port=6818/udp
firewall-cmd --permanent --zone=public --add-port=7321/tcp
firewall-cmd --permanent --zone=public --add-port=7321/udp
firewall-cmd --reload
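The same rules can be written as a loop over ports and protocols, which makes it harder to repeat or miss one. This is only a sketch: print_firewall_cmds is a name I made up, and it prints the commands instead of running them so you can check them first.

```shell
# Print (not run) the firewall-cmd calls for each port and protocol,
# followed by the reload that activates the permanent rules.
print_firewall_cmds() (
    for port in "$@"; do
        for proto in tcp udp; do
            echo "firewall-cmd --permanent --zone=public --add-port=$port/$proto"
        done
    done
    echo "firewall-cmd --reload"
)

print_firewall_cmds 6817 6818 7321
```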
If opening the ports does not work, stop firewalld temporarily for testing.
Next, we need to check for out-of-sync clocks on the cluster. On every node:
yum install ntp -y
chkconfig ntpd on
ntpdate pool.ntp.org
systemctl start ntpd
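To get a rough idea of whether the clocks agree, you can collect the epoch time from each node and compare. The sketch below assumes input lines of the form "hostname epoch_seconds". The function name max_skew is hypothetical, and because the times are collected one after another over SSH, the result is only an approximation.

```shell
# Read "hostname epoch_seconds" lines on stdin and print the largest
# difference, in seconds, between any two of the reported clocks.
# Prints 0 when there is no input.
max_skew() {
    awk 'NF >= 2 {
             if (n == 0 || $2 < min) min = $2
             if (n == 0 || $2 > max) max = $2
             n++
         }
         END { print (n ? max - min : 0) }'
}

# Example, run from the server node (hostnames as used earlier):
#   for h in 1.buhpc.com 2.buhpc.com 4.buhpc.com; do
#       echo "$h $(ssh "$h" date +%s)"
#   done | max_skew
```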
Create the cluster with this command, replacing clustername with the name of your cluster:
sacctmgr create cluster clustername
The clocks should be synced, so we can try starting Slurm! On all the compute nodes, buhpc[1-6]:
systemctl enable slurmd.service
systemctl start slurmd.service
systemctl status slurmd.service
Now, on the server node, buhpc3:
systemctl enable slurmctld.service
systemctl start slurmctld.service
systemctl status slurmctld.service
When you check the status of slurmd and slurmctld, you should see whether they started successfully. If problems happen, check the logs!
Compute node bugs: tail /var/log/slurm/slurmd.log
Server node bugs: tail /var/log/slurm/slurmctld.log
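When the logs get long, it helps to filter the tail for lines that look like problems. A minimal sketch: the function name recent_errors and the choice of keywords (error, fatal, fail) are my own guesses at what is worth looking for, not an official list of Slurm log levels.

```shell
# Look at the last N lines of a log (default 50) and print only those
# that mention an error, case-insensitively.
recent_errors() {
    tail -n "${2:-50}" "$1" | grep -iE 'error|fatal|fail'
}

# Usage:
#   recent_errors /var/log/slurm/slurmd.log
#   recent_errors /var/log/slurm/slurmctld.log 200
```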
To display the compute nodes:
scontrol show nodes
To run jobs, use srun on the server node, buhpc3. The -N option sets how many compute nodes the job uses:
srun -N5 /bin/hostname
buhpc3
buhpc2
buhpc4
buhpc5
buhpc1
To display the job queue:
scontrol show jobs
JobId=16 JobName=hostname
   UserId=root(0) GroupId=root(0)
   Priority=4294901746 Nice=0 Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2016-04-10T16:26:04 EligibleTime=2016-04-10T16:26:04
   StartTime=2016-04-10T16:26:04 EndTime=2016-04-10T16:26:04
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=debug AllocNode:Sid=buhpc3:1834
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=buhpc[1-5]
   BatchHost=buhpc1
   NumNodes=5 NumCPUs=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,node=5
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/hostname
   WorkDir=/root
   Power= SICP=0
To submit script jobs, create a script file that contains the commands that you want to run. Then:
sbatch -N2 script-file
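If you have never written a batch script, here is a small example of what script-file could contain. It is only a sketch: the helper write_example_job, the job name hello, and the output file name are all made up for illustration, and the partition is the debug partition from the slurm.conf above.

```shell
# Write a small example batch script to the given path.
write_example_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=debug
#SBATCH --output=hello-%j.out
# Print the name of every node allocated to the job.
srun /bin/hostname
EOF
}

# Usage:
#   write_example_job script-file
#   sbatch -N2 script-file
```

The %j in the output file name is replaced with the job ID, so each run writes to its own file.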
Slurm has a lot of useful commands. You may have heard of other queuing tools like Torque. Here's a useful link for the command differences: http://www.sdsc.edu/~hocks/FG/PBS.slurm.html