1. Pre-requisites:
1. Add host entries in /etc/hosts
192.168.1.6 slurmmaster.unixadmin.in slurmmaster
192.168.1.7 cnode01.unixadmin.in cnode01
192.168.1.8 cnode02.unixadmin.in cnode02
2. Disable SELinux in /etc/selinux/config
SELINUX=disabled
3. Stop Firewall Service:
systemctl stop firewalld.service
systemctl disable firewalld.service
4. Start NTP Service:
systemctl start chronyd
systemctl enable chronyd
5. Set up passwordless SSH login from the master node to all compute nodes (a key-setup sketch follows below)
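A minimal key-setup sketch, assuming the root account is used on all nodes (repeat the ssh-copy-id step for the slurmdbd and login nodes if they are part of the setup):
ssh-keygen -t rsa
ssh-copy-id root@cnode01
ssh-copy-id root@cnode02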
2. Installation steps
- MUNGE INSTALLATION:
Munge is an authentication service that allows a process to authenticate within a group of hosts that share common users and groups (matching UIDs and GIDs). It is secured by an authentication key shared among all the nodes.
Prerequisites for munge installation :
- Create users and groups for Munge and Slurm on all the nodes (the UIDs and GIDs must match across nodes):
- Create the munge user with UID 1001
export MUNGEUSER=1001
- Create a munge group and add the munge user to it.
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
- Create the slurm user with UID 1002
export SLURMUSER=1002
- Create a slurm group and add the slurm user to it.
groupadd -g $SLURMUSER slurm
useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
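Since Munge and Slurm require the UIDs and GIDs to match on every node, it helps to verify them after the users are created; a quick check to run on each node:
id munge
id slurm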
Munge installation steps :
- On Master node:
- Install EPEL in order to install Munge
yum install epel-release.noarch
Then , install Munge
yum install munge munge-libs munge-devel
- Create a Munge authentication key
/usr/sbin/create-munge-key
- Copy the Munge authentication key to the /home directory
cp /etc/munge/munge.key /home
- Then, copy the key to all the other nodes (cnode01, cnode02, the slurmdbd node, and the login node):
scp /home/munge.key root@cnode01:/etc/munge
scp /home/munge.key root@cnode02:/etc/munge
scp /home/munge.key root@slurmdbd:/etc/munge
scp /home/munge.key root@loginnode:/etc/munge
2. On the Slurm master, slurmdbd node, and compute nodes:
- Set the ownership and permissions on all nodes:
chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/
chmod 0700 /etc/munge/ /var/log/munge/
chmod 0711 /var/lib/munge/
chmod 0755 /run/munge/
- Enable and start the munge service on every node
systemctl enable munge
systemctl start munge
- Test munge locally on the master node
munge -n | unmunge
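To confirm that the shared key works across hosts, a credential generated on the master can be decoded on a compute node; a hedged example, assuming the passwordless SSH set up earlier:
munge -n | ssh cnode01 unmunge
munge -n | ssh cnode02 unmunge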
2. Slurm Installation steps:
- MariaDb Installation On Slurmdb Node :
- Install MariaDB on the slurmdbd node
yum install mariadb-server mariadb-devel
- Enable and start the mariadb service
systemctl start mariadb
systemctl enable mariadb
- Set up the root password and secure mariadb using the following command
mysql_secure_installation
- Log in to the database with the root password you just set, then create and configure the slurm_acct_db database:
mysql -u root -p
Enter password:
MariaDB [(none)]> create database slurm_acct_db;
MariaDB [(none)]> grant all on slurm_acct_db.* TO 'slurm'@'slurmdbd.unixadmin.in' identified by 'xyz' with grant option;
MariaDB [(none)]> SHOW VARIABLES LIKE 'have_innodb';
MariaDB [(none)]> quit;
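Optionally, the grant can be verified by connecting as the slurm database user from the slurmdbd node; a sketch, assuming the password 'xyz' chosen above:
mysql -u slurm -p -h slurmdbd.unixadmin.in -e 'show databases;'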
- Modify the innodb configuration:
Set innodb_buffer_pool_size, innodb_log_file_size, and innodb_lock_wait_timeout to larger values
vim /etc/my.cnf.d/innodb.cnf
[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
To apply this change, stop the database, move or remove the existing InnoDB log files, and then restart the database:
systemctl stop mariadb
mv /var/lib/mysql/ib_logfile? /tmp/
systemctl start mariadb
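After the restart, the new InnoDB settings can be confirmed, for example:
mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb%';"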
2. On Slurm Master:
- To build and install Slurm, first install the following prerequisites:
# yum install openssl openssl-devel pam-devel rpm-build numactl numactl-devel hwloc hwloc-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel man2html libibmad libibumad
# yum install python3
# wget http://mirror.centos.org/centos/7/os/x86_64/Packages/perl-ExtUtils-MakeMaker-6.68-3.el7.noarch.rpm
# yum localinstall perl-ExtUtils-MakeMaker-6.68-3.el7.noarch.rpm
- Download the Slurm source tarball
wget https://download.schedmd.com/slurm/slurm-20.02.5.tar.bz2
- Create the RPMs as
rpmbuild -ta slurm-20.02.5.tar.bz2
The RPM packages will typically be in /root/rpmbuild/RPMS/x86_64/ and should be installed on all relevant nodes.
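A quick way to confirm that the build produced the expected packages (the path assumes rpmbuild was run as root with the default output directory):
ls /root/rpmbuild/RPMS/x86_64/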
- Share the built RPMs with all the nodes over NFS:
yum install nfs-utils libnfsidmap
systemctl enable rpcbind
systemctl start rpcbind
systemctl enable nfs-server
systemctl start nfs-server
systemctl start rpc-statd
systemctl enable nfs-idmapd
Create a directory /source/slurm_20.02, copy the built RPMs into it (a copy example follows this block), and export it so that it is accessible to all nodes.
mkdir -p /source/slurm_20.02
chmod 777 /source/slurm_20.02
vi /etc/exports
/source 192.168.1.7(rw,sync,no_root_squash)
/source 192.168.1.8(rw,sync,no_root_squash)
exportfs -r
showmount -e
Export list for slurmmaster.unixadmin.in:
/source 192.168.1.8,192.168.1.7
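The built RPMs still need to be copied into the exported directory; one possible way, assuming the default rpmbuild output path:
cp /root/rpmbuild/RPMS/x86_64/slurm-*.rpm /source/slurm_20.02/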
- Now install the relevant RPMs on the master node.
yum localinstall slurm-20.02.5-1.el7.x86_64.rpm
yum localinstall slurm-perlapi-20.02.5-1.el7.x86_64.rpm
yum localinstall slurm-slurmctld-20.02.5-1.el7.x86_64.rpm
yum localinstall slurm-example-configs-20.02.5-1.el7.x86_64.rpm
yum localinstall slurm-torque-20.02.5-1.el7.x86_64.rpm
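To confirm what ended up installed on the master node:
rpm -qa | grep slurm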
On the compute nodes and the slurmdbd node:
- Install the NFS utilities, enable and start the rpcbind service, and create a directory to mount the RPMs shared from the master node.
yum install nfs-utils libnfsidmap
systemctl enable rpcbind
systemctl start rpcbind
showmount -e 192.168.1.6
Create the directory /mnt/source/slurm_20.02 and mount the share
mkdir -p /mnt/source/slurm_20.02
mount 192.168.1.6:/source/slurm_20.02 /mnt/source/slurm_20.02
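If the mount should persist across reboots, an /etc/fstab entry can be added; a sketch:
echo "192.168.1.6:/source/slurm_20.02 /mnt/source/slurm_20.02 nfs defaults 0 0" >> /etc/fstab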
- Now install the relevant RPMs on the slurmdbd node and the compute nodes (a yum localinstall example from the mounted directory follows the lists below).
On the slurmdbd node:
slurm-20.02.5-1.el7.x86_64.rpm
slurm-slurmdbd-20.02.5-1.el7.x86_64.rpm
slurm-devel-20.02.5-1.el7.x86_64.rpm
On Compute nodes:
slurm-20.02.5-1.el7.x86_64.rpm
slurm-perlapi-20.02.5-1.el7.x86_64.rpm
slurm-pam_slurm-20.02.5-1.el7.x86_64.rpm
slurm-libpmi-20.02.5-1.el7.x86_64.rpm
slurm-slurmd-20.02.5-1.el7.x86_64.rpm
slurm-devel-20.02.5-1.el7.x86_64.rpm
slurm-example-configs-20.02.5-1.el7.x86_64.rpm
slurm-torque-20.02.5-1.el7.x86_64.rpm
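As noted above, the packages can be installed straight from the mounted directory; a hedged example for a compute node (trim the list accordingly for the slurmdbd node):
cd /mnt/source/slurm_20.02
yum localinstall slurm-20.02.5-1.el7.x86_64.rpm slurm-perlapi-20.02.5-1.el7.x86_64.rpm slurm-pam_slurm-20.02.5-1.el7.x86_64.rpm slurm-libpmi-20.02.5-1.el7.x86_64.rpm slurm-slurmd-20.02.5-1.el7.x86_64.rpm slurm-devel-20.02.5-1.el7.x86_64.rpm slurm-example-configs-20.02.5-1.el7.x86_64.rpm slurm-torque-20.02.5-1.el7.x86_64.rpm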
- SLURM Configuration :
- Configure the slurmdbd configuration file:
On the master node, edit the following configuration file as per your requirements (it will be copied to the slurmdbd node later):
vim /etc/slurm/slurmdbd.conf
#
# Example slurmdbd.conf file.
#
# See the slurmdbd.conf man page for more information.
#
# Archive info
#ArchiveJobs=yes
#ArchiveDir="/tmp"
#ArchiveSteps=yes
#ArchiveScript=
#JobPurge=12
#StepPurge=1
#
# Authentication info
AuthType=auth/munge
#AuthInfo=/var/run/munge/munge.socket.2
#
# slurmDBD info
DbdAddr=192.168.1.9
DbdHost=slurmdbd.unixadmin.in
DbdPort=6819
SlurmUser=slurm
#MessageTimeout=300
DebugLevel=verbose
#DefaultQOS=normal,standby
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
#PluginDir=/usr/lib/slurm
#PrivateData=accounts,users,usage,jobs
#TrackWCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
StorageHost=slurmdbd.unixadmin.in
#StoragePort=6819
StoragePass=root1234
StorageUser=slurm
StorageLoc=slurm_acct_db
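Note that LogFile points at /var/log/slurm/slurmdbd.log, so that directory should exist on the slurmdbd node and be writable by the slurm user before the service is started; for example:
mkdir -p /var/log/slurm
chown slurm:slurm /var/log/slurm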
Then enable and start the slurmdbd service on the slurmdbd node (copy slurmdbd.conf there first, as shown in the scp step below):
systemctl start slurmdbd
systemctl enable slurmdbd
systemctl status slurmdbd
- Modify the following parameters according to your cluster
vim /etc/slurm/slurm.conf
#
# Example slurm.conf file. Please run configurator.html
# (in doc/html) to build a configuration file customized
# for your environment.
#
# slurm.conf file generated by configurator.html.
#
# See the slurm.conf man page for more information.
#
ClusterName=HPC_Cluster
ControlMachine=slurmmaster
#ControlAddr=
#BackupController=
#BackupAddr=
#
SlurmUser=slurm
#SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
#PluginDir=
#FirstJobId=
ReturnToService=0
#MaxJobCount=
#PlugStackConfig=
#PropagatePrioProcess=
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#Prolog=
#Epilog=
#SrunProlog=
#SrunEpilog=
#TaskProlog=
#TaskEpilog=
#TaskPlugin=
#TrackWCKey=no
#TreeWidth=50
#TmpFS=
#UsePAM=
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
#SchedulerAuth=
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=100000
#PriorityWeightAge=1000
#PriorityWeightPartition=10000
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0
#
# LOGGING
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
#JobCompType=jobcomp/none
#JobCompLoc=
#
# ACCOUNTING
#JobAcctGatherType=jobacct_gather/linux
#JobAcctGatherFrequency=30
#
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=slurmdbd
#AccountingStorageLoc=
#AccountingStoragePass=root1234
AccountingStorageUser=slurm
#
# COMPUTE NODES
NodeName=cnode[01-02] Procs=1 State=UNKNOWN
PartitionName=Dev Nodes=ALL Default=YES MaxTime=INFINITE State=UP
- Once the slurmdbd.conf and slurm.conf parameters are filled in correctly, these configuration files need to be sent to the slurmdbd node and all the compute nodes:
First copy both the configuration files to /home directory using following command:
cp /etc/slurm/slurm.conf /home
cp /etc/slurm/slurmdbd.conf /home
Then, copy them to the /etc/slurm directory on all the other nodes using the following commands:
scp /home/slurm.conf root@cnode01:/etc/slurm
scp /home/slurmdbd.conf root@cnode01:/etc/slurm
scp /home/slurm.conf root@cnode02:/etc/slurm
scp /home/slurmdbd.conf root@cnode02:/etc/slurm
scp /home/slurmdbd.conf root@slurmdbd:/etc/slurm
scp /home/slurm.conf root@slurmdbd:/etc/slurm
scp /home/slurm.conf root@loginnode:/etc/slurm
scp /home/slurmdbd.conf root@loginnode:/etc/slurm
- Create the directories that hold the Slurm state and log files, and set their ownership and permissions:
On Master Node:
mkdir /var/spool/slurmctld
chown slurm:slurm /var/spool/slurmctld
chmod 755 /var/spool/slurmctld
mkdir /var/log/slurm
touch /var/log/slurm/slurmctld.log
touch /var/log/slurm/slurm_jobacct.log /var/log/slurm/slurm_jobcomp.log
chown -R slurm:slurm /var/log/slurm/
On login node and Compute Nodes:
mkdir /var/spool/slurmd
chown slurm: /var/spool/slurmd
chmod 755 /var/spool/slurmd
mkdir /var/log/slurm/
touch /var/log/slurm/slurmd.log
chown -R slurm:slurm /var/log/slurm/slurmd.log
- Test the configuration using the following command
slurmd -C
The output will look similar to:
NodeName=cnode01 CPUs=1 Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770
UpTime=0-00:59:45
- Activate the services:
slurmd service on the compute nodes
systemctl enable slurmd.service
systemctl start slurmd.service
systemctl status slurmd.service
slurmctld service on the master node
systemctl enable slurmctld.service
systemctl start slurmctld.service
systemctl status slurmctld.service
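At this point the cluster can be checked end to end; a hedged set of checks run from the master node (the sacctmgr add step is only needed if HPC_Cluster is not already listed in the accounting database):
sinfo
srun -N2 hostname
sacctmgr show cluster
sacctmgr add cluster HPC_Cluster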