Installation notes for old systems
See the Cartesius description and the Batch usage instructions.
git clone https://github.com/dalesteam/dales
cd dales/
# git checkout to4.3_Fredrik
mkdir build
cd build
export SYST=gnu-fast
module load 2019
module load netCDF-Fortran/4.4.4-foss-2018b
module load CMake/3.12.1-GCCcore-7.3.0
module unload OpenMPI/3.1.1-GCC-7.3.0-2.30
module load OpenMPI/3.1.4-GCC-7.3.0-2.30
cmake ..
make VERBOSE=1 -j 4
The reason for replacing the default OpenMPI 3.1.1 with 3.1.4 is that 3.1.1 contains a bug which caused crashes on Lisa.
To compile with the optional HYPRE library, add/substitute the following:
module load Hypre/2.14.0-foss-2018b
cmake .. -DUSE_HYPRE=True -DHYPRE_LIB=/sw/arch/RedHatEnterpriseServer7/EB_production/2019/software/Hypre/2.14.0-foss-2018b/lib/libHYPRE.a
git clone https://github.com/dalesteam/dales
cd dales/
# git checkout to4.3_Fredrik
mkdir build
cd build
export SYST=lisa-intel
module load 2019
module load CMake
module load intel/2018b
module load netCDF-Fortran/4.4.4-intel-2018b
module load FFTW/3.3.8-intel-2018b # optional
module load Hypre/2.14.0-intel-2018b # optional
cmake ..
# todo: add optional FFTW and HYPRE flags
make VERBOSE=1 -j 4
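The optional FFTW and HYPRE flags are left as a todo above. As a hedged sketch for HYPRE only, the same flags as in the gnu build above should apply, with the library path switched to the intel-2018b module; the exact path below is an assumption, so verify it with module show Hypre/2.14.0-intel-2018b.
# sketch: HYPRE flags analogous to the gnu example above;
# the intel-2018b library path is an assumption - check it with 'module show'
cmake .. -DUSE_HYPRE=True -DHYPRE_LIB=/sw/arch/RedHatEnterpriseServer7/EB_production/2019/software/Hypre/2.14.0-intel-2018b/lib/libHYPRE.a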
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -n 16 #total number of tasks, number of nodes calculated automatically
# Other useful SBATCH options
# #SBATCH -N 2 #number of nodes
# #SBATCH --ntasks-per-node=16
# #SBATCH --constraint=ivy # Runs only on Ivy Bridge nodes
# #SBATCH --constraint=haswell # Runs only on Haswell nodes (faster, AVX2)
module load 2019
module load netCDF-Fortran/4.4.4-foss-2018b
module load CMake/3.12.1-GCCcore-7.3.0
module unload OpenMPI/3.1.1-GCC-7.3.0-2.30
module load OpenMPI/3.1.4-GCC-7.3.0-2.30
# module load Hypre/2.14.0-foss-2018b
DALES=$HOME/dales/build/src/dales4
# cd to your run directory here - otherwise the job runs in the directory it was submitted from
srun $DALES namoptions-hypre.001
Note that Cartesius contains both Haswell and Ivy Bridge nodes. The Haswell nodes are faster and support AVX2 instructions. To get their full benefit, DALES should be compiled with AVX2 support, but the binary is then incompatible with the older node type, so request the Haswell nodes in the job script. For consistent benchmarking, one should in any case request a specific node type in the job script, as in the example below.
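For example, enabling the constraint option already shown in the job-script comments above pins a run to one node type:
# run on Haswell nodes only (use --constraint=ivy for Ivy Bridge)
#SBATCH --constraint=haswell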
Log in to cca (see the documentation). Note that the Fortran compiler on this machine is called ftn.
Here is an example of how to compile DALES with the Intel compiler. Make sure that the following lines (or something similar, depending on your own preferences) are part of your CMakeLists.txt file:
elseif("$ENV{SYST}" STREQUAL "ECMWF-intel")
set(CMAKE_Fortran_COMPILER "ftn")
set(CMAKE_Fortran_FLAGS "-r8 -ftz -extend_source" CACHE STRING "")
set(CMAKE_Fortran_FLAGS_RELEASE "-g -traceback -Ofast -xHost" CACHE STRING "")
set(CMAKE_Fortran_FLAGS_DEBUG "-traceback -fpe1 -O0 -g -check all" CACHE STRING "")
For compiling, set the system variable by typing
export SYST=ECMWF-intel
and load the right modules:
prgenvswitchto intel
module load netcdf4/4.4.1
module load cmake
Then proceed as usual (cmake and make), as spelled out below.
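The usual steps are the same build sequence as on the other machines, run from the dales/ source directory after setting SYST and loading the modules above:
# standard out-of-source build, as in the examples above
mkdir build
cd build
cmake ..
make VERBOSE=1 -j 4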
Here is an overview of some very simple and very limited scaling tests on that machine, mostly to demonstrate the effect of spreading your job over several nodes and of using hyperthreading (the latter seems to depend strongly on the case, though). The test was done with a cumulus convection case with 36x144x296 grid points on a 3.6x14.4x17.9 km^3 domain, run for 4 hours with quite a few statistics etc. turned on.
- 1 node, hyperthreading on (i.e. 72 tasks per node): 11226 s
- 1 node, hyperthreading off (i.e. 36 tasks per node): 7079 s
- 2 nodes, hyperthreading on (i.e. 72 tasks per node): 8822 s
- 2 nodes, hyperthreading off (i.e. 36 tasks per node): 5370 s
Take-away message: Hyperthreading increases (!) run time by about 60 percent (in this case!) and scaling is clearly not linear when you use more than one node (i.e. when the program has to communicate over the network).
Jobs are scheduled using PBS. Here is an example job script:
#!/bin/ksh
#PBS -q np # <-- queue for parallel runs (alternatively use ns or nf)
#PBS -N jobname
#PBS -l EC_nodes=2 # <-- number of nodes (each has 36 CPUs)
#PBS -l EC_tasks_per_node=36 # <-- use the full node
#PBS -l EC_hyperthreads=1 # <-- hyperthreading (1: off, 2: on)
#PBS -l walltime=48:00:00 # <-- maximum of 48 h wall clock time per job
#PBS -m abe # <-- email notification on abort/begin/end
#PBS -M johndoe@email.com # <-- your email address
# load the same modules as during compilation
prgenvswitchto intel
module load netcdf4/4.4.1
cd /path/to/your/work/directory
aprun -N $EC_tasks_per_node -n $EC_total_tasks -j $EC_hyperthreads dales
Since the machine only allows jobs of at most 48 h wall clock time, you might have to re-submit your simulation several times (warm starts) to reach the desired simulation time. There are basically two approaches to do this somewhat automatically (both have pros and cons):
- Find a length of simulation that can be finished in, say, 1 day to leave a generous margin, and then schedule several of these jobs in sequence using
  qsub -W depend=afterok:<PREVIOUS_JOBID> jobfile
  This will start the next job once the previous one has finished successfully. (Don't forget to set lwarmstart, startfile and runtime correctly in the namoptions file!) This method has the advantage that it does not waste any computation time.
- Alternatively, let the simulation run as far as it gets within the 48 h wall time (and save init files very regularly), and submit a job that automatically figures out how to do the warm start. This method has the advantage that it minimises the number of output files and jobs that you have to run. For this, submit the following job with
  qsub -W depend=afternotok:<FIRST_JOBID> jobfile
  This will start the following job once the previous one has finished with a non-zero exit code (which most likely happens when it runs out of time). Add these lines of code to your job file to automatically do the warm start based on the latest init files that DALES has created, and to adjust the run time in the namoptions accordingly:
Exp_dir=/path/to/your/work/directory # <-- this is where you run the next 48 h
Warm_dir=/path/to/your/init/directory # <-- this is where your init files are
cd $Exp_dir
# find out how many hours are completed
strlength=$(ls $Warm_dir/initd0* | tail -1 | wc -c)
cutstart=$((strlength-18))
cutend=$((cutstart+1))
hrsdone=$(ls $Warm_dir/initd0* | tail -1 | cut -c $cutstart-$cutend)
cutstart=$((cutstart+3))
cutend=$((cutend+3))
mindone=$(ls $Warm_dir/initd0* | tail -1 | cut -c $cutstart-$cutend)
# copy the init files to the work directory
cp $Warm_dir/init[sd]0${hrsdone}h${mindone}m* $Exp_dir/.
# adjust the namoptions file
cp $Exp_dir/namoptions.original $Exp_dir/namoptions
hrsdone=$(echo $hrsdone | sed 's/^0*//') # remove leading 0s
mindone=$(echo $mindone | sed 's/^0*//')
secdone=$((hrsdone*3600+mindone*60))
sectodo=$((172800-secdone)) # <-- adjust your simulation time here (2 days here)
startfname=$(ls $Exp_dir | head -1)
sed -i "s/^startfile.*/startfile = '${startfname}'/" $Exp_dir/namoptions
sed -i "s/^runtime.*/runtime = ${sectodo}/" $Exp_dir/namoptions
# then continue with the usual stuff
Note that the directory needs to contain a namoptions.original file (basically a copy of the one from the previous simulation) in which lwarmstart is set to true and the lines for startfile and runtime are present but empty, e.g.:
&RUN
iexpnr = 002
lwarmstart = .true.
startfile =
runtime =
/
See the users guide.
Warning: don't use the 2018b module set - the OpenMPI version 3.1.1 included there has been found to cause crashes (see Quirks). The module set below is known to work.
# -- load modules both for compilation and run script --
module load pre2019
module load foss/2017b
module load netCDF-Fortran/4.4.4-foss-2017b
module load cmake
# -- compile --
export SYST=gnu-fast
# enable aggressive optimization flags, to be added in Dales 4.2
cd dales
mkdir build
cd build
cmake ..
make
#PBS -lnodes=2:ppn=16:cpu3
#PBS -lwalltime=2:00:00
module load eb
module load foss/2017b
module load netcdf/gnu/4.2.1-gf4.7a
# Path to the Dales program
DALES=~/dales/build2/src/dales.exe
EXPERIMENT=~/your_case_directory/
cd $EXPERIMENT
mpiexec $DALES namoptions.001
- cpu3 in the job script specifies a particular CPU type, see job requirements. If omitted, the job may run on any available CPU type, which can confuse benchmarking because performance differs between types.
- ppn is the number of processes per node. The Lisa nodes have 8 cores; with hyperthreading, 16 processes per node will fit. Hyperthreading seems beneficial for DALES.
- mpiexec by default launches as many MPI tasks as there are slots available. Note that the number of tasks must be compatible with nprocx and nprocy in the namelist (specify 0 to determine them automatically). Also, itot must be divisible by nprocx, and jtot by nprocy, as sketched below.
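As a minimal sketch of the corresponding namelist entries (assuming the usual DALES layout, with nprocx/nprocy in &RUN and itot/jtot in &DOMAIN - check against your own namoptions file):
&RUN
nprocx = 0      ! 0 = let DALES choose the decomposition
nprocy = 0
/
&DOMAIN
itot = 64       ! must be divisible by nprocx
jtot = 64       ! must be divisible by nprocy
/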