
Installation notes for old systems


Cartesius

See the Cartesius description and Batch usage instructions.

Compilation - GNU Fortran

git clone https://github.com/dalesteam/dales
cd dales/
# git checkout to4.3_Fredrik

mkdir build
cd build

export SYST=gnu-fast
module load 2019
module load netCDF-Fortran/4.4.4-foss-2018b
module load CMake/3.12.1-GCCcore-7.3.0
module unload OpenMPI/3.1.1-GCC-7.3.0-2.30
module load OpenMPI/3.1.4-GCC-7.3.0-2.30

cmake ..

make VERBOSE=1 -j 4

The reason for replacing the default OpenMPI 3.1.1 with 3.1.4 is that 3.1.1 contains a bug which caused crashes on Lisa.

To compile with the optional HYPRE library, add/substitute the following:

module load Hypre/2.14.0-foss-2018b
cmake .. -DUSE_HYPRE=True -DHYPRE_LIB=/sw/arch/RedHatEnterpriseServer7/EB_production/2019/software/Hypre/2.14.0-foss-2018b/lib/libHYPRE.a

Compilation - Intel Fortran

git clone https://github.com/dalesteam/dales
cd dales/
# git checkout to4.3_Fredrik

mkdir build
cd build

export SYST=lisa-intel

module load 2019
module load CMake
module load intel/2018b
module load netCDF-Fortran/4.4.4-intel-2018b
module load FFTW/3.3.8-intel-2018b    # optional
module load Hypre/2.14.0-intel-2018b  # optional


cmake ..
# todo: add optional FFTW and HYPRE flags (a possible invocation is sketched below)

make VERBOSE=1 -j 4
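
The FFTW and HYPRE flags marked as a todo above are not filled in here. By analogy with the GNU build, a configuration along these lines might work; note that the USE_FFTW/FFTW_LIB option names are an assumption to be checked against CMakeLists.txt, and $EBROOTHYPRE/$EBROOTFFTW should be set by the EasyBuild modules loaded above:

# untested sketch: enable the optional HYPRE and FFTW support in the Intel build
# USE_HYPRE/HYPRE_LIB follow the GNU example above; USE_FFTW/FFTW_LIB are assumed option names
cmake .. -DUSE_HYPRE=True -DHYPRE_LIB=$EBROOTHYPRE/lib/libHYPRE.a \
         -DUSE_FFTW=True -DFFTW_LIB=$EBROOTFFTW/lib/libfftw3.so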

Job script

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -n 16  #total number of tasks, number of nodes calculated automatically 

# Other useful SBATCH options
# #SBATCH -N 2  #number of nodes 
# #SBATCH --ntasks-per-node=16
# #SBATCH --constraint=ivy # Runs only on Ivy Bridge nodes
# #SBATCH --constraint=haswell # Runs only on Haswell nodes (faster, AVX2)

module load 2019
module load netCDF-Fortran/4.4.4-foss-2018b
module load CMake/3.12.1-GCCcore-7.3.0
module unload OpenMPI/3.1.1-GCC-7.3.0-2.30
module load OpenMPI/3.1.4-GCC-7.3.0-2.30
# module load Hypre/2.14.0-foss-2018b

DALES=$HOME/dales/build/src/dales4

# cd somewhere - otherwise runs in same directory as submission
srun $DALES namoptions-hypre.001

Tuning

Note that Cartesius contains both Haswell and Ivy Bridge nodes. The Haswell nodes are faster and support AVX2 instructions. To get the full benefit of them, DALES should be compiled with AVX2 support, but the binary is then incompatible with the older Ivy Bridge nodes, so request the Haswell node type in the job script (see the --constraint options above). For consistent benchmarking, always request a specific node type.
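
As a rough, untested sketch (assuming gfortran and that the release flags of the chosen SYST configuration can be overridden from the cmake command line; otherwise edit CMakeLists.txt), Haswell-specific code generation could be requested like this:

# untested sketch: target the Haswell (AVX2) nodes explicitly with gfortran
# this overrides the release flags of the selected SYST configuration
cmake .. -DCMAKE_Fortran_FLAGS_RELEASE="-Ofast -march=haswell"
make VERBOSE=1 -j 4

The resulting binary should then be run with #SBATCH --constraint=haswell, as in the job script above.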

ECMWF Cray

Login to cca (see the documentation).

Compilation

Note that the Fortran compiler on this machine is called ftn.

Here is an example of how to compile DALES with the Intel compiler. Make sure that the following lines (or something similar, depending on your own preferences) are part of your CMakeLists.txt file:

elseif("$ENV{SYST}" STREQUAL "ECMWF-intel")
 set(CMAKE_Fortran_COMPILER "ftn")
 set(CMAKE_Fortran_FLAGS "-r8 -ftz -extend_source" CACHE STRING "")
 set(CMAKE_Fortran_FLAGS_RELEASE "-g -traceback -Ofast -xHost" CACHE STRING "")
 set(CMAKE_Fortran_FLAGS_DEBUG "-traceback -fpe1 -O0 -g -check all" CACHE STRING "")

For compiling, set the system variable by typing

export SYST=ECMWF-intel

and load the right modules

prgenvswitchto intel
module load netcdf4/4.4.1
module load cmake

Then proceed as usual (cmake & make).
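
That is, the same build sequence as in the Cartesius instructions above:

mkdir build
cd build
cmake ..
make VERBOSE=1 -j 4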

Scaling

Here is an overview of some very simple and very limited scaling tests on that machine, mostly to demonstrate the effect of spreading a job over several nodes and of using hyperthreading (the latter appears to depend strongly on the case). The test was done with a cumulus convection case with 36x144x296 grid points on a 3.6x14.4x17.9 km^3 domain that was run for 4 hours with quite a few statistics etc. turned on.

  • 1 node, hyperthreading on (i.e. 72 tasks per node): 11226 s
  • 1 node, hyperthreading off (i.e. 36 tasks per node): 7079 s
  • 2 nodes, hyperthreading on (i.e. 72 tasks per node): 8822 s
  • 2 nodes, hyperthreading off (i.e. 36 tasks per node): 5370 s

Take-away message: hyperthreading increases (!) the run time by about 60 percent (in this case!), and scaling is clearly not linear once you use more than one node, i.e. once the program has to communicate over the network: going from one to two nodes without hyperthreading gives a speed-up of only about 1.3 (7079 s vs. 5370 s) instead of 2.

Job script

Jobs are scheduled using PBS. Here is an example job script:

#!/bin/ksh

#PBS -q np			# <-- queue for parallel runs (alternatively use ns or nf)
#PBS -N jobname
#PBS -l EC_nodes=2		# <-- number of nodes (each has 36 CPUs)
#PBS -l EC_tasks_per_node=36	# <-- use the full node
#PBS -l EC_hyperthreads=1	# <-- hyperthreading (1: off, 2: on)
#PBS -l walltime=48:00:00	# <-- maximum of 48 h wall clock time per job
#PBS -m abe			# <-- email notification on abortion/start/end
#PBS -M johndoe@email.com	# <-- your email address

# load the same modules as during compilation
prgenvswitchto intel
module load netcdf4/4.4.1

cd /path/to/your/work/directory

aprun -N $EC_tasks_per_node -n $EC_total_tasks -j $EC_hyperthreads dales

Warm starts after 48 h

Since the machine only allows jobs of at most 48 h wall-clock time, you might have to re-submit your simulation several times (warm starts) to reach the desired simulation time. There are basically two approaches to automate this, each with pros and cons:

  1. Choose a simulation length that comfortably finishes within the wall-clock limit (say, one day, to leave a generous margin), and then schedule several of these jobs in sequence using

    qsub -W depend=afterok:<PREVIOUS_JOBID> jobfile
    

    This will start the following job once the previous one has finished successfully. (Don't forget to set lwarmstart, startfile and runtime correctly in the namoptions file!) This method has the advantage that it does not waste any computation time.

  2. Alternatively, let the simulation run as far as it gets within the 48 h wall time (saving init files regularly) and submit a follow-up job that automatically figures out how to do the warm start. This method has the advantage that it minimises the number of output files and jobs you have to manage. For this, submit the follow-up job with

    qsub -W depend=afternotok:<FIRST_JOBID> jobfile
    

    This will start the follow-up job once the previous one has finished with a non-zero exit code (which most likely happens when it runs out of wall time); a possible end-to-end submission sequence is sketched after this list. Add these lines to your job file to automatically do the warm start based on the latest init files that DALES has created and to adjust the run time in the namoptions file accordingly:

    Exp_dir=/path/to/your/work/directory	# <-- this is where you run the next 48 h
    Warm_dir=/path/to/your/init/directory	# <-- this is where your init files are
    
    cd $Exp_dir
    
    # find out how many hours are completed
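    # (the character positions below assume restart file names of the form initdHHHhMMmxXXXyYYY.NNN,
    #  e.g. initd001h30mx000y000.001, with the leading hour digit equal to 0)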
    strlength=$(ls $Warm_dir/initd0* | tail -1 | wc -c)
    cutstart=$((strlength-18))
    cutend=$((cutstart+1))
    hrsdone=$(ls $Warm_dir/initd0* | tail -1 | cut -c $cutstart-$cutend)
    cutstart=$((cutstart+3))
    cutend=$((cutend+3))
    mindone=$(ls $Warm_dir/initd0* | tail -1 | cut -c $cutstart-$cutend)
    
    # copy the init files to the work directory
    cp $Warm_dir/init[sd]0${hrsdone}h${mindone}m* $Exp_dir/.
    
    # adjust the namoptions file
    cp $Exp_dir/namoptions.original $Exp_dir/namoptions
    hrsdone=$(echo $hrsdone | sed 's/^0*//')  # remove leading 0s
    mindone=$(echo $mindone | sed 's/^0*//')
    secdone=$((hrsdone*3600+mindone*60))
    sectodo=$((172800-secdone))		# <-- adjust your simulation time here (2 days here)
    startfname=$(ls $Exp_dir | head -1)   # assumes the copied init file sorts first in the work directory
    sed -i "s/^startfile.*/startfile = '${startfname}'/" $Exp_dir/namoptions
    sed -i "s/^runtime.*/runtime = ${sectodo}/" $Exp_dir/namoptions
    
    # then continue with the usual stuff
    

    Note that the directory needs to contain a namoptions.original file (basically a copy of the one from the previous simulation) in which lwarmstart is set to true and the lines for the startfile and runtime are present but empty, e.g.:

    &RUN
    iexpnr = 002
    lwarmstart = .true.
    startfile = 
    runtime = 
    /
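
A possible submission sequence for approach 2 could then look as follows (a sketch; the job file names are placeholders, and qsub is assumed to print the ID of the submitted job, as PBS normally does):

FIRST=$(qsub jobfile)                               # initial run, stops when it hits the wall-clock limit
qsub -W depend=afternotok:$FIRST warmstart_jobfile  # warm-start job containing the lines above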