Gpu improvements #66
Changes from all commits
0df3f2a
a13d803
09a6835
ff2b2fe
1d7974a
fbc05d6
0a4702a
9beb292
4134af3
3b7a182
ee726b5
1f5eb30
8649a95
e7bfbe1
bf14967
14f605e
59086af
918e3a8
85045b9
f2b2fda
d0093a5
4023d67
4e375b2
7224a37
822f7b7
21fc13a
aaec12b
df29563
823653d
a002bdc
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -410,19 +410,22 @@ This allows using a different mpirun command to launch unit tests | |
<DESC>NCAR GPU platform, os is Linux, 36 pes/node, batch system is pbs</DESC> | ||
<NODENAME_REGEX>casper*</NODENAME_REGEX> | ||
<OS>LINUX</OS> | ||
<COMPILERS>pgi,intel,nvhpc,pgi-gpu,nvhpc-gpu</COMPILERS> | ||
<COMPILERS>nvhpc,intel</COMPILERS> | ||
<MPILIBS>openmpi</MPILIBS> | ||
<CIME_OUTPUT_ROOT>/glade/scratch/$USER</CIME_OUTPUT_ROOT> | ||
<DIN_LOC_ROOT>$ENV{CESMDATAROOT}/inputdata</DIN_LOC_ROOT> | ||
<DIN_LOC_ROOT_CLMFORC>/glade/p/cgd/tss/CTSM_datm_forcing_data</DIN_LOC_ROOT_CLMFORC> | ||
<DOUT_S_ROOT>$CIME_OUTPUT_ROOT/archive/$CASE</DOUT_S_ROOT> | ||
<BASELINE_ROOT>$ENV{CESMDATAROOT}/cesm_baselines</BASELINE_ROOT> | ||
<CCSM_CPRNC>$ENV{CESMDATAROOT}/tools/cime/tools/cprnc/cprnc</CCSM_CPRNC> | ||
Review comment: remove
Reply: I know I haven't completed the cleanup here yet. |
||
<GMAKE_J>8</GMAKE_J> | ||
<BATCH_SYSTEM>pbs</BATCH_SYSTEM> | ||
<SUPPORTED_BY>ASAP/CISL</SUPPORTED_BY> | ||
<MAX_TASKS_PER_NODE>36</MAX_TASKS_PER_NODE> | ||
<MAX_GPUS_PER_NODE>8</MAX_GPUS_PER_NODE> | ||
<MAX_GPUS_PER_NODE compiler="nvhpc">8</MAX_GPUS_PER_NODE> | ||
<MAX_MPITASKS_PER_NODE>36</MAX_MPITASKS_PER_NODE> | ||
<MAX_CPUTASKS_PER_GPU_NODE>36</MAX_CPUTASKS_PER_GPU_NODE> | ||
<GPU_TYPES>v100,a100</GPU_TYPES> | ||
Review comment: Shall we specify
Reply: How do we handle Fortran do concurrent here?
Reply: This is a great question! In this case, |
||
<PROJECT_REQUIRED>TRUE</PROJECT_REQUIRED> | ||
<mpirun mpilib="default"> | ||
<executable>mpirun</executable> | ||
|
@@ -450,54 +453,22 @@ This allows using a different mpirun command to launch unit tests | |
<command name="load">ncarenv/1.3</command> | ||
<command name="load">cmake/3.18.2</command> | ||
</modules> | ||
<modules compiler="pgi"> | ||
<command name="load">pgi/20.4</command> | ||
</modules> | ||
<modules compiler="pgi-gpu"> | ||
<command name="load">pgi/20.4</command> | ||
</modules> | ||
<modules compiler="nvhpc"> | ||
<command name="load">nvhpc/22.2</command> | ||
</modules> | ||
<modules compiler="nvhpc-gpu"> | ||
<command name="load">nvhpc/22.2</command> | ||
</modules> | ||
<modules compiler="intel"> | ||
<command name="load">intel/19.1.1</command> | ||
<command name="load">mkl/2020.0.1</command> | ||
</modules> | ||
<modules mpilib="openmpi" compiler="pgi"> | ||
<command name="load">openmpi/4.1.0</command> | ||
<command name="load">netcdf-mpi/4.8.0</command> | ||
<command name="load">pnetcdf/1.12.2</command> | ||
</modules> | ||
<modules mpilib="mpi-serial" compiler="pgi"> | ||
<command name="load">netcdf/4.8.0</command> | ||
</modules> | ||
<modules mpilib="openmpi" compiler="pgi-gpu"> | ||
<command name="load">openmpi/4.1.0</command> | ||
<command name="load">netcdf-mpi/4.7.4</command> | ||
<command name="load">pnetcdf/1.12.2</command> | ||
<command name="load">cuda/11.0.3</command> | ||
</modules> | ||
<modules mpilib="mpi-serial" compiler="pgi-gpu"> | ||
<command name="load">netcdf/4.7.4</command> | ||
</modules> | ||
<modules mpilib="openmpi" compiler="nvhpc"> | ||
<command name="load">openmpi/4.1.4</command> | ||
<command name="load">netcdf-mpi/4.8.1</command> | ||
<command name="load">pnetcdf/1.12.3</command> | ||
</modules> | ||
<modules mpilib="mpi-serial" compiler="nvhpc"> | ||
<command name="load">netcdf/4.8.1</command> | ||
</modules> | ||
<modules mpilib="openmpi" compiler="nvhpc-gpu"> | ||
<command name="load">openmpi/4.1.4</command> | ||
<command name="load">netcdf-mpi/4.8.1</command> | ||
<command name="load">pnetcdf/1.12.3</command> | ||
<command name="load">cuda/11.4.0</command> | ||
<modules gpu_type="!none"> | ||
<command name="load">cuda/11.6</command> | ||
</modules> | ||
<modules mpilib="mpi-serial" compiler="nvhpc-gpu"> | ||
<modules mpilib="mpi-serial" compiler="nvhpc"> | ||
<command name="load">netcdf/4.8.1</command> | ||
</modules> | ||
<modules mpilib="openmpi" compiler="intel"> | ||
|
@@ -517,29 +488,21 @@ This allows using a different mpirun command to launch unit tests | |
<command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/intel/19.1.1/</command> | ||
<command name="load">esmf-8.4.0b08_casper-ncdfio-openmpi-O</command> | ||
</modules> | ||
<modules compiler="nvhpc-gpu" mpilib="openmpi" DEBUG="TRUE"> | ||
<modules compiler="nvhpc" mpilib="openmpi" DEBUG="TRUE"> | ||
<command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/nvhpc/22.2/</command> | ||
<command name="load">esmf-8.4.1b01-ncdfio-openmpi-g</command> | ||
<command name="load">esmf-8.4.1_casper-ncdfio-openmpi-g</command> | ||
</modules> | ||
<modules compiler="nvhpc-gpu" mpilib="openmpi" DEBUG="FALSE"> | ||
<modules compiler="nvhpc" mpilib="openmpi" DEBUG="FALSE"> | ||
<command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/nvhpc/22.2/</command> | ||
<command name="load">esmf-8.4.1b01-ncdfio-openmpi-O</command> | ||
</modules> | ||
<modules compiler="pgi" mpilib="openmpi" DEBUG="TRUE"> | ||
<command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/pgi/20.4/</command> | ||
<command name="load">esmf-8.4.0b08_casper-ncdfio-openmpi-g</command> | ||
</modules> | ||
<modules compiler="pgi" mpilib="openmpi" DEBUG="FALSE"> | ||
<command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/pgi/20.4/</command> | ||
<command name="load">esmf-8.2.0b11_casper-ncdfio-openmpi-O</command> | ||
<command name="load">esmf-8.4.1_casper-ncdfio-openmpi-O</command> | ||
</modules> | ||
<modules> | ||
<command name="load">ncarcompilers/0.5.0</command> | ||
</modules> | ||
<modules compiler="!pgi" DEBUG="FALSE" mpilib="openmpi"> | ||
<modules DEBUG="FALSE" mpilib="openmpi"> | ||
<command name="load">pio/2.5.10</command> | ||
</modules> | ||
<modules compiler="!pgi" DEBUG="TRUE" mpilib="openmpi"> | ||
<modules DEBUG="TRUE" mpilib="openmpi"> | ||
<command name="load">pio/2.5.10d</command> | ||
</modules> | ||
</module_system> | ||
|
@@ -580,7 +543,7 @@ This allows using a different mpirun command to launch unit tests | |
<DIN_LOC_ROOT_CLMFORC>/glade/p/cgd/tss/CTSM_datm_forcing_data</DIN_LOC_ROOT_CLMFORC> | ||
<DOUT_S_ROOT>$CIME_OUTPUT_ROOT/archive/$CASE</DOUT_S_ROOT> | ||
<BASELINE_ROOT>$ENV{CESMDATAROOT}/cesm_baselines</BASELINE_ROOT> | ||
<CCSM_CPRNC>$ENV{CESMDATAROOT}/tools/cime/tools/cprnc/cprnc.cheyenne</CCSM_CPRNC> | ||
<CCSM_CPRNC>$ENV{CESMDATAROOT}/tools/cime/tools/cprnc/cprnc</CCSM_CPRNC> | ||
<GMAKE_J>8</GMAKE_J> | ||
<BATCH_SYSTEM>pbs</BATCH_SYSTEM> | ||
<SUPPORTED_BY>cseg</SUPPORTED_BY> | ||
|
@@ -1850,7 +1813,11 @@ This allows using a different mpirun command to launch unit tests | |
<BATCH_SYSTEM>pbs</BATCH_SYSTEM> | ||
<SUPPORTED_BY>cseg</SUPPORTED_BY> | ||
<MAX_TASKS_PER_NODE>128</MAX_TASKS_PER_NODE> | ||
<MAX_GPUS_PER_NODE>4</MAX_GPUS_PER_NODE> | ||
<MAX_MPITASKS_PER_NODE>128</MAX_MPITASKS_PER_NODE> | ||
<MAX_CPUTASKS_PER_GPU_NODE>64</MAX_CPUTASKS_PER_GPU_NODE> | ||
<GPU_TYPES>a100</GPU_TYPES> | ||
Review comment: The same question about |
||
<GPU_OFFLOAD>openacc,openmp,combined</GPU_OFFLOAD> | ||
<PROJECT_REQUIRED>TRUE</PROJECT_REQUIRED> | ||
<mpirun mpilib="default"> | ||
<executable>mpiexec</executable> | ||
Review comment: At line 1838, shall we do
Reply: This is a feature of modules on gust (and soon derecho): the two modules loaded above the purge are sticky and are not affected by the purge command.
Reply: Thanks for the clarification. That is clear to me now. |
||
|
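The sticky-module behavior described in the thread above can be pictured with a toy model. This is only a sketch in Python, not the module system's own code: the class `ToyModuleEnv`, its methods, and the module name `site-env` are hypothetical, and the forced purge is modeled on the Lmod convention that a plain purge leaves sticky modules loaded.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyModuleEnv:
    """Toy model of a module environment (hypothetical, not a real API)."""

    # Maps module name -> sticky flag.
    loaded: Dict[str, bool] = field(default_factory=dict)

    def load(self, name: str, sticky: bool = False) -> None:
        # Record the module and whether it is marked sticky.
        self.loaded[name] = sticky

    def purge(self, force: bool = False) -> None:
        # A plain purge keeps sticky modules; a forced purge removes everything.
        if force:
            self.loaded = {}
        else:
            self.loaded = {n: s for n, s in self.loaded.items() if s}


env = ToyModuleEnv()
env.load("site-env", sticky=True)  # placeholder for a sticky site module
env.load("cuda/11.7.1")            # ordinary module, taken from the diff above
env.purge()
print(sorted(env.loaded))          # only the sticky module remains
```

Under this model, loading the site modules before the purge is harmless, which is why the ordering questioned in the review still works.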
@@ -1907,11 +1874,12 @@ This allows using a different mpirun command to launch unit tests | |
<modules mpilib="mpi-serial"> | ||
<command name="load">mpi-serial/2.3.0</command> | ||
</modules> | ||
|
||
<modules gpu_type="!none"> | ||
<command name="load">cuda/11.7.1</command> | ||
</modules> | ||
<modules mpilib="mpi-serial"> | ||
<command name="load">netcdf/4.9.1</command> | ||
</modules> | ||
|
||
<modules mpilib="!mpi-serial"> | ||
<command name="load">netcdf-mpi/4.9.1</command> | ||
<command name="load">parallel-netcdf/1.12.3</command> | ||
|
I know this works, but I am not sure how gpu_enabled actually works. Is it an XML variable defined somewhere? And how is it set to True or False during the build? A brief explanation would be very helpful.
This is done here: https://github.com/jedwards4b/cime/blob/add_gpu_gust/CIME/case/case.py#L457
gpu_enabled is an attribute of the case object and is set to true if GPU_TYPE is set to a valid value.
Thanks @jedwards4b for the details. That is very helpful!
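To make the explanation above concrete, here is a minimal Python sketch of the idea, not the actual code in CIME's case.py. The class `ToyCase` and the helper `selector_matches` are hypothetical, and two assumptions are built in: that a "valid" GPU_TYPE means any value that is set and is not the literal string `none`, and that a selector such as `gpu_type="!none"` is a simple negated equality test (the real matching may be more general).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCase:
    """Hypothetical stand-in for a case object; not the real CIME class."""

    gpu_type: Optional[str] = None  # mirrors the GPU_TYPE XML variable

    @property
    def gpu_enabled(self) -> bool:
        # Assumption: "valid" means set and not the literal string "none".
        return bool(self.gpu_type) and self.gpu_type.lower() != "none"


def selector_matches(selector: str, value: str) -> bool:
    """Match an attribute selector such as gpu_type="!none" against a value.

    Assumption: a leading "!" negates a plain equality test.
    """
    if selector.startswith("!"):
        return value != selector[1:]
    return value == selector


case = ToyCase(gpu_type="a100")
if case.gpu_enabled and selector_matches("!none", case.gpu_type):
    print("the cuda module block would be selected")
```

With GPU_TYPE unset or set to `none`, `gpu_enabled` is false in this sketch and a `gpu_type="!none"` block would be skipped.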