CMake failing for nvidiagpu on Perlmutter-GPU #55

Comments
@xylar I also got the same issue on Perlmutter. I think I need to understand more about the impact of using the compiler configurations from CIME.

@grnydawn, no problem. I'm a total amateur at CMake, so I really appreciate what you've done so far. We're all learning the process here.

@grnydawn, it looks like E3SM might be forced to move to a new cudatoolkit on Perlmutter-GPU soon, see https://acmeclimate.slack.com/archives/C021DPJEL9X/p1720035170508459. Let's keep an eye on it and check back on this issue if that happens.
With #97, I'm able to build the CTests but I'm seeing:

over and over, specifically:
@xylar, I tried to reproduce this issue on pm-gpu, but I couldn't. I was able to build and run most, but not all, of the test cases without a problem on pm-gpu using the nvidiagpu compiler. One thing I want to check is whether the Omega build/run scripts can use a relative path to source a file created in the E3SM case. The omega_build.sh and omega_ctest.sh scripts source "./e3smcase/.env_mach_specific.sh" before they execute. It seems the sourcing did not complete.
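One way the relative-path concern above could be addressed is to resolve the env file against the script's own directory rather than the caller's working directory. The following is a hypothetical sketch, not the actual contents of omega_build.sh or omega_ctest.sh; the `e3smcase` path is taken from the comment above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: locate .env_mach_specific.sh relative to this
# script's own directory, so sourcing works no matter where the script
# is invoked from.
source_mach_env() {
    local script_dir env_file
    script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    env_file="${script_dir}/e3smcase/.env_mach_specific.sh"
    if [ -f "${env_file}" ]; then
        # shellcheck disable=SC1090
        source "${env_file}"
    else
        echo "ERROR: ${env_file} not found; was the E3SM case created?" >&2
        return 1
    fi
}
```

With this pattern, `cd`-ing elsewhere before running the script no longer breaks the sourcing step.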
Okay, I will try again and document my difficulties in more detail.
@grnydawn, are you able to build with develop?
@xylar Ah, I ran my test using Phil's omega/sync-e3sm, not develop.

Okay, that's what I will continue doing as well.
I was asking because I can't even build with develop.
I am going to build using the develop branch now.
Building seems to work fine (with …). My build script:

```bash
#!/usr/bin/env bash
cwd=${PWD}

module load cmake

# quit on errors
set -e
# trace commands
set -x

cd /global/u2/x/xylar/e3sm_work/polaris/main/e3sm_submodules/omega/sync-e3sm
git submodule update --init --recursive externals/YAKL externals/ekat \
    externals/scorpio cime
cd ${cwd}

rm -rf build_omega/build_pm-gpu_nvidiagpu
mkdir -p build_omega/build_pm-gpu_nvidiagpu
cd build_omega/build_pm-gpu_nvidiagpu

export METIS_ROOT=/global/cfs/cdirs/e3sm/software/polaris/pm-gpu/spack/dev_polaris_0_3_0_nvidiagpu_mpich/var/spack/environments/dev_polaris_0_3_0_nvidiagpu_mpich/.spack-env/view
export PARMETIS_ROOT=/global/cfs/cdirs/e3sm/software/polaris/pm-gpu/spack/dev_polaris_0_3_0_nvidiagpu_mpich/var/spack/environments/dev_polaris_0_3_0_nvidiagpu_mpich/.spack-env/view

cmake \
    -DOMEGA_BUILD_TYPE=Release \
    -DOMEGA_CIME_COMPILER=nvidiagpu \
    -DOMEGA_CIME_MACHINE=pm-gpu \
    -DOMEGA_METIS_ROOT=${METIS_ROOT} \
    -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT} \
    -DOMEGA_BUILD_TEST=ON \
    -Wno-dev \
    -S /global/u2/x/xylar/e3sm_work/polaris/main/e3sm_submodules/omega/sync-e3sm/components/omega \
    -B .

./omega_build.sh

cd test
ln -sfn /global/cfs/cdirs/e3sm/polaris/ocean/omega_ctest/ocean.QU.240km.151209.nc OmegaMesh.nc
ln -sfn /global/cfs/cdirs/e3sm/polaris/ocean/omega_ctest/PlanarPeriodic48x48.nc OmegaPlanarMesh.nc
ln -sfn /global/cfs/cdirs/e3sm/polaris/ocean/omega_ctest/cosine_bell_icos480_initial_state.230220.nc OmegaSphereMesh.nc
```
My job script looks like this:

```bash
#!/bin/bash
#SBATCH --job-name=omega_ctest_pm-gpu_nvidiagpu
#SBATCH --account=e3sm
#SBATCH --nodes=1
#SBATCH --output=omega_ctest_pm-gpu_nvidiagpu.o%j
#SBATCH --exclusive
#SBATCH --time=0:15:00
#SBATCH --qos=debug
#SBATCH --constraint=gpu

cd /global/u2/x/xylar/e3sm_work/polaris/main/build_omega/build_pm-gpu_nvidiagpu
./omega_ctest.sh
```

@grnydawn, do you see any issues there?
@xylar Could you add `-DOMEGA_ARCH=CUDA` to the cmake command line? The Omega build system may detect that it is a CUDA build, but I have to check whether that is right or not.
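Concretely, the suggestion amounts to one extra flag in the cmake call from the build script above (other flags unchanged and abbreviated here):

```bash
cmake \
    -DOMEGA_BUILD_TYPE=Release \
    -DOMEGA_CIME_COMPILER=nvidiagpu \
    -DOMEGA_CIME_MACHINE=pm-gpu \
    -DOMEGA_ARCH=CUDA \
    ...
```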
I can add that manually now. How do I know, in general, what the correct `OMEGA_ARCH` is?
I think that knowing the compiler and machine does not guarantee the user's intended build target architecture. However, if `OMEGA_ARCH` is not specified, the Omega build system checks whether nvcc or hipcc is available on the system and tries to use one of them according to a certain compiler priority. In general, though, the Omega build system does not know the user's intended target architecture.
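The fallback described above can be sketched in shell form. This is only an illustration of the priority logic (the real detection lives in Omega's CMake, and the `SERIAL` fallback name is an assumption):

```bash
#!/usr/bin/env bash
# Illustrative sketch of OMEGA_ARCH fallback: an explicit setting wins,
# otherwise probe for device compilers in a fixed priority order.
detect_omega_arch() {
    if [ -n "${OMEGA_ARCH:-}" ]; then
        echo "${OMEGA_ARCH}"      # explicit user choice always wins
    elif command -v nvcc >/dev/null 2>&1; then
        echo "CUDA"               # NVIDIA device compiler found first
    elif command -v hipcc >/dev/null 2>&1; then
        echo "HIP"                # AMD device compiler as the fallback
    else
        echo "SERIAL"             # no device compiler: CPU-only build
    fi
}
```

This also shows why detection alone cannot recover intent: a login node with nvcc on the path would yield `CUDA` even if the user wanted a CPU build.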
Okay, so is a user supposed to know their `OMEGA_ARCH`?
Also, it doesn't seem like detection of …
Unfortunately, rebuilding with `-DOMEGA_ARCH=CUDA` …
I'm going to move the conversation to Slack. |
Okay, this worked for me when I added `-DOMEGA_ARCH=CUDA`. @grnydawn, I think we can close this issue.
@xylar, I am good to close this issue. Thanks for the work! BTW, we may continue discussing the usage of OMEGA_ARCH somewhere else.
When I run:
I'm seeing: