Reads compiler/machine configs from E3SM CIME #31
Conversation
@grnydawn this looks good and works great on Chrysalis with intel. However, I did not have much luck on OLCF machines. On Frontier:
which is strange after it worked fine for crayclanggpu... I tried Summit too and it failed during the build with an unknown flag error (-ML) for a pgi build, so it may be that CIME's setup for pgi/nvidia has a bad flag?
@philipwjones , Thank you for the information. I'm currently investigating the issues you pointed out, and I wanted to share some initial thoughts:
* I encountered the same problem with crayclanggpu. The issue seems connected to how the Scream team uses compiler configurations: MPICC for crayclanggpu ends up pointing to hipcc instead of the Cray compiler wrapper, so it does not pick up the paths for the MPI library. To address this, I need to incorporate the MPI library and include paths. While this is feasible, I haven't yet determined the best way to provide this information in a manner that works across all systems.
* I tested crayclang on Frontier and encountered no issues. It's puzzling that your log includes hdf5-related problems, since no unit tests or other Omega code currently use the hdf5 library.
* I attempted the IBM compiler on Summit. Initially, I faced an issue due to a Python version conflict. After loading python/3.8.10, the problem was resolved and I could successfully execute ctest using "jsrun -n 1 ctest" on a batch node. I'll continue to explore solutions for the Python version issue.
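To illustrate the kind of information that has to be supplied when hipcc is the compiler, here is a minimal Python sketch that derives MPI include/link flags from the MPICH_DIR environment variable that the cray-mpich module typically exports. This is only one possible source and is not necessarily the approach taken in this PR; the variable name and flag layout are assumptions.

```python
# Illustrative sketch only: when hipcc is used instead of the Cray CC wrapper,
# the MPI include and library paths must be passed explicitly. One possible
# source (an assumption, not necessarily what this PR does) is the MPICH_DIR
# environment variable set by the cray-mpich module.
import os


def mpi_flags_from_env():
    """Return (include_flags, link_flags) derived from MPICH_DIR, if set."""
    mpi_root = os.environ.get("MPICH_DIR")
    if mpi_root is None:
        return [], []
    include_flags = ["-I" + os.path.join(mpi_root, "include")]
    link_flags = ["-L" + os.path.join(mpi_root, "lib"), "-lmpi"]
    return include_flags, link_flags
```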
On Frontier, hipcc was used as the C++ compiler for both MMF and Scream because Cray's C++ compiler (invoked via the CC wrapper) lagged a bit in terms of rocm features. The MPI libraries and include paths were specified in the …

Do I understand correctly that you are only parsing config_machines.xml and not looking at the cmake_macro files? There are certain other things like compiler options/workarounds, PIO hints and misc stuff that are specified in those files. It might be too big a task to parse everything from there, but perhaps we should extract a subset? Or should those be specified as needed in Omega separately? Or a more radical idea which probably won't work: should we include that cmake_macro file in its entirety? There are certainly some options which won't be applicable and we would need to override/ignore some settings.
Additionally, do we need a …
@sarats , Thanks for the comments. The current Omega CMake build does read the cmake files in … So far, we are not using …
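For context, here is a minimal standalone sketch of the kind of lookup involved when reading config_machines.xml, assuming the usual <machine MACH="..."> layout with a COMPILERS child element. The actual gen_machine_info.py goes through CIME's own Machines class (see the E402 import discussion further down) rather than parsing the XML directly.

```python
# Illustrative sketch only: look up a machine entry in CIME's
# config_machines.xml and return its declared compilers. Assumes the usual
# <machine MACH="..."> layout with a <COMPILERS> child; the real
# gen_machine_info.py uses CIME's Machines class instead of raw XML parsing.
import xml.etree.ElementTree as ET


def supported_compilers(xml_path, machine_name):
    """Return the list of compilers declared for machine_name, or []."""
    root = ET.parse(xml_path).getroot()
    for mach in root.iter("machine"):
        if mach.get("MACH") != machine_name:
            continue
        node = mach.find("COMPILERS")
        if node is not None and node.text:
            return [c.strip() for c in node.text.split(",")]
    return []


# Example call (the path is hypothetical):
# print(supported_compilers("cime_config/machines/config_machines.xml", "frontier"))
```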
@philipwjones , I have updated this PR to resolve the issues that you reported. Could you rerun your tests with this update to see how it goes? Major updates are:
Ok... it works for crayclanggpu on Frontier, but not for the cpu crayclang target. For the crayclang target, it complains that there is "No such file or directory" called -lstdc++, so I think the -lstdc++ argument must be misplaced and the compiler thinks it's a file that needs to be processed rather than a library. I couldn't find where that string is put together or why it would be different for crayclang when it worked fine for crayclanggpu. Haven't tried other builds yet.
On Summit - cpu build with pgi target complains about an unknown switch (-MF) for nvhpc++. The pgigpu target has the same error and also has some missing CMake variables when I try to build the CUDA back end (OMEGA_ARCH=CUDA) - missing CMAKE_CUDA_COMPILE_OBJECT and _CMAKE_CUDA_WHOLE_FLAG.
@philipwjones , Thanks for the checks. I see the same issue that you pointed out. I think I need to dig deeper into CMake. Also, I may need to set up CMake "unit tests" to verify my changes to the Omega CMake for multiple systems as well as multiple compilers.
@grnydawn - no problem, it seems to be really close to working everywhere. And I know it's hard to test all the supported compiler and machine options - I was hoping leveraging the E3SM options would make it easier, but even harvesting those seems to be non-trivial.
@grnydawn, I see that this adds python code to the Omega repo. I'd like to add some linting for python code equivalent to what we use in Polaris. Rather than complicating this PR, I'll add that separately and clean up the python script later.
This merge updates the capability of using E3SM configurations, including compilers and compiler options.
* uses MPICXX from E3SM CIME as the default Omega compiler
* user can specify the compiler and compiler/linker options using OMEGA_CXX_COMPILER, OMEGA_CXX_FLAGS, and OMEGA_LINK_OPTIONS
* adds gen_machine_info.py to read config_machines.xml in E3SM CIME
* adds unit testing for data types
* adds new cmake variables OMEGA_INCLUDE_DIRECTORIES and OMEGA_LINK_DIRECTORIES
* better supports compilers that are not wrapped in a compiler wrapper, such as hipcc
* uses cmake variables from E3SM for GPU models including CUDA_FLAGS, HIP, and SYCL
* supports older Python versions
@xylar , I rebased this branch onto the develop branch as you suggested. I saw some lint checking for python code in your commits and will make modifications to my python code if the CI check detects any issues.
Yep, it looks like there are some things to fix:
@grnydawn If it's easy to do at this stage, can you also grab the definition of mpiexec from the CIME machine files too? Some of the unit tests (like MachEnv and the upcoming Decomp) require an 8-task MPI run and it would be nice to have the test command be something like ${MPI_EXEC} -n 8 my_unit_test. If you'd rather wait for a later PR to add this, that's fine - only add this now if it's convenient. I'll just use srun in the interim since that works across several machines.
@philipwjones , I think it is not difficult to grab MPI_EXEC from the CIME config file. I need to update my python code anyway to fix the linting issues.
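As a rough illustration of what grabbing the launcher could look like, here is a hedged Python sketch that pulls the parallel launch executable from config_machines.xml, assuming the typical <mpirun>/<executable> layout; it is not the code in this PR.

```python
# Illustrative sketch only: extract the parallel launch executable (e.g. srun
# or jsrun) for a machine from config_machines.xml, assuming the typical
# <mpirun>/<executable> layout. Not the actual implementation in this PR.
import xml.etree.ElementTree as ET


def mpi_exec(xml_path, machine_name, default="mpirun"):
    """Return the MPI launch executable declared for machine_name."""
    root = ET.parse(xml_path).getroot()
    for mach in root.iter("machine"):
        if mach.get("MACH") != machine_name:
            continue
        exe = mach.find("./mpirun/executable")
        if exe is not None and exe.text:
            return exe.text.strip()
    return default
```

The resulting value could then be prefixed to the unit-test commands, e.g. srun -n 8 my_unit_test on machines whose launcher is srun.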
@xylar , I fixed all linting issues except the one below:
components/omega/gen_machine_info.py:33:1: E402 module level import not at top of file
I think I understand why I got this error. I created a class that inherits from one of the CIME classes, whose source file location is passed through a command-line argument. I thought this approach was more flexible than a hard-coded CIME path. So, can we skip the E402 check for this python code?
@grnydawn, yes, while that approach to imports is not preferable, it is unfortunately necessary for CIME, which doesn't play by normal python rules (ESMCI/cime#3886). To skip the CI check, please add:
from CIME.XML.machines import Machines  # noqa: E402
Note that there are 2 spaces before the #.
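A minimal sketch of the import pattern being discussed, assuming the CIME location is passed on the command line; the argument name, the exact subdirectory appended to sys.path, and the subclass are illustrative, not taken from gen_machine_info.py.

```python
# Sketch of the deferred-import pattern: the CIME path only becomes known from
# a command-line argument, so the import cannot sit at the top of the file and
# needs the noqa marker. Argument name and path handling are illustrative.
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--cimeroot", required=True,
                    help="path to the E3SM CIME directory")
args = parser.parse_args()

# The exact subdirectory to append depends on the CIME layout being used.
sys.path.append(args.cimeroot)

from CIME.XML.machines import Machines  # noqa: E402


class OmegaMachines(Machines):
    """Hypothetical subclass adding Omega-specific queries."""
    pass
```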
Looks good to go for this stage. We can revisit once CIME CMake changes are finished upstream.
* creates scripts that prerun module commands in the build directory: omega_build.sh, omega_run.sh and omega_ctest.sh
* YAKL library can be automatically built using HIP or CUDA
* updates documents for the cmake-based build
@xylar , I tried to fix some formatting issues found during the CI tests, but could not fix them all. Could you look at the latest commits and guide me on how to fix them?
@philipwjones , I have updated this PR with better support for GPU and CPU builds on Frontier and Perlmutter. Summit is not supported yet; I think I know what the issues are, but it will take a little more time to support Summit. To build and run standalone Omega, I created new scripts in the build directory: "omega_build.sh" for compiling, "omega_ctest.sh" for invoking ctest, and "omega_run.sh" for running the dummy executable. All of the scripts source "omega_env.sh" to match the module and environment variables set in CIME's "config_machines.xml". Please review this update and let me know if you find any issues.
@xylar Thanks for the explanations. I fixed the issues and the CI tests pass now. Sorry that I forgot to run the pre-commit command in the mamba environment before pushing commits.
Tested this across Frontier (crayclang/crayclanggpu), Perlmutter (pm-cpu/nvidia) and Chrysalis (Intel) and it both builds and runs CTest successfully. Thanks @grnydawn
@philipwjones Thanks for testing. I updated the user guide per your suggestion of using OMEGA_CIME_MACHINE. Please merge this when you finish your review.
Reads compiler/machine configs from E3SM CIME (config_machines.xml and cmake-macros/*.cmake files)