GNU build on Hera is failing #962
@AlexanderRichert-NOAA - FYI
I'll look into this, but I'm also tagging @climbfuji, who may have a more immediate answer on OpenMPI matters on Hera. The only OpenMPI system module I can see is openmpi/4.1.6_gnu9.2.0...
When I tried openmpi 4.1.6, other libraries would no longer load. I went around in circles before giving up.
I have v1.7 working. I will check in my branch so you can take a look.
Also tagging @RatkoVasic-NOAA in case he knows of recent changes; the modification date on the openmpi module file is last Tuesday, the 11th. I just created an issue for this under spack-stack: JCSDA/spack-stack#1146
@GeorgeGayno-NOAA @AlexanderRichert-NOAA
I'm confused: why was it working a few days ago but not now? Did someone revert the configuration to use 4.1.5 again?
We haven't used 4.1.5 for some time (/scratch1/NCEPDEV/jcsda/jedipara/spack-stack/modulefiles/openmpi/4.1.5). In SRW we now use the following (going with spack-stack 1.6.0):
I don't follow. How is it that the modules/MODULEPATH settings in https://github.com/ufs-community/UFS_UTILS/blob/develop/modulefiles/build.hera.gnu.lua were working until a few days ago but aren't working now? Did something in the modulefiles change so that they no longer point to the spack-stack-specific OpenMPI 4.1.5 installation?
I didn't know about UFS_UTILS; I was talking about the WM and SRW. I can take a look at that modulefile.
@GeorgeGayno-NOAA try now
@AlexanderRichert-NOAA I manually added this line:
Thanks. I can now load the stack-openmpi module and, for that matter, build UFS_UTILS@develop without any modifications.
UFS_UTILS now compiles, but the regression tests fail:
For more details, see this log file: /scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS/reg_tests/chgres_cube/consistency.log01.fail
Can you try /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/unified-env-rocky8-ompi416/install/modulefiles/Core? Note the OpenMPI version change to 4.1.6. This stack uses the Hera admin-provided OpenMPI (I'm not sure why it wasn't used in the Rocky 8 rebuild for 1.6.0).
With ad8c76f, I was able to compile with GNU on Hera. The unit tests passed. All regression tests except one ran to completion; some passed, and some differed from the baseline, although the differences were very small. The first global_cycle regression test had a segmentation fault in the sfcsub.F routine.
Fixing this segmentation fault is beyond the scope of this issue. I will make a note and open another issue to address it.
This test was repeated with 05b6fc2. The results were the same.
This test was repeated with 4dca77a. The results were the same.
I just tried compiling develop with GNU at 2794d41 (the hash that prompted this issue) and at 3ef2e6b. It works! @RatkoVasic-NOAA - what is going on?
We recently rebuilt OpenMPI on Hera; previously we were trying to use the copy built under CentOS. Could that explain what you're seeing?
OK. That must explain why it is working again. In my branch (#965), I point to a different stack. Should I revert to what was used before?
The only difference between the two environments is that unified-env-rocky8-ompi416 uses the sys admin-installed openmpi/4.1.6_gnu9.2.0 module, whereas unified-env-rocky8 uses a copy of OpenMPI 4.1.5 built by the spack-stack team (I'm not sure offhand why we have both). I would recommend unified-env-rocky8-ompi416, since it's not clear what network fabric support, if any, the other OpenMPI was built with, whereas the sys admin-installed OpenMPI 4.1.6 is definitely built with UCX support, which is almost certainly what you want.
Thanks. Do you have any further comments on #965? I would like to merge it.
The head of develop (2794d41) no longer compiles on Hera with GNU. I get this error: