mpi-serial builds on Cori-haswell (for SCM) #1615
Comments
@ndkeen - since you're the local Cori aficionado, I figured you're more likely than anyone else to be able to help here... any ideas?
The simplest solution is to not build with PAPI. It is not on by default on Titan, and I don't know why it would be on Cori-KNL. You need to remove -DHAVE_PAPI from the gptl compile options. You might check whether it is in Macros.make in your case directory (and remove it if so). Otherwise, someone more CIME savvy will need to advise. You could also try loading the papi module (in env_mach_specific.xml in your case directory).
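As a rough illustration of the Macros.make suggestion above, here is a minimal Python sketch that drops the -DHAVE_PAPI flag from the text of a makefile fragment. The helper name `strip_cpp_define` and the sample CPPDEFS line are hypothetical and not taken from any real case directory; the sketch also assumes the flag appears as a bare token (a form such as -DHAVE_PAPI=1 is deliberately left alone).

```python
import re


def strip_cpp_define(macros_text, define="HAVE_PAPI"):
    """Return macros_text with every bare -D<define> token removed.

    Hypothetical helper for illustration; it is not part of CIME.
    """
    # Match the flag only when it is followed by whitespace or end of text,
    # so longer names such as -DHAVE_PAPI_EXTRA are not touched. The
    # whitespace in front of the flag is removed along with it.
    pattern = re.compile(r"[ \t]*-D" + re.escape(define) + r"(?!\S)")
    return pattern.sub("", macros_text)


# Made-up example line; a real Macros.make will look different.
sample = "CPPDEFS += -DFORTRANUNDERSCORE -DHAVE_PAPI -DHAVE_MPI\n"
cleaned = strip_cpp_define(sample)
print(cleaned, end="")
```

To apply this to a real file, one would read Macros.make from the case directory, pass its contents through the function, and write the result back, ideally after saving a backup copy.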
Yes, Macros.make has gptl CPP definitions. From a Titan case.
Would need to determine where these are defined.
HAVE_PAPI is defined in config_compilers.xml for
Since we have not been using PAPI in production runs, disabling it by default for these systems and compilers makes sense. However, this is a POC call (@ndkeen), and perhaps there is another way that will work just for the mpi-serial case?
Hey! Sorry, this came out when I was on vacation and I missed it. I ran into this myself trying to run the |
So I might go ahead with a PR to adjust the module commands, as it will improve things, but there may still be some work to do.
Hi @ndkeen, thanks for your help on this so far. Others and I are trying to run the SCM on Edison, after the machine update, with the most recent master. The model compiles fine, but then dies during initialization with the error you mentioned above: "0: NetCDF: HDF error". Any suggestions on how to overcome this, or should this be a new GitHub issue? Thanks!
Yeah, that's what we are seeing. I'm not sure what it means or how best to proceed. I think it is an error from PIO, so I was hoping someone knew what it meant. I added a little more to this issue: #1633
HDF? Does that mean it's a parallel build of netCDF? You need a serial build to work with mpi-serial.
No, we use the serial versions for mpi-serial. As far as I know, this is the first time mpi-serial has been tested on the NERSC machines.
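For anyone trying to narrow down the "NetCDF: HDF error" discussed above, it can help to know whether a given input file is HDF5-based (netCDF-4) or one of the classic formats. The following is a minimal sketch that classifies a file by its leading magic bytes. The function names are hypothetical, and the sketch assumes the HDF5 signature sits at byte offset 0, which is the usual case for netCDF-4 files but not guaranteed for every HDF5 file.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Fourth byte of the header for the classic-family netCDF formats.
CLASSIC_VERSIONS = {
    b"\x01": "netCDF classic",
    b"\x02": "netCDF 64-bit offset",
    b"\x05": "netCDF CDF-5 (64-bit data)",
}


def classify_netcdf_header(header):
    """Guess the on-disk format from the first bytes of a file."""
    if header.startswith(HDF5_SIGNATURE):
        return "netCDF-4 / HDF5"
    if header[:3] == b"CDF":
        return CLASSIC_VERSIONS.get(header[3:4], "unknown")
    return "unknown"


def classify_netcdf_file(path):
    """Read the first 8 bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return classify_netcdf_header(f.read(8))
```

Calling `classify_netcdf_file` on each input dataset gives a quick way to see which files depend on the HDF5 layer; the `ncdump -k` command from the netCDF utilities reports similar information.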
Note that you should be able to use the GNU compiler until we figure out what's happening. |
This issue was resolved (and documented elsewhere, though I can't find it at the moment) by changing the type of a certain netcdf file used from General discussion of netcdf file types at NERSC here: |
@csjack and others are interested in running the single column model (SCM) on Cori-haswell, but were having trouble. I just attempted to build on Cori but got a compile error in the gptl build (bldlog tail attached). As a test, I turned off mpi-serial for the build; the model compiled just fine but did not run, since the SCM needs to be built with mpi-serial for a successful run. Any suggestions on how to get up and running with mpi-serial on Cori-haswell?
cori.gptl.bldlog.tail.txt