Modules on Edison need to be updated #1356

Closed
ndkeen opened this issue Mar 30, 2017 · 11 comments

@ndkeen
Contributor

ndkeen commented Mar 30, 2017

In general, I need to update the versions of the modules on Edison.
We are still using Intel 15, and we have found that Intel 17 works well on other machines.

There are also currently 3 tests that hit an internal compiler error with the GNU version we have, and I've verified that they pass with a newer version.

I tried acme_developer after updating the modules for Intel and GNU. HOMME fails to build; note, however, that it was already failing before cime52 for some reason.

These 2 tests fail at runtime and look like they could be I/O-related hangs (for both Intel and GNU):

ERS_Ld5.T62_oQU120.CMPASO-NYF.edison_gnu.20170329_164827_yc8ir7
SMS.T62_oQU120_ais20.MPAS_LISIO_TEST.edison_gnu.20170329_164827_yc8ir7

Started this branch: ndk/cime/edison_module_updates

@ndkeen ndkeen self-assigned this Mar 30, 2017
@rljacob
Member

rljacob commented Mar 30, 2017

@golaz are you ok with updating the Intel compilers on edison?

@bishtgautam
Contributor

bishtgautam commented Mar 30, 2017

I noticed that ALBANY_PATH was removed for Edison in 0c418bc. Do any of the failing tests require Albany?

Tagging @agsalin

@ndkeen
Contributor Author

ndkeen commented Mar 30, 2017

No, I don't think that path is affecting the tests.

@golaz
Contributor

golaz commented Apr 3, 2017

@rljacob : I don't have any problem updating the compilers if we have experience with the newer versions on other machines. But since this is a substantial non-BFB change, should it be part of a new tag?

@rljacob
Member

rljacob commented Apr 3, 2017

Sure. There are other non-BFB changes going into master, PR #1301 and PR #1335 for example.

@ndkeen
Contributor Author

ndkeen commented Apr 3, 2017

@jayeshkrishna: Here are the current version numbers I'm trying for the I/O-related modules:

      <command name="load">cray-netcdf-hdf5parallel/4.4.0</command>
      <command name="load">cray-hdf5-parallel/1.8.16</command>
      <command name="load">cray-parallel-netcdf/1.6.1</command>

@ndkeen
Contributor Author

ndkeen commented Apr 4, 2017

When I tested a fix by @jgfouca regarding HOMME on Edison, I ran acme_developer with his branch and only updated the Intel compiler to v17.
The tests I mentioned above are now passing. The only failure was a runtime failure in:
SMS_D_Ln1.ne30_ne30.FC5AV1C-04.edison_intel
I don't see anything useful in the directory to tell me what happened. It did run out of time, so it could have been a hang. I might just try again.

The HOMME test built and ran; however, it had a problem with one of the executables that may have nothing to do with the compiler.

So it might make more sense to update only the compiler and not adjust any other versions.

@ndkeen
Contributor Author

ndkeen commented May 12, 2017

I can update the GNU compiler only if I also update cray-mpich. With cray-mpich updated to 7.5.1 (one version ahead of the default), the tests pass (except for HOMME).

I did test cray-mpich 7.5.1 with Intel and the tests still passed, so I think it's safe to update. However, I can only update GNU and mpich together, which means we would have 2 different versions of mpich in use on the machine, depending on the compiler chosen.

Building HOMME with GNU is still an issue, as explained earlier. The cprnc tool that HOMME needs is being built with GNU but with Intel compiler flags.

I could close this issue and re-open as a GNU-only issue.

@rljacob
Member

rljacob commented May 12, 2017

You can just edit the title.

@ndkeen
Contributor Author

ndkeen commented May 19, 2017

PR #1533 bumps us to GNU 6.2. As noted in that PR, this version of GNU required a newer version of mpich. To avoid changing the default for this machine, which is intel15 and mpich725, we only use mpich751 with GNU.

Additionally, instead of changing the current Intel default, we added another compiler option (selected with --compiler=intel17). This newer version of the Intel compiler also needed a newer version of mpich and will use mpich751 as well.

Presumably, when Intel 15 is retired, we can change the default and remove the compiler=intel17 option.
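
As a rough illustration of the arrangement described in this comment, the compiler-to-mpich pairing could be sketched as below. The table and function names are hypothetical and are not taken from CIME, and reading "mpich725" and "mpich751" as cray-mpich/7.2.5 and cray-mpich/7.5.1 is an assumption based on the module naming used earlier in this thread.

```python
# Hypothetical sketch: names and structure are illustrative, not CIME code.
# Versions follow the thread: the default (intel, i.e. Intel 15) stays on
# mpich 7.2.5 (assuming "mpich725" means cray-mpich/7.2.5), while GNU 6.2
# and the added intel17 option both use cray-mpich 7.5.1.
MPICH_BY_COMPILER = {
    "intel": "7.2.5",
    "intel17": "7.5.1",
    "gnu": "7.5.1",
}


def mpich_load_command(compiler):
    """Return the module-load line for the given --compiler value."""
    try:
        version = MPICH_BY_COMPILER[compiler]
    except KeyError:
        raise ValueError(f"unknown compiler: {compiler!r}") from None
    return f'<command name="load">cray-mpich/{version}</command>'


if __name__ == "__main__":
    for name in ("intel", "intel17", "gnu"):
        print(f"{name:8s} {mpich_load_command(name)}")
```

Keeping the pairing in one table makes the later cleanup a one-line change: once Intel 15 is gone, the "intel" entry can take the newer version and the "intel17" entry can be dropped.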

@ndkeen
Contributor Author

ndkeen commented Jul 27, 2017

After the Edison upgrade in July 2017, we moved all module versions to the defaults except for netcdf/hdf.

#1679

@ndkeen ndkeen closed this as completed Jul 27, 2017