Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cam.buildlib failed on Summit with default IBM XL compiler #3183

Closed
dqwu opened this issue Sep 15, 2019 · 7 comments · Fixed by #3184
Closed

cam.buildlib failed on Summit with default IBM XL compiler #3183

dqwu opened this issue Sep 15, 2019 · 7 comments · Fixed by #3184

Comments

@dqwu
Copy link
Contributor

dqwu commented Sep 15, 2019

On Summit, after merging pending PR #3153 locally to fix build issue #3105, there is a new build error from cam (reproducible with F case):
ERROR: BUILD FAIL: cam.buildlib failed

The error message in atm.bldlog shows:

"/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/xl-16.1.1-3/spectrum-mpi-10.3.0.1-20190611-aqjt3jo53mogrrhcrd2iufr435azcaha/include/mpif-config.h", line 63.18: 1514-004 (S) Name given for constant with PARAMETER attribute was defined elsewhere with conflicting attributes. Name is ignored.
** parallel_mod   === End of Compilation 1 ===
1501-511  Compilation failed for file parallel_mod.F90.

Line 63 of mpif-config.h is:
parameter (MPI_MAX_PROCESSOR_NAME=256-1)

With above information we can find out that the failure is caused by PR #3094.
More specifically, by the following code block added in commit 1af9d61

  subroutine initmp_from_par(par)
...
    character(len=MPI_MAX_PROCESSOR_NAME)               :: my_name
    character(len=MPI_MAX_PROCESSOR_NAME), allocatable  :: the_names(:)
...
#ifdef _MPI
#include <mpif.h>
...

This code block has two build issues:

  1. Causes cam.buildlib to fail when mpif.h is included (mpif-config.h is included indirectly), as described above. Note, this build error is only reproducible with default IBM XL compiler on Summit (not reproducible with GNU compiler on Summit, though)
  2. When _MPI macro is not defined, definition error occurs as MPI_MAX_PROCESSOR_NAME parameter is undefined to the compiler.

A proposed fix is like below:

  subroutine initmp_from_par(par)
...
#ifdef _MPI
#include <mpif.h>
    character(len=MPI_MAX_PROCESSOR_NAME)               :: my_name
    character(len=MPI_MAX_PROCESSOR_NAME), allocatable  :: the_names(:)
...

PS, for module parallel_mod, mpif.h is already included at the beginning:

#ifdef _MPI
#include <mpif.h>
#endif

Is there a reason to include it again inside each of the 4 subroutines listed below?
subroutine init_par
subroutine initmp_from_par(par)
subroutine syncmp(par)
subroutine syncmp_comm(comm)

@dqwu
Copy link
Contributor Author

dqwu commented Sep 15, 2019

@sarats Could you please try to reproduce this build error on Summit with default XL compiler? You can use a low-res F case (--compset FC5AV1C-H01B --res ne4_ne4). Just test with latest E3SM master branch, and merge jackreeveseyre/cime/sfc_flux_bug_fix to fix a known build issue in shr_flux_mod.F90.

@bartgol
Copy link
Contributor

bartgol commented Sep 16, 2019

@dqwu This is interesting. The PR/commit you refer to simply split a function (initmp) into two functions, to allow to easily injecting homme into an app with an already existing arbitrary comm.

The inclusion of mpif.h was there before the branch got merged. As was the declaration of my_name and the_names. The only difference is that before those variables were only declared if _MPI was defined, while now they are always defined (and used only if _MPI is defined). If that's causing the problem, then I'm baffled, since there are plenty of unused vars in homme.

As for the multiple inclusion of mpif, I'm not sure. That's how it was before, so I didn't want to change it. Since the first inclusion is at module level, I think it "should" be safe to remove the other inclusions.

@bartgol
Copy link
Contributor

bartgol commented Sep 16, 2019

Also, homme has always relied on _MPI being defined/undefined to switch between serial and mpi. And we have run homme before on summit, so I'm puzzled on why this seems to be an issue now but not before...

Edit: nvm, the issue is that MPI_MAX_PROCESSOR_NAME before was used only inside an #ifdef _MPI block. Ok, now I think I understand.

I will submit a quick PR with the bugfix. In the PR I will also experiment the removal of the redundant #include <mpif.h>, so we can see if they are safe to remove.

bartgol added a commit to bartgol/E3SM that referenced this issue Sep 16, 2019
The constant MPI_MAX_PROCESSOR_NAME is only defined if mpi headers are
included, so it *must* be used only inside blocks guarded by `#ifdef _MPI`

Also, removing redundant inclusion of mpif.h.
@dqwu
Copy link
Contributor Author

dqwu commented Sep 16, 2019

@bartgol
Note that this issue is only reproducible with IBM XL compiler (no failed tests observed on E3SM CDash where the existing builds use GNU or Intel compilers).
Do you have access to Summit? If not, I can test your fix on Summit later.

@dqwu
Copy link
Contributor Author

dqwu commented Sep 16, 2019

Edit: nvm, the issue is that MPI_MAX_PROCESSOR_NAME before was used only inside an #ifdef _MPI block. Ok, now I think I understand.

MPI_MAX_PROCESSOR_NAME was used after the inclusion of mpif.h:

function initmp(npes_in,npes_stride) result(par)
...
#ifdef _MPI
#include <mpif.h>
    integer(kind=int_kind)                              :: ierr,tmp_min,tmp_max
    integer(kind=int_kind)                              :: FrameNumber
    logical :: running   ! state of MPI at beginning of initmp call
    character(len=MPI_MAX_PROCESSOR_NAME)               :: my_name
    character(len=MPI_MAX_PROCESSOR_NAME), allocatable  :: the_names(:)
...

When it is used before the inclusion of mpif.h (as in latest code), GNU compiler is OK with it but XL compiler complains "Name given for constant with PARAMETER attribute was defined elsewhere with conflicting attributes" when mpif.h is included.

@bartgol
Copy link
Contributor

bartgol commented Sep 16, 2019

@dqwu, I have a branch ready, I can issue a PR right away. Just one question: I have a fix that also allows to use Clang to test homme (with c++, it helps detecting some non-standart compliant bugs). It's a quick fix that fixes how the openmp flags were set when C and F compiler differ. Is it ok if I piggy back that fix onto this one or would you like to keep the PR contained to this issue?

@dqwu
Copy link
Contributor Author

dqwu commented Sep 16, 2019

@bartgol I prefer to have a PR that just fixes this issue. You can issue another PR that allows to use Clang to test homme.

bartgol added a commit to bartgol/E3SM that referenced this issue Sep 16, 2019
The constant MPI_MAX_PROCESSOR_NAME is only defined if mpi headers are
included, so it *must* be used only inside blocks guarded by `#ifdef _MPI`

Also, removing redundant inclusion of mpif.h.
bartgol added a commit to bartgol/E3SM that referenced this issue Sep 16, 2019
The constant MPI_MAX_PROCESSOR_NAME is only defined if mpi headers are
included, so it *must* be used only inside blocks guarded by `#ifdef _MPI`

Also, removing redundant inclusion of mpif.h.
bartgol added a commit to bartgol/E3SM that referenced this issue Sep 16, 2019
The constant MPI_MAX_PROCESSOR_NAME is only defined if mpi headers are
included, so it *must* be used only inside blocks guarded by `#ifdef _MPI`

Also, removing redundant inclusion of mpif.h.
dqwu pushed a commit that referenced this issue Sep 18, 2019
The constant MPI_MAX_PROCESSOR_NAME is only defined if mpi headers are
included, so it *must* be used only inside blocks guarded by `#ifdef _MPI`

Also, removing redundant inclusion of mpif.h.
dqwu pushed a commit that referenced this issue Sep 18, 2019
The constant MPI_MAX_PROCESSOR_NAME is only defined if mpi headers are
included, so it *must* be used only inside blocks guarded by `#ifdef _MPI`

Also, removing redundant inclusion of mpif.h.
minxu74 added a commit that referenced this issue Sep 20, 2019
)

The constant MPI_MAX_PROCESSOR_NAME is only defined if mpi headers are
included, so it must be used only inside blocks guarded by #ifdef _MPI

Also, removing redundant inclusion of mpif.h.

Fixes #3183

* remotes/bartgol/E3SM/Fix-3183:
  Fix #3183
minxu74 added a commit that referenced this issue Sep 23, 2019
The constant MPI_MAX_PROCESSOR_NAME is only defined if mpi headers are
included, so it must be used only inside blocks guarded by #ifdef _MPI

Also, removing redundant inclusion of mpif.h.

Fixes #3183
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants