Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mpi_f08 configure tests failing for NAG #9795

Open
jsquyres opened this issue Dec 30, 2021 · 16 comments
Open

mpi_f08 configure tests failing for NAG #9795

jsquyres opened this issue Dec 30, 2021 · 16 comments

Comments

@jsquyres
Copy link
Member

Per this thread on the users mailing list, the configure tests for mpi_f08 are apparently failing incorrectly for the NAG compiler v7.2 in Open MPI v4.1.x:

$ FC=nagfor ./configure --prefix=/usr/local/openmpi-4.1.2
[ ... ]
checking if building Fortran 'use mpi_f08' bindings... no
Build MPI Fortran bindings: mpif.h, use mpi

That thread also cites this from config.log, which could be a clue into what is happening:

configure:69595: result: no
configure:69673: checking for Fortran compiler support of !$PRAGMA IGNORE_TKR
configure:69740: nagfor -c -f2008 -dusty -mismatch  conftest.f90 >&5
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
Evaluation trial version of NAG Fortran Compiler Release 7.1(Hanzomon)
Build 7101
Questionable: conftest.f90, line 52: Variable A set but never referenced
Warning: conftest.f90, line 52: Pointer PTR never dereferenced
Error: conftest.f90, line 39: Incorrect data type REAL (expected
CHARACTER) for argument BUFFER (no. 1) of FOO
Error: conftest.f90, line 50: Incorrect data type INTEGER (expected
CHARACTER) for argument BUFFER (no. 1) of FOO
[NAG Fortran Compiler error termination, 2 errors, 2 warnings]
configure:69740: $? = 2

Finally, a user contacted NAG tech support about this issue, and they replied with this:

Regarding OpenMPI, we have attempted the build ourselves but cannot make sense of the configure script. Only the OpenMPI maintainers can do something about that and it looks like they assume that all compilers will just swallow non-conforming Fortran code. The error downgrading options for NAG compiler remain "-dusty", "-mismatch" and "-mismatch_all" and none of them seem to help with the mpi_f08 module of OpenMPI. If there is a bug in the NAG Fortran Compiler that is responsible for this, we would love to hear about it, but at the moment we are not aware of such.

I'm labeling this as a v4.1 and v5.0 issue. I don't think it will be worthwhile to back-port relevant fixes back to v4.0.x.

FYI @ThemosTsikas @mathomp4 @wadudmiah @tkacvinsky

@ggouaillardet
Copy link
Contributor

Can we have the full (compressed) config.log?

configure tries different methods to support the ignore_tkr thing

  • as-is (e.g. no pragma/directive)
  • !GCC$ ATTRIBUTES NO_ARG_CHECK
  • !DEC$ ATTRIBUTES NO_ARG_CHECK
  • !$PRAGMA IGNORE_TKR
  • !DIR$ IGNORE_TKR
  • !IBM* IGNORE_TKR

so we should not focus on just one.
The code is in the _OMPI_FORTRAN_CHECK_IGNORE_TKR() and OMPI_FORTRAN_CHECK_IGNORE_TKR_SUB() subroutines from config/ompi_fortran_check_ignore_tkr.m4.

I think that should be enough for NAG support to determine if/how to make it work with their compiler.
If needed, I can post the 6 Fortran snippets that are tried to compile.

@mathomp4
Copy link

mathomp4 commented Dec 31, 2021 via email

@jsquyres
Copy link
Member Author

If NAG doesn't support ignoring TKR, then -- at least at the moment -- Open MPI's mpi_f08 module won't work with the NAG compiler.

Open MPI certainly can -- and should -- be extended to properly support TYPE(*), DIMENSION(..), but it hasn't been a high priority. It's been a while since I've thought about this kind of stuff, but I seem to recall that there's two possibilities for TYPE(*), DIMENSION(..) support:

  1. Have subroutine prototypes with TYPE(*), DIMENSION(..), but ignore the majority of the Fortran descriptor metadat that is passed to the subroutine and just use the buffer pointer. This would allow TYPE(*), DIMENSION(..) prototypes, but would involve the least amount of perturbation of the Open MPI code base (i.e., not have to grow full support for all the other metadata in the Fortran descriptor).
  2. Fully support Fortran descriptors, including non-contiguous buffers (i.e,. be able to set MPI_SUBARRARYS_SUPPORTED to .TRUE.). This is a non-trivial amount of work.

I should also note that, starting earlier this month/before the Christmas break, there is a low-but-nonzero level of ongoing activity to generate the Fortran bindings based on the MPI Forum JSON / Python bindings library. If someone is of the mind to properly support TYPE(*), DIMENSION(..), please come talk to me and @markalle before you go hand-write a whole bunch of code. That being said, note that this Fortran code generation work is intended for Open MPI 5.x and later (it'll likely miss the 5.0.x series, but could maybe potentially be part of a 5.1.x series...?) -- there's no intent to back-port it to Open MPI v4.x or earlier.

@wadudmiah
Copy link

Could you please attach the Fortran code that is used to determine if the compiler supports F2008? If NAG can see that it is valid Fortran, then they can investigate a bug in their compiler. The latter would not be a massive surprise as they have just released this new version (7.1) of the compiler.

@jsquyres
Copy link
Member Author

@wadudmiah There's not just one test -- there's a whole bunch of them. They are collectively used to determine whether the Fortran compiler supports "enough" F08 for Open MPI's mpi_f08 module or not. See https://github.com/open-mpi/ompi/blob/master/config/ompi_setup_mpi_fortran.m4#L411-L588 for the list of tests that all have to pass for configure to determine whether the Fortran compiler supports what is needed.

I also explained a bit more in https://www.mail-archive.com/users@lists.open-mpi.org/msg34698.html:

I'm one of the few people in the Open MPI dev community who has a clue about Fortran, and I'm very far from being a Fortran expert. Modern Fortran is a legitimately complicated language. So it doesn't surprise me that we might have some code in our configure tests that isn't quite right.

Let's also keep in mind that the state of F2008 support varies widely across compilers and versions. The current Open MPI configure tests straddle the line of trying to find enough F2008 support in a given compiler to be sufficient for the mpi_f08 module without being so overly proscriptive as to disqualify compilers that aren't fully F2008-compliant. Frankly, the state of F2008 support across the various Fortran compilers was a mess when we wrote those configure tests; we had to cobble together a variety of complicated tests to figure out if any given compiler supported enough F2008 support for some / all of the mpi_f08 module. That's why the configure tests are... complicated.

That's why it would be useful to see the config.log file from a configure invocation with the NAG compiler -- let's see exactly which test(s) is(are) failing. Then we can figure out why.

It's also possible that since NAG doesn't support "ignore TKR" pragmas, then Open MPI's configure is correctly determining that the NAG compiler doesn't support Open MPI's mpi_f08 module (since Open MPI currently requires "ignore TKR" directives). See my above comment for more detail: #9795 (comment)

@ThemosTsikas
Copy link
Contributor

ThemosTsikas commented Jan 2, 2022

Hello,

Thanks for looking into this. The config.log (edit: for OpenMPI 5.0.0rc2) is here .

It would be nice, given all the effort involved in putting these new facilities into the Fortran Standard and then implementing them in compilers, to see them actually used in the field.

@ThemosTsikas
Copy link
Contributor

ThemosTsikas commented Jan 2, 2022

I have also logged all the invocations, sources and responses by the NAG Fortran Compiler during configure here.

You will find dbgnagind1234.txt text files that contain the compiler options, dbgnagsrc1234_5.f90 text files that contain the sources for the 5th option and dbgnagcap1234.txt text files that capture the compiler's messages.

@jsquyres
Copy link
Member Author

jsquyres commented Jan 2, 2022

Thanks @ThemosTsikas. From your config.log, I think I identified 2 legitimate errors (the other errors look like cases that are supposed to fail).

  1. I don't remember the exact meaning of type(*), dimension(*) (vs. type(*), dimension(..)), but we're testing for it in config/ompi_fortran_check_ignore_tkr.m4. There was one place in there where I accidentally had type(*) instead of type(*), dimension(*) -- I think that's an error (vs. something Past Jeff did deliberately). I updated it to be type(*), dimension(*).
  2. Another place had a dummy parameter as real when I'm pretty sure it should have been complex. Fixed.

I've posted #9812 with these 2 fixes, and a tarball created from this branch here: https://aws.open-mpi.org/~jsquyres/unofficial/openmpi-gitclone-pr9812.tar.bz2. Could you give either the PR or the tarball a try?

@ThemosTsikas
Copy link
Contributor

Tried again with pr9812 tarball, config.log.

@jsquyres
Copy link
Member Author

jsquyres commented Jan 2, 2022

Thanks! Let me follow up with you on #9812...

@bwbarrett bwbarrett modified the milestones: v4.1.3, v4.1.4 Mar 31, 2022
@bwbarrett bwbarrett modified the milestones: v4.1.4, v4.1.5 May 25, 2022
@Malcolm-Cohen
Copy link

I note the erroneous
scalar TYPE(*)
instead of the standard
TYPE(*),DIMENSION(*)
is still in 4.1.4.

That is, the program following the
"checking for Fortran compiler support of TYPE(), DIMENSION()"
line (starting at line 69152 of configure) has

  interface
     subroutine foo(buffer, count)
       ! buffer
       type(*), intent(in) :: buffer
       integer, intent(in) :: count
     end subroutine foo
  end interface

i.e. note the lack of DIMENSION(*) there.

The second bug is that FORCE_ASSUMED_SHAPE has the wrong data type for its argument; there is an interface specifying COMPLEX,DIMENSION(:,:) but the procedure is defined with REAL,DIMENSION(:,:). The test program only calls this with COMPLEX arguments. Just changing (on line 69207) "real"->"complex" fixes that.

The third bug is that TYPE(*) arguments require the procedure to be referenced with an explicit interface - 15.4.2.2 item (3)(f), TYPE(*) is a kind of polymorphic (7.3.2.2 paragraph 3, first sentence). The call to FOO from FORCE_ASSUMED_SHAPE is missing that interface. This could be added to FORCE_ASSUMED_SHAPE, but frankly, deleting the interface for FORCE_ASSUMED_SHAPE in the main program and making it into an internal subprogram is the simplest fix.

@tclune
Copy link

tclune commented Sep 13, 2022

@malcom-cohen Thanks - very much looking forward to using these features with NAG compiler!

@mathomp4
Copy link

I hate to revive a zombit issue, but I was wondering if support for this got into v5.0.X say? (Just because I see it in the labels).

We'd love to move to use mpi_f08 (though in fairness we have some mpif.h that needs to go first. Going bye bye in MPI 5!)

@bwbarrett bwbarrett modified the milestones: v4.1.6, v4.1.7 Sep 30, 2023
@mathomp4
Copy link

I hate to revive a zombit issue, but I was wondering if support for this got into v5.0.X say? (Just because I see it in the labels).

Just to say it, but can confirm that mpi_f08 does not build for NAG in Open MPI 5.0.1. Just looked at my build on my Mac.

@jsquyres
Copy link
Member Author

jsquyres commented Jan 30, 2024

Yeah, I'm sorry -- I think we did a bunch of work on it back then, but then ran out of time / resources, and not enough people were asking for mpi_f08 support with the NAG Fortran compiler. ☹️

Is there renewed interest? There's two possible approaches to fix this:

  1. Fix what is there already (i.e., fix the implementation, and then update the configure tests to match what they really need to test).
  2. Auto-generate the Fortran MPI API bindings (even if it's just generate the mpi_f08 bindings), and do them correctly.

We've talked about auto-generating for quite a while (both C and the various Fortran bindings), but haven't gotten past the do-a-bunch-of-first-steps phase because this is unfortunately a ton of work. That being said, #12226 is steps in the right direction -- albeit for the MPI-4 embiggening -- but it could open the door for someone who has some time to build upon it for the other bindings.

@Malcolm-Cohen
Copy link

Malcolm-Cohen commented Jan 31, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants