Fortran mpi module: remove interface names #9445

jsquyres · 2021-09-30T15:29:41Z

Fortran is a deceptively complicated language. It turns out that it
is permissable to pass arrays of many sizes and shapes to subroutines
that expect 1-dimensional array parameters. For example, a C
programmer might look at the following subroutine declaration and
assume that it means that the ARRAY_OF_REQUESTS parameter is a
1-dimensional array of any length:

MPI_WAITALL(COUNT, ARRAY_OF_REQUESTS, ARRAY_OF_STATUSES, IERROR)
    INTEGER COUNT, ARRAY_OF_REQUESTS(*),
              ARRAY_OF_STATUSES(MPI_STATUS_SIZE, *), IERROR

However, rules of the Fortran language allow the compiler to
effectively morph non-1-dimensional array source parameters to match,
for example, the 1-dimensional ARRAY_OF_REQUESTS dummy parameter.
Sometimes this will be done by copy-in-copy-out; other times, the
compiler can just swizzle the types to match because the source array
data is already contiguous in memory.

Regardless, there are several rules about when this type swizzling can
happen, and there are differences in the rules between when the target
subroutine in question is in a named interface such as:

INTERFACE FOO
  SUBROUTINE FOO(arg)
    INTEGER ARG(*)
  END SUBROUTINE FOO
END INTERFACE

versus when the subroutine is in an unnamed interface such as:

INTERFACE
  SUBROUTINE FOO(arg)
    INTEGER ARG(*)
  END SUBROUTINE FOO
END INTERFACE

I will not pretend to be a Fortran language expert and won't quote the
language rules here. The short version is that unnamed interfaces
allow more flexible type swizzling. For example, a user can pass a
2-dimensional INTEGER array to MPI_WAITALL's ARRAY_OF_REQUESTS
argument, and -- assuming that the 2D array is contiguous in memory --
the compiler will make the 2D array type match the 1D
ARRAY_OF_REQUESTS argument. Open MPI will get a contiguous chunk of
memory of integers, so it doesn't know/care.

The MPI standard does not address this issue at all. I.e., it doesn't
say whether passing N-dimensional parameters to subroutines like
MPI_WAITALL are permissable or not. It also does not specify whether
interfaces need to be used, or if interfaces are used, whether they
need to be named or unnamed -- these are all implementation decisions.
However, there has been a lot of discussion about this in the Fortran
group at the MPI Forum over the last week; there will almost certainly
be an errata to MPI-4.0 and/or some text changes in MPI-4.1 to clarify
the situation. I'm not going to speculate on the final language that
will get accepted by the Forum, but it seems like since Open MPI's
implentation decision to use named interfaces is perhaps the most
restrictive in terms of compiler swizzling functionality, we should
probably ease those restrictions so as not to violate the Law of Least
Astonishment from the user's perspective (who would expect that the
Fortran compiler will be able to swizzle array sizes and shapes).

As such, this commit removes all interface names from the Fortran
"mpi" bindings module where there's only a single subroutine in the
interface, and that subroutine name exactly matches the interface
name. Note that this specifically excludes the following routines,
which have multiple subroutines in the interface (for two different
types of Fortran pointers):

MPI_Alloc_mem
MPI_Win_allocate
MPI_Win_allocate_shared
MPI_Win_shared_query

Deleting the interface names may cause ABI implications in some
compilers. I have not exhaustively tested Fortran compilers to know
if this is an issue or not. The safest thing to do is to implement
this change for Open MPI v5.0.0 where we're breaking lots of ABI
things as compared to prior versions of Open MPI.

Note that this issue does not affect the mpi_f08 module. Using
named interfaces were distinctly specified in the MPI standard.

Also, this doesn't affect mpif.h, because Open MPI doesn't declare
most interfaces/subroutines in mpif.h.

Signed-off-by: Jeff Squyres jsquyres@cisco.com

Fortran is a deceptively complicated language. It turns out that it is permissable to pass arrays of many sizes and shapes to subroutines that expect 1-dimensional array parameters. For example, a C programmer might look at the following subroutine declaration and assume that it means that the ARRAY_OF_REQUESTS parameter is a 1-dimensional array of any length: ``` MPI_WAITALL(COUNT, ARRAY_OF_REQUESTS, ARRAY_OF_STATUSES, IERROR) INTEGER COUNT, ARRAY_OF_REQUESTS(*), ARRAY_OF_STATUSES(MPI_STATUS_SIZE, *), IERROR ``` However, rules of the Fortran language allow the compiler to effectively morph non-1-dimensional array source parameters to match, for example, the 1-dimensional ARRAY_OF_REQUESTS dummy parameter. Sometimes this will be done by copy-in-copy-out; other times, the compiler can just swizzle the types to match because the source array data is already contiguous in memory. Regardless, there are several rules about when this type swizzling can happen, and there are differences in the rules between when the target subroutine in question is in a named interface such as: ``` INTERFACE FOO SUBROUTINE FOO(arg) INTEGER ARG(*) END SUBROUTINE FOO END INTERFACE ``` versus when the subroutine is in an unnamed interface such as: ``` INTERFACE SUBROUTINE FOO(arg) INTEGER ARG(*) END SUBROUTINE FOO END INTERFACE ``` I will not pretend to be a Fortran language expert and won't quote the language rules here. The short version is that unnamed interfaces allow more flexible type swizzling. For example, a user can pass a 2-dimensional INTEGER array to MPI_WAITALL's ARRAY_OF_REQUESTS argument, and -- assuming that the 2D array is contiguous in memory -- the compiler will make the 2D array type match the 1D ARRAY_OF_REQUESTS argument. Open MPI will get a contiguous chunk of memory of integers, so it doesn't know/care. The MPI standard does not address this issue at all. I.e., it doesn't say whether passing N-dimensional parameters to subroutines like MPI_WAITALL are permissable or not. It also does not specify whether interfaces need to be used, or if interfaces are used, whether they need to be named or unnamed -- these are all implementation decisions. However, there has been a lot of discussion about this in the Fortran group at the MPI Forum over the last week; there will almost certainly be an errata to MPI-4.0 and/or some text changes in MPI-4.1 to clarify the situation. I'm not going to speculate on the final language that will get accepted by the Forum, but it seems like since Open MPI's implentation decision to use named interfaces is perhaps the most restrictive in terms of compiler swizzling functionality, we should probably ease those restrictions so as not to violate the Law of Least Astonishment from the user's perspective (who would expect that the Fortran compiler will be able to swizzle array sizes and shapes). As such, this commit removes all interface names from the Fortran "mpi" bindings module where there's only a single subroutine in the interface, and that subroutine name exactly matches the interface name. Note that this specifically excludes the following routines, which have multiple subroutines in the interface (for two different types of Fortran pointers): * MPI_Alloc_mem * MPI_Win_allocate * MPI_Win_allocate_shared * MPI_Win_shared_query Deleting the interface names *may* cause ABI implications in some compilers. I have not exhaustively tested Fortran compilers to know if this is an issue or not. The safest thing to do is to implement this change for Open MPI v5.0.0 where we're breaking lots of ABI things as compared to prior versions of Open MPI. Note that this issue does *not* affect the mpi_f08 module. Using named interfaces were distinctly specified in the MPI standard. Also, this doesn't affect mpif.h, because Open MPI doesn't declare most interfaces/subroutines in mpif.h. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

jsquyres · 2021-09-30T15:32:52Z

@markalle @awlauria This PR should replace #9259 and #9367.

I marked this as a blocker because it may have ABI implications, and we therefore need to get this in before 5.0.0.

Also, note that we still need to fix the places where we accidentally specify array lengths (e.g., MPI_WAITALL) where they should be (*); I did not include that in this PR.

These subroutines are no longer referenced in the "mpi" module interface; remove them. I *believe* that these subroutines may have been kept for ABI / backwards linking compatibility at one point. But for Open MPI v5.0.0, we're breaking ABI, so these old symbols do not need to be maintained any more. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

jsquyres · 2021-10-01T12:29:30Z

@awlauria @markalle While working on this yesterday, I noticed that we had some stale symbols left around in the use-mpi-tkr implementation; I just pushed up an additional commit to remove them.

jsquyres · 2021-10-01T14:32:50Z

The OMPI community jenkins build is in a weird state (it never came up with a "Details" link). Let's try triggering it again...

jsquyres · 2021-10-01T14:32:56Z

bot:ompi:retest

markalle · 2021-10-01T18:09:11Z

Looks fantastic.

I found one place in the standard (17.1.5) where it gives an example

MODULE mpi
  INTERFACE MPI_Comm_rank  ! (as defined in Chapter 6)
    SUBROUTINE MPI_Comm_rank(comm, rank, ierror)
      INTEGER, INTENT(IN)  :: comm   ! The INTENT may be added although
      INTEGER, INTENT(OUT) :: rank   ! it is not defined in the
      INTEGER, INTENT(OUT) :: ierror ! official routine definition.
    END SUBROUTINE
  END INTERFACE
END MODULE mpi

that supports the idea that they're using the generic interface and we thus don't have to support the fancy argument matching, but I still like this PR's approach of being conservative and assuming we should match it (least astonishment, and the standard isn't totally clear) and especially since it looks like an easy change. I just tested the same strategy in my toy program and it worked for XLF and gfortran for me.

But I'm not following the ABI concern. An MPI call into MPI_Isend from either use mpi or include 'mpif.h' is implemented in C at ompi/mpi/fortran/mpif-h/isend_f.c. For non-F08 that's the only place any ABI comes from in OMPI libraries, right? The mpi.mod is binary and could be changed by this PR, but that's only used at compile time not run time, and couldn't be changing how the app calls MPI_Isend() since that's still going through the C definition which is untouched in this PR.

jsquyres · 2021-10-01T18:24:14Z

I found one place in the standard (17.1.5) where it gives an example

Yeah, I talked about this example with Rolf R. I'm quite sure this example is why Past Jeff initially implemented the mpi module with named interfaces.

But at the end of the day, that code snipit is in an advice to implementors, and therefore is non-normative. I.e., it's not mandating that we have to use named interfaces for the mpi module.

that supports the idea that they're using the generic interface and we thus don't have to support the fancy argument matching, but I still like this PR's approach of being conservative and assuming we should match it (least astonishment, and the standard isn't totally clear) and especially since it looks like an easy change. I just tested the same strategy in my toy program and it worked for XLF and gfortran for me.

👍

But I'm not following the ABI concern. An MPI call into MPI_Isend from either use mpi or include 'mpif.h' is implemented in C at ompi/mpi/fortran/mpif-h/isend_f.c. For non-F08 that's the only place any ABI comes from in OMPI libraries, right? The mpi.mod is binary and could be changed by this PR, but that's only used at compile time not run time, and couldn't be changing how the app calls MPI_Isend() since that's still going through the C definition which is untouched in this PR.

Oh yes, I think you're right: the back-end symbol is the C code in ompi/mpi/fortran/mpif-h/blah_f.c; I forgot about that. So I think you're right: the run-time linker symbol won't change, and this particular issue won't cause an ABI break.

The 2nd commit that I added to this PR, though (removing the legacy symbols, which I think were there solely for preserving ABI with a prior change), definitely removes some symbols and breaks the ABI, though. 😈

markalle · 2021-10-01T18:31:46Z

Yeah, I saw a few symbols going away and wasn't sure what they were, but figured maybe they were old extensions or something. So yeah, removing symbols I agree is changing the ABI

jsquyres · 2021-10-01T18:41:40Z

@markalle For reference, see the commit message on the 2nd commit for an explanation (unless I rebase, it's 9865f3a).

This fixes the inconsistency between what is tested at the configure time (the unnamed Fortran interface) and what is actually used in the source code (the named Fortran interface). As was mentioned in open-mpi#9445, the unnamed interfaces are more forgiving when it comes to TKR mismatches. Therefore, the current configure-time check is not strict enough and is prone to giving false results (e.g. for NVHPC 23.3+). The fix is to switch to the named Fortran interface in the configure script. Note that the inconsistency was resolved in the main branch with open-mpi#9445, which we cannot cherry-pick because it potentially breaks the ABI compatibility. Signed-off-by: Sergey Kosukhin <sergey.kosukhin@mpimet.mpg.de>

jsquyres added fortran bindings Target: main labels Sep 30, 2021

jsquyres added this to the v5.0.0 milestone Sep 30, 2021

jsquyres requested review from ggouaillardet and markalle September 30, 2021 15:29

jsquyres added the Severity: blocker label Sep 30, 2021

jsquyres added the 😳 Backward compat break label Oct 1, 2021

markalle approved these changes Oct 1, 2021

View reviewed changes

jsquyres merged commit 050713d into open-mpi:master Oct 1, 2021

jsquyres deleted the pr/i-dont-think-you-know-what-a-fortran-array-really-is branch October 1, 2021 18:42

skosukhin mentioned this pull request Aug 15, 2023

v4.1.x: Fix inconsistent configure-time check #11857

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fortran mpi module: remove interface names #9445

Fortran mpi module: remove interface names #9445

Uh oh!

jsquyres commented Sep 30, 2021

Uh oh!

jsquyres commented Sep 30, 2021

Uh oh!

jsquyres commented Oct 1, 2021

Uh oh!

jsquyres commented Oct 1, 2021

Uh oh!

jsquyres commented Oct 1, 2021

Uh oh!

markalle commented Oct 1, 2021

Uh oh!

jsquyres commented Oct 1, 2021 •

edited

Loading

Uh oh!

markalle commented Oct 1, 2021

Uh oh!

jsquyres commented Oct 1, 2021

Uh oh!

Uh oh!

Fortran mpi module: remove interface names #9445

Fortran mpi module: remove interface names #9445

Uh oh!

Conversation

jsquyres commented Sep 30, 2021

Uh oh!

jsquyres commented Sep 30, 2021

Uh oh!

jsquyres commented Oct 1, 2021

Uh oh!

jsquyres commented Oct 1, 2021

Uh oh!

jsquyres commented Oct 1, 2021

Uh oh!

markalle commented Oct 1, 2021

Uh oh!

jsquyres commented Oct 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

markalle commented Oct 1, 2021

Uh oh!

jsquyres commented Oct 1, 2021

Uh oh!

Uh oh!

jsquyres commented Oct 1, 2021 •

edited

Loading