Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fortran mpi module: remove interface names #9445

Conversation

jsquyres
Copy link
Member

Fortran is a deceptively complicated language. It turns out that it
is permissable to pass arrays of many sizes and shapes to subroutines
that expect 1-dimensional array parameters. For example, a C
programmer might look at the following subroutine declaration and
assume that it means that the ARRAY_OF_REQUESTS parameter is a
1-dimensional array of any length:

MPI_WAITALL(COUNT, ARRAY_OF_REQUESTS, ARRAY_OF_STATUSES, IERROR)
    INTEGER COUNT, ARRAY_OF_REQUESTS(*),
              ARRAY_OF_STATUSES(MPI_STATUS_SIZE, *), IERROR

However, rules of the Fortran language allow the compiler to
effectively morph non-1-dimensional array source parameters to match,
for example, the 1-dimensional ARRAY_OF_REQUESTS dummy parameter.
Sometimes this will be done by copy-in-copy-out; other times, the
compiler can just swizzle the types to match because the source array
data is already contiguous in memory.

Regardless, there are several rules about when this type swizzling can
happen, and there are differences in the rules between when the target
subroutine in question is in a named interface such as:

INTERFACE FOO
  SUBROUTINE FOO(arg)
    INTEGER ARG(*)
  END SUBROUTINE FOO
END INTERFACE

versus when the subroutine is in an unnamed interface such as:

INTERFACE
  SUBROUTINE FOO(arg)
    INTEGER ARG(*)
  END SUBROUTINE FOO
END INTERFACE

I will not pretend to be a Fortran language expert and won't quote the
language rules here. The short version is that unnamed interfaces
allow more flexible type swizzling. For example, a user can pass a
2-dimensional INTEGER array to MPI_WAITALL's ARRAY_OF_REQUESTS
argument, and -- assuming that the 2D array is contiguous in memory --
the compiler will make the 2D array type match the 1D
ARRAY_OF_REQUESTS argument. Open MPI will get a contiguous chunk of
memory of integers, so it doesn't know/care.

The MPI standard does not address this issue at all. I.e., it doesn't
say whether passing N-dimensional parameters to subroutines like
MPI_WAITALL are permissable or not. It also does not specify whether
interfaces need to be used, or if interfaces are used, whether they
need to be named or unnamed -- these are all implementation decisions.
However, there has been a lot of discussion about this in the Fortran
group at the MPI Forum over the last week; there will almost certainly
be an errata to MPI-4.0 and/or some text changes in MPI-4.1 to clarify
the situation. I'm not going to speculate on the final language that
will get accepted by the Forum, but it seems like since Open MPI's
implentation decision to use named interfaces is perhaps the most
restrictive in terms of compiler swizzling functionality, we should
probably ease those restrictions so as not to violate the Law of Least
Astonishment from the user's perspective (who would expect that the
Fortran compiler will be able to swizzle array sizes and shapes).

As such, this commit removes all interface names from the Fortran
"mpi" bindings module where there's only a single subroutine in the
interface, and that subroutine name exactly matches the interface
name. Note that this specifically excludes the following routines,
which have multiple subroutines in the interface (for two different
types of Fortran pointers):

  • MPI_Alloc_mem
  • MPI_Win_allocate
  • MPI_Win_allocate_shared
  • MPI_Win_shared_query

Deleting the interface names may cause ABI implications in some
compilers. I have not exhaustively tested Fortran compilers to know
if this is an issue or not. The safest thing to do is to implement
this change for Open MPI v5.0.0 where we're breaking lots of ABI
things as compared to prior versions of Open MPI.

Note that this issue does not affect the mpi_f08 module. Using
named interfaces were distinctly specified in the MPI standard.

Also, this doesn't affect mpif.h, because Open MPI doesn't declare
most interfaces/subroutines in mpif.h.

Signed-off-by: Jeff Squyres jsquyres@cisco.com

Fortran is a deceptively complicated language.  It turns out that it
is permissable to pass arrays of many sizes and shapes to subroutines
that expect 1-dimensional array parameters.  For example, a C
programmer might look at the following subroutine declaration and
assume that it means that the ARRAY_OF_REQUESTS parameter is a
1-dimensional array of any length:

```
MPI_WAITALL(COUNT, ARRAY_OF_REQUESTS, ARRAY_OF_STATUSES, IERROR)
    INTEGER COUNT, ARRAY_OF_REQUESTS(*),
              ARRAY_OF_STATUSES(MPI_STATUS_SIZE, *), IERROR
```

However, rules of the Fortran language allow the compiler to
effectively morph non-1-dimensional array source parameters to match,
for example, the 1-dimensional ARRAY_OF_REQUESTS dummy parameter.
Sometimes this will be done by copy-in-copy-out; other times, the
compiler can just swizzle the types to match because the source array
data is already contiguous in memory.

Regardless, there are several rules about when this type swizzling can
happen, and there are differences in the rules between when the target
subroutine in question is in a named interface such as:

```
INTERFACE FOO
  SUBROUTINE FOO(arg)
    INTEGER ARG(*)
  END SUBROUTINE FOO
END INTERFACE
```

versus when the subroutine is in an unnamed interface such as:

```
INTERFACE
  SUBROUTINE FOO(arg)
    INTEGER ARG(*)
  END SUBROUTINE FOO
END INTERFACE
```

I will not pretend to be a Fortran language expert and won't quote the
language rules here.  The short version is that unnamed interfaces
allow more flexible type swizzling.  For example, a user can pass a
2-dimensional INTEGER array to MPI_WAITALL's ARRAY_OF_REQUESTS
argument, and -- assuming that the 2D array is contiguous in memory --
the compiler will make the 2D array type match the 1D
ARRAY_OF_REQUESTS argument.  Open MPI will get a contiguous chunk of
memory of integers, so it doesn't know/care.

The MPI standard does not address this issue at all.  I.e., it doesn't
say whether passing N-dimensional parameters to subroutines like
MPI_WAITALL are permissable or not.  It also does not specify whether
interfaces need to be used, or if interfaces are used, whether they
need to be named or unnamed -- these are all implementation decisions.
However, there has been a lot of discussion about this in the Fortran
group at the MPI Forum over the last week; there will almost certainly
be an errata to MPI-4.0 and/or some text changes in MPI-4.1 to clarify
the situation.  I'm not going to speculate on the final language that
will get accepted by the Forum, but it seems like since Open MPI's
implentation decision to use named interfaces is perhaps the most
restrictive in terms of compiler swizzling functionality, we should
probably ease those restrictions so as not to violate the Law of Least
Astonishment from the user's perspective (who would expect that the
Fortran compiler will be able to swizzle array sizes and shapes).

As such, this commit removes all interface names from the Fortran
"mpi" bindings module where there's only a single subroutine in the
interface, and that subroutine name exactly matches the interface
name.  Note that this specifically excludes the following routines,
which have multiple subroutines in the interface (for two different
types of Fortran pointers):

* MPI_Alloc_mem
* MPI_Win_allocate
* MPI_Win_allocate_shared
* MPI_Win_shared_query

Deleting the interface names *may* cause ABI implications in some
compilers.  I have not exhaustively tested Fortran compilers to know
if this is an issue or not.  The safest thing to do is to implement
this change for Open MPI v5.0.0 where we're breaking lots of ABI
things as compared to prior versions of Open MPI.

Note that this issue does *not* affect the mpi_f08 module.  Using
named interfaces were distinctly specified in the MPI standard.

Also, this doesn't affect mpif.h, because Open MPI doesn't declare
most interfaces/subroutines in mpif.h.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres
Copy link
Member Author

@markalle @awlauria This PR should replace #9259 and #9367.

I marked this as a blocker because it may have ABI implications, and we therefore need to get this in before 5.0.0.

Also, note that we still need to fix the places where we accidentally specify array lengths (e.g., MPI_WAITALL) where they should be (*); I did not include that in this PR.

These subroutines are no longer referenced in the "mpi" module
interface; remove them.

I *believe* that these subroutines may have been kept for ABI /
backwards linking compatibility at one point.  But for Open MPI
v5.0.0, we're breaking ABI, so these old symbols do not need to be
maintained any more.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres
Copy link
Member Author

jsquyres commented Oct 1, 2021

@awlauria @markalle While working on this yesterday, I noticed that we had some stale symbols left around in the use-mpi-tkr implementation; I just pushed up an additional commit to remove them.

@jsquyres
Copy link
Member Author

jsquyres commented Oct 1, 2021

The OMPI community jenkins build is in a weird state (it never came up with a "Details" link). Let's try triggering it again...

@jsquyres
Copy link
Member Author

jsquyres commented Oct 1, 2021

bot:ompi:retest

@markalle
Copy link
Contributor

markalle commented Oct 1, 2021

Looks fantastic.

I found one place in the standard (17.1.5) where it gives an example

MODULE mpi
  INTERFACE MPI_Comm_rank  ! (as defined in Chapter 6)
    SUBROUTINE MPI_Comm_rank(comm, rank, ierror)
      INTEGER, INTENT(IN)  :: comm   ! The INTENT may be added although
      INTEGER, INTENT(OUT) :: rank   ! it is not defined in the
      INTEGER, INTENT(OUT) :: ierror ! official routine definition.
    END SUBROUTINE
  END INTERFACE
END MODULE mpi

that supports the idea that they're using the generic interface and we thus don't have to support the fancy argument matching, but I still like this PR's approach of being conservative and assuming we should match it (least astonishment, and the standard isn't totally clear) and especially since it looks like an easy change. I just tested the same strategy in my toy program and it worked for XLF and gfortran for me.

But I'm not following the ABI concern. An MPI call into MPI_Isend from either use mpi or include 'mpif.h' is implemented in C at ompi/mpi/fortran/mpif-h/isend_f.c. For non-F08 that's the only place any ABI comes from in OMPI libraries, right? The mpi.mod is binary and could be changed by this PR, but that's only used at compile time not run time, and couldn't be changing how the app calls MPI_Isend() since that's still going through the C definition which is untouched in this PR.

@jsquyres
Copy link
Member Author

jsquyres commented Oct 1, 2021

I found one place in the standard (17.1.5) where it gives an example

Yeah, I talked about this example with Rolf R. I'm quite sure this example is why Past Jeff initially implemented the mpi module with named interfaces.

But at the end of the day, that code snipit is in an advice to implementors, and therefore is non-normative. I.e., it's not mandating that we have to use named interfaces for the mpi module.

that supports the idea that they're using the generic interface and we thus don't have to support the fancy argument matching, but I still like this PR's approach of being conservative and assuming we should match it (least astonishment, and the standard isn't totally clear) and especially since it looks like an easy change. I just tested the same strategy in my toy program and it worked for XLF and gfortran for me.

👍

But I'm not following the ABI concern. An MPI call into MPI_Isend from either use mpi or include 'mpif.h' is implemented in C at ompi/mpi/fortran/mpif-h/isend_f.c. For non-F08 that's the only place any ABI comes from in OMPI libraries, right? The mpi.mod is binary and could be changed by this PR, but that's only used at compile time not run time, and couldn't be changing how the app calls MPI_Isend() since that's still going through the C definition which is untouched in this PR.

Oh yes, I think you're right: the back-end symbol is the C code in ompi/mpi/fortran/mpif-h/blah_f.c; I forgot about that. So I think you're right: the run-time linker symbol won't change, and this particular issue won't cause an ABI break.

The 2nd commit that I added to this PR, though (removing the legacy symbols, which I think were there solely for preserving ABI with a prior change), definitely removes some symbols and breaks the ABI, though. 😈

@markalle
Copy link
Contributor

markalle commented Oct 1, 2021

Yeah, I saw a few symbols going away and wasn't sure what they were, but figured maybe they were old extensions or something. So yeah, removing symbols I agree is changing the ABI

@jsquyres
Copy link
Member Author

jsquyres commented Oct 1, 2021

@markalle For reference, see the commit message on the 2nd commit for an explanation (unless I rebase, it's 9865f3a).

@jsquyres jsquyres merged commit 050713d into open-mpi:master Oct 1, 2021
@jsquyres jsquyres deleted the pr/i-dont-think-you-know-what-a-fortran-array-really-is branch October 1, 2021 18:42
skosukhin added a commit to skosukhin/ompi that referenced this pull request Aug 21, 2023
This fixes the inconsistency between what is tested at the configure time (the
unnamed Fortran interface) and what is actually used in the source code (the
named Fortran interface).

As was mentioned in open-mpi#9445, the unnamed interfaces are more forgiving when it
comes to TKR mismatches. Therefore, the current configure-time check is not
strict enough and is prone to giving false results (e.g. for NVHPC 23.3+). The
fix is to switch to the named Fortran interface in the configure script.

Note that the inconsistency was resolved in the main branch with open-mpi#9445, which we
cannot cherry-pick because it potentially breaks the ABI compatibility.

Signed-off-by: Sergey Kosukhin <sergey.kosukhin@mpimet.mpg.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants