shm: MPI_Win_create + MPI_Win_shared_query does not work #6898

Open
maximilian-tech opened this issue Feb 7, 2024 · 3 comments

@maximilian-tech

Version: mpich v4.2.0rc3

I want to access a memory window on rank 0, created with MPI_Win_create, from rank 1 via MPI_Win_shared_query.

It does not work as expected (see code appended).

Instead of getting the base pointer of rank 0 (via MPI_Win_shared_query(win, 0, &ssize, &disp_unit, &baseptr);), I get the base pointer of my own window on rank 1, together with the size of my own window.

The absence of a shared memory window seems to be standard-conforming, though unexpected.
But then the returned size should be 0, not the size of my local window, correct?

Two questions arise:

  1. Is the current implementation MPI-conforming?
  2. Will MPI_Win_create be able to create shared memory windows accessible with MPI_Win_shared_query?

Maybe I did something wrong installation-wise or code-wise; I am thankful for any comment!


The branch in mpidig_win.h that MPI_Win_shared_query takes in this case:

    /* When only single process exists on the node or shared memory allocation fails,
     * should only query MPI_PROC_NULL or local process. Thus, return local window's info. */
    if (win->comm_ptr->node_comm == NULL || !shared_table) {
        *size = win->size;
        *disp_unit = win->disp_unit;
        *((void **) baseptr) = win->base;
        goto fn_exit;
    }
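To make the expected behavior concrete, here is a toy, MPI-free model of what this fallback branch would need to do to match the MPI 4.1 wording quoted in this issue. The struct and function names (`toy_win`, `toy_shared_query_fallback`) are hypothetical and are not MPICH internals; it is a sketch of the rule, not a patch.

```c
#include <stddef.h>

/* Hypothetical, simplified window record for this model only;
 * it is NOT MPICH's internal window structure. */
struct toy_win {
    int my_rank;     /* rank of the calling process in the window group */
    size_t size;     /* bytes exposed by the caller */
    int disp_unit;   /* caller's displacement unit */
    void *base;      /* caller's window base address */
};

/* Model of the "no shared table" fallback: only the caller's own segment is
 * reported; any other target gets size = 0, as MPI 4.1 requires for segments
 * that cannot be accessed directly. NULL stands in for "a baseptr as if
 * MPI_ALLOC_MEM was called with size = 0", and disp_unit = 1 is a placeholder
 * chosen for this model. MPI_PROC_NULL is not treated specially here. */
static void toy_shared_query_fallback(const struct toy_win *w, int target,
                                      size_t *size, int *disp_unit,
                                      void **baseptr)
{
    if (target == w->my_rank) {
        *size = w->size;
        *disp_unit = w->disp_unit;
        *baseptr = w->base;
    } else {
        *size = 0;
        *disp_unit = 1;
        *baseptr = NULL;
    }
}
```

Under this rule, rank 1 querying rank 0 in the reproducer would see size = 0 rather than its own window size.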

The change log for mpich-v4.2.0rc3 states:

# MPI_Win_shared_query can be used on windows created by MPI_Win_create,
  MPI_Win_allocate, in addition to windows created by MPI_Win_allocate_shared.
  MPI_Win_allocate will create shared memory whenever feasible, including between
  spawned processes on the same node.

The MPI 4.1 standard states:

MPI_Win_shared_query( )
...
Only MPI_WIN_ALLOCATE_SHARED is guaranteed to allocate shared memory. Implementations are permitted, where possible, to provide shared memory for windows created with MPI_WIN_CREATE and MPI_WIN_ALLOCATE. However, availability of shared memory is not guaranteed. When the remote memory segment corresponding to a particular process cannot be accessed directly, this call returns size = 0 and a baseptr as if MPI_ALLOC_MEM was called with size = 0.

...

_Advice to users._ For windows allocated using MPI_WIN_ALLOCATE or
MPI_WIN_CREATE, the group of MPI processes for which the implementation may
provide shared memory can be determined using MPI_COMM_SPLIT_TYPE described
in Section 7.4.2. (End of advice to users.)
$ mpirun --version
HYDRA build details:
    Version:                                 4.2.0rc3
    Release Date:                            Tue Jan 30 10:03:39 CST 2024
    CC:                              gcc      
    Configure options:                       '--disable-option-checking' '--prefix=/home/max/mpich_4.2.0rc3' '--cache-file=/dev/null' '--srcdir=../../../../src/pm/hydra' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS=' 'LIBS=' 'CPPFLAGS= -DNETMOD_INLINE=__netmod_inline_ofi__ -I/home/max/tests/mpich-4.2.0rc3/_build/src/mpl/include -I/home/max/tests/mpich-4.2.0rc3/src/mpl/include -I/home/max/tests/mpich-4.2.0rc3/modules/json-c -I/home/max/tests/mpich-4.2.0rc3/_build/modules/json-c -D_REENTRANT -I/home/max/tests/mpich-4.2.0rc3/_build/src/mpi/romio/include -I/home/max/tests/mpich-4.2.0rc3/src/pmi/include -I/home/max/tests/mpich-4.2.0rc3/_build/src/pmi/include -I/home/max/tests/mpich-4.2.0rc3/_build/modules/yaksa/src/frontend/include -I/home/max/tests/mpich-4.2.0rc3/modules/yaksa/src/frontend/include -I/home/max/tests/mpich-4.2.0rc3/_build/modules/libfabric/include -I/home/max/tests/mpich-4.2.0rc3/modules/libfabric/include'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select

Example Code

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <assert.h>

#define SIZE 10

int main(int argc, char *argv[]) {

  MPI_Init(&argc, &argv);

  int rank, size;

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  
  // Same window size on every rank
  size = SIZE;
  // Create shared communicator
  MPI_Comm shared_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD,
                      MPI_COMM_TYPE_SHARED,
                      0,
                      MPI_INFO_NULL,
                      &shared_comm);
  
  float* data;                        
  data = (float *)calloc(size, sizeof(float));

  // Initialize array on rank 0
  if ( rank == 0 ){
      for (int i = 0; i < SIZE; i++) {
          data[i] = i;
      }
  } else {
      for (int i = 0; i < SIZE; i++) {
          data[i] = i+SIZE;
      }
  }
  printf("Rank %d, local data before modification:\n", rank);
  for (int i = 0; i < SIZE; i++) {
      printf("%.2f ", data[i]);
  }
  printf("\n");
  fflush(stdout);
  
  // Create Windows
  MPI_Win win;
  int err = MPI_Win_create(&data[0],
                           size * sizeof(float),
                           sizeof(float),
                           MPI_INFO_NULL,
                           shared_comm,
                           &win);

  assert(err  == MPI_SUCCESS);
  MPI_Win_fence(0, win);

  float *baseptr;
  if (rank != 0) {
      // Use MPI_Win_shared_query
      int disp_unit;
      MPI_Aint ssize;
      
      MPI_Win_shared_query(win, 0, &ssize, &disp_unit, &baseptr);
      assert(disp_unit > 0);
      assert(ssize > 0);
      
      // Access shared data on non-zero ranks after querying
      printf("Data on rank %d: \n", rank);
      for (int i = 0; i < SIZE; i++) {
          printf("%.2f ", baseptr[i]);
      }
      printf("\n");
      fflush(stdout);

      float my_val = 123.9f;
      baseptr[3] = my_val; // Modify data through shared base pointer
      printf("Modify data through shared base pointer: 'baseptr[3] = %f'\n", my_val);
      fflush(stdout);
      
      printf("Local data on rank %d after modification:\n", rank);
      for (int i = 0; i < SIZE; i++) {
          printf("%.2f ", data[i]);
      }
      printf("\n");
      fflush(stdout);
  }

  MPI_Win_fence(0, win);  
  MPI_Barrier(shared_comm);
  MPI_Win_fence(0, win);

  // Print data on rank 0 after modification
  if (rank == 0) {
      printf("Data on rank 0 after modification: \n");
      for (int i = 0; i < SIZE; i++) {
          printf("%.2f ", data[i]);
      }
      printf("\n");
      fflush(stdout);
  }
  MPI_Win_fence(0, win);

  MPI_Win_free(&win);
  free(data); // Free dynamically allocated memory (only after MPI_Win_free)
  MPI_Comm_free(&shared_comm);

  MPI_Finalize();
  return 0;

}

@hzhou
Contributor

hzhou commented Feb 7, 2024

I think it is a bug. It should return size 0 to tell you that the memory is not shared.

@maximilian-tech
Author

Thank you for your quick reply!

I would like to come back to the question of whether MPI_Win_create will be able to create shared memory windows accessible with MPI_Win_shared_query. It currently does not look like this will be supported soon(?).

Would it be possible to highlight that MPI_Win_create does not make memory available to MPI_Win_shared_query, and that this call will always return size = 0 except when querying one's own window?

This could be done by adding a short sentence to the change log. Mentioning this explicitly would be very helpful!

Thanks.

@hzhou
Contributor

hzhou commented Feb 9, 2024

You are correct that in the current release MPI_Win_create does not make the memory accessible to other processes, even when they are in the same shared memory domain. But it seems plausible that with kernel modules such as CMA and XPMEM we could expose the memory to each other. So stay tuned.

@hzhou hzhou self-assigned this Feb 9, 2024
@hzhou hzhou changed the title Conformity + Clarification: MPI_Win_create + MPI_Win_shared_query does not work shm: MPI_Win_create + MPI_Win_shared_query does not work Feb 9, 2024