Dynamic variable shaping not working as expected with MPI #3709

Closed
haykh opened this issue Jul 24, 2023 · 14 comments

@haykh

haykh commented Jul 24, 2023

To Reproduce
Here's a minimal reproducible example. It writes three vectors: `v0` (implicit fixed size), `v1` (explicit fixed size + offsets), and `v2` (variable size).
#include <adios2.h>
#include <mpi.h>

#include <cstdlib>   // srand, rand
#include <iostream>
#include <stdexcept> // std::runtime_error
#include <vector>

int main(int argc, char* argv[]) {
#ifndef ADIOS2_USE_MPI
  throw std::runtime_error("This example requires MPI");
#endif

  int rank  = 0;
  int nproc = 1;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);
  const int NSTEPS = 5;
  srand(rank * 32767);

  adios2::ADIOS       adios(MPI_COMM_WORLD);

  const size_t        Nglobal = 6;
  std::vector<double> v0(Nglobal);

  unsigned int        Nelems;
  std::vector<double> v2;

  try {
    adios2::IO io = adios.DeclareIO("Output");
    io.SetEngine("HDF5");
    io.SetParameters({
      {"verbose", "4"}
    });

    adios2::Variable<double> varV0 = io.DefineVariable<double>("v0", {}, {}, { Nglobal });
    adios2::Variable<double> varV1
      = io.DefineVariable<double>("v1", { nproc * Nglobal }, { rank * Nglobal }, { Nglobal });
    adios2::Variable<double> varV2
      = io.DefineVariable<double>("v2", {}, {}, { adios2::UnknownDim });

    adios2::Engine writer = io.Open("localArray.h5", adios2::Mode::Write);

    for (int step = 0; step < NSTEPS; step++) {
      writer.BeginStep();
      for (size_t i = 0; i < Nglobal; i++) {
        v0[i] = rank * 1.0 + step * 0.1;
      }
      writer.Put<double>(varV0, v0.data());
      writer.Put<double>(varV1, v0.data());

      Nelems = rand() % 6 + 5;
      v2.resize(Nelems); // resize (not reserve) so the elements below are valid
      for (size_t i = 0; i < Nelems; i++) {
        v2[i] = rank * 1.0 + step * 0.1;
      }
      varV2.SetSelection(adios2::Box<adios2::Dims>({}, { Nelems }));
      writer.Put<double>(varV2, v2.data());

      writer.EndStep();
    }

    writer.Close();
  } catch (std::exception& e) {
    std::cout << "ERROR: ADIOS2 exception: " << e.what() << std::endl;
  }
  MPI_Finalize();
  return 0;
}

The example was adapted from the ADIOS2 examples.

After compiling and running it (e.g., mpiexec -np 8 <exec>), it produces localArray.h5, but when I view the file with h5dump, the sizes of both the v0 and v2 datasets are wrong, and their contents appear to be picked at random from one of the MPI ranks. For v0 I expect the size to be 6 * nproc. For v2, each rank's sub-array size is chosen randomly at every step between 5 and 10, so I expect the output length to be the sum of those sizes over all MPI ranks.

Example output:
HDF5 "localArray.h5" {
GROUP "/" {
   ATTRIBUTE "NumSteps" {
      DATATYPE  H5T_STD_U32LE
      DATASPACE  SCALAR
      DATA {
      (0): 5
      }
   }
   GROUP "Step0" {
      DATASET "v0" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): 6, 6, 6, 6, 6, 6
         }
      }
      DATASET "v1" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 48 ) / ( 48 ) }
         DATA {
         (0): 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3,
         (21): 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6,
         (42): 7, 7, 7, 7, 7, 7
         }
      }
      DATASET "v2" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
         DATA {
         (0): 6, 6, 6, 6, 6
         }
      }
   }
   GROUP "Step1" {
      DATASET "v0" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): 6, 4.1, 4.1, 4.1, 7.1, 6.1
         }
      }
      DATASET "v1" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 48 ) / ( 48 ) }
         DATA {
         (0): 0.1, 0.1, 0, 0, 0, 0, 1.1, 2.1, 2.1, 2.1, 2.1, 2.1, 2.1, 0, 0,
         (15): 0, 0, 0, 3.1, 3.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 0, 5.1, 5.1,
         (29): 5.1, 5.1, 5.1, 5.1, 0, 0, 0, 6.1, 6.1, 6.1, 6.1, 6.1, 6.1,
         (42): 7.1, 7.1, 4.1, 4.1, 4.1, 7.1
         }
      }
      DATASET "v2" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
         DATA {
         (0): 6.1, 6.1, 6.1, 4.2, 4.2, 4.2, 7.2
         }
      }
   }
   GROUP "Step2" {
      DATASET "v0" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): 4.2, 4.2, 4.2, 7.2, 7.2, 7.2
         }
      }
      DATASET "v1" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 48 ) / ( 48 ) }
         DATA {
         (0): 7.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 0.2, 0, 1.2, 1.2, 1.2, 1.2,
         (13): 1.2, 2.2, 2.2, 2.2, 2.2, 2.2, 2.2, 0, 0, 3.2, 3.2, 4.2, 4.2,
         (26): 4.2, 4.2, 4.2, 4.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 0, 0, 0, 0,
         (40): 0, 0, 0, 6.2, 6.2, 6.2, 6.2, 6.2
         }
      }
      DATASET "v2" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 9 ) / ( 9 ) }
         DATA {
         (0): 4.2, 4.2, 4.2, 7.2, 7.2, 7.2, 7.2, 5.3, 7.3
         }
      }
   }
   GROUP "Step3" {
      DATASET "v0" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): 5.3, 7.3, 7.3, 7.3, 7.3, 6.3
         }
      }
      DATASET "v1" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 48 ) / ( 48 ) }
         DATA {
         (0): 0.3, 0.3, 0.3, 0.3, 1.3, 1.3, 1.3, 1.3, 2.3, 2.3, 2.3, 2.3,
         (12): 2.3, 2.3, 0, 0, 0, 3.3, 3.3, 3.3, 3.3, 4.3, 4.3, 4.3, 4.3,
         (25): 4.3, 4.3, 5.3, 5.3, 5.3, 5.3, 0, 0, 0, 0, 0, 6.3, 6.3, 6.3,
         (39): 6.3, 6.3, 6.3, 7.3, 5.3, 7.3, 7.3, 7.3, 7.3
         }
      }
      DATASET "v2" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): 6.3, 6.3, 6.3, 7.4, 7.4, 7.4
         }
      }
   }
   GROUP "Step4" {
      DATASET "v0" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): 7.4, 7.4, 7.4, 6.4, 6.4, 6.4
         }
      }
      DATASET "v1" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 48 ) / ( 48 ) }
         DATA {
         (0): 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 1.4, 1.4, 1.4, 1.4, 1.4, 1.4,
         (12): 2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4,
         (24): 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4,
         (36): 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 7.4, 7.4, 7.4, 7.4, 7.4, 7.4
         }
      }
      DATASET "v2" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
         DATA {
         (0): 7.4, 7.4, 7.4, 7.4, 7.4
         }
      }
   }
}
}

Info:

  • OS/Platform: Stellar cluster at Princeton University, Intel(R) Xeon(R) Gold 6242R CPU
  • ADIOS2 version: 2.9.0.200, build type: shared
  • GCC Version: 10.3.1, OpenMPI Version: 5.1.0a1, HDF5 Version: 1.14.1.2

Flags used to compile ADIOS2:

  -D CMAKE_CXX_STANDARD=17
  -D CMAKE_CXX_EXTENSIONS=OFF
  -D CMAKE_POSITION_INDEPENDENT_CODE=TRUE
  -D BUILD_SHARED_LIBS=ON

  -D ADIOS2_USE_HDF5=ON
  -D ADIOS2_USE_Kokkos=ON
  -D ADIOS2_USE_MPI=ON

  -D ADIOS2_USE_Python=OFF
  -D ADIOS2_USE_Fortran=OFF
  -D ADIOS2_USE_ZeroMQ=OFF
  -D BUILD_TESTING=OFF
  -D ADIOS2_BUILD_EXAMPLES=OFF

I might be doing something wrong, or using dynamic writing not as intended. Could you please have a look and let me know whether the bug is on my end or whether something is indeed going wrong in ADIOS2?

Thanks in advance!

@haykh
Author

haykh commented Jul 24, 2023

A possibly related issue: when I do the same with Kokkos::Views of dynamic size, the program just hangs and is unable to close the file and finish the output. Here's an example to reproduce it.

#include <Kokkos_Core.hpp>
#include <stdio.h>

#include <chrono>
#include <iostream>
#include <numeric>

#ifdef ADIOS2_ENABLED
#  include <adios2.h>
#  include <adios2/cxx11/KokkosView.h>
#endif

#ifdef MPI_ENABLED
#  include <mpi.h>
#endif

namespace math = Kokkos;

auto Initialize(int argc, char* argv[]) -> void;
auto Finalize() -> void;

#ifdef ADIOS2_ENABLED
auto main(int argc, char* argv[]) -> int {
  Initialize(argc, argv);
  try {
    int mpi_rank = 0, mpi_size = 1;
#  ifndef MPI_ENABLED
    adios2::ADIOS adios;
#  else
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
    adios2::ADIOS adios(MPI_COMM_WORLD);
#  endif

    auto io = adios.DeclareIO("Test-Output");
    io.SetEngine("HDF5");

    const std::size_t maxptl = 200;
    const std::size_t ncells = 100;
    const auto        nsteps = 10;

    auto              myarr1 = Kokkos::View<double*>("myarr1", maxptl);
    auto              myarr2 = Kokkos::View<double*>("myarr2", ncells);

    io.DefineVariable<double>("myarr1", {}, {}, { adios2::UnknownDim });
    io.DefineVariable<double>(
      "myarr2", { ncells * mpi_size }, { ncells * mpi_rank }, { ncells });

    auto writer = io.Open("test.h5", adios2::Mode::Write);

    for (auto t { 0 }; t < nsteps; ++t) {
      if (mpi_rank == 0) {
        printf("Timestep #%d\n\n", t);
      }
      const auto nprt = (std::size_t)((t + 1) * (mpi_rank + 1));
      if (nprt > maxptl) {
        throw std::runtime_error("Too many particles");
      }
      adios.EnterComputationBlock();
      Kokkos::parallel_for(
        "fill_myarr1", nprt, KOKKOS_LAMBDA(const int i) {
          myarr1(i) += static_cast<double>(i);
        });
      Kokkos::parallel_for(
        "fill_myarr2", ncells, KOKKOS_LAMBDA(const int i) {
          myarr2(i) += static_cast<double>(i) + 0.1 * t;
        });
      adios.ExitComputationBlock();
      writer.BeginStep();

      // write myarr1
      auto myarr1_slice = Kokkos::View<double*>("myarr1_slice", nprt);
      auto slice        = std::make_pair((std::size_t)0, (std::size_t)nprt);
      Kokkos::deep_copy(myarr1_slice, Kokkos::subview(myarr1, slice));

      auto var1 = io.InquireVariable<double>("myarr1");
      var1.SetSelection({ {}, { nprt } });
      writer.Put<double>(var1, myarr1_slice);

      // write myarr2
      auto var2 = io.InquireVariable<double>("myarr2");
      writer.Put<double>(var2, myarr2);

      writer.EndStep();
    }
    if (mpi_rank == 0) {
      std::cout << "Closing file\n";
    }
    writer.Close();

  } catch (std::exception& e) {
    std::cerr << "Exception caught: " << e.what() << '\n';
    Finalize();
    return 1;
  }
  Finalize();
  return 0;
}
#else
auto main() -> int {
  std::cout << "ADIOS2 is not enabled.\n";
  return 0;
}
#endif

auto Initialize(int argc, char* argv[]) -> void {
  Kokkos::initialize(argc, argv);
#ifdef MPI_ENABLED
  MPI_Init(&argc, &argv);
#endif
}

auto Finalize() -> void {
#ifdef MPI_ENABLED
  MPI_Finalize();
#endif
  Kokkos::finalize();
}

If I comment out the output of myarr1, the other array, myarr2, is written correctly.

Info:

  • Kokkos version: 4.1.00
  • everything else is the same as above

Just in case, I added both of the examples (with the CMake files) into this repository.

haykh changed the title from "Dynamic variable shaping not working with parallel (MPI) HDF5" to "Dynamic variable shaping not working as expected with MPI" on Jul 24, 2023
@haykh
Author

haykh commented Jul 24, 2023

UPD

I checked with the default BP5 output (compiling without HDF5), and the result is the same (I have updated the title of the issue accordingly). So most likely I am doing something wrong in the code. Could you please have a look at what that might be? My original intent is simply to output an array whose size is not known from the start and may vary in time, differently for different MPI tasks.

@haykh
Copy link
Author

haykh commented Jul 25, 2023

UPD

I think I figured it out. The key was to:

  • add adios2::UnknownDim to all three of shape, start, and count when defining the variable:

    io.DefineVariable<double>(
          "myarr1", { adios2::UnknownDim }, { adios2::UnknownDim }, { adios2::UnknownDim });
    // instead of:
    // io.DefineVariable<double>("myarr1", {}, {}, { adios2::UnknownDim });
  • explicitly set the shape and the offsets when putting the data:

    var1.SetShape({ tot_nprt });
    var1.SetSelection({ { offset }, { nprt } });
    // instead of:
    // var1.SetSelection({ {}, { nprt } });

This is a bit counter-intuitive, since the examples from this repo referenced above show a different usage. I would still be interested to hear from the devs what the correct way of doing this is, so I am leaving this issue open for now.
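
For reference, the two changes combined into a single sketch (variable names follow the first example; `tot_nprt` and `offset` are hypothetical values that the application computes itself, e.g. via an MPI reduction over the per-rank counts):

// Sketch: define v2 with non-empty (placeholder) shape/start/count so it is
// treated as a *global* array, then fix up shape, offset and count per step.
// adios2::UnknownDim is just 0; per the discussion below, any placeholder would do.
adios2::Variable<double> varV2 = io.DefineVariable<double>(
    "v2", { adios2::UnknownDim }, { adios2::UnknownDim }, { adios2::UnknownDim });

// inside the step loop, once Nelems, tot_nprt and offset are known:
varV2.SetShape({ tot_nprt });
varV2.SetSelection({ { offset }, { Nelems } });
writer.Put<double>(varV2, v2.data());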

@eisenhauer
Member

I'm puzzled by this example. adios2::UnknownDim is just constexpr size_t 0 and doesn't appear in the actual implementation of ADIOS (i.e., it's just in ADIOSTypes.h and in the example, nowhere else, so it's not as if ADIOS is looking for that value). I'm not really sure what is going on here, so I'm going to punt this to @pnorbert (who is unfortunately on vacation at the moment, so it might be a few days).
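
For reference, that definition amounts to the following (shown only to confirm it is a named zero rather than a special sentinel):

// ADIOSTypes.h (per the comment above)
constexpr size_t UnknownDim = 0;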

@haykh
Author

haykh commented Jul 25, 2023

@eisenhauer, no, that part I understand. I think it's fair to say I could have used pretty much any value instead of adios2::UnknownDim; the key that solved the issue was not leaving shape = {} and start = {}.

@pnorbert
Contributor

pnorbert commented Jul 25, 2023 via email

@haykh
Author

haykh commented Jul 25, 2023

@pnorbert that indeed seems to be the case: when I attempted to SetShape before, it would complain that it cannot set a shape for a local array. What confused me originally, I think, was this example, which seemed to suggest that the correct usage is:

adios2::Variable<double> varV2 =
    io.DefineVariable<double>("v2", {}, {}, {adios2::UnknownDim});
// ...
varV2.SetSelection(adios2::Box<adios2::Dims>({}, {Nelems}));
writer.Put<double>(varV2, v2.data());

@pnorbert
Contributor

What did bpls -lD tell you about the BP5 datasets? Were they wrong?
I assume you would find all blocks listed individually.

Your original expectation was certainly wrong. Writing LocalArrays will not result in a GlobalArray at read time, since you did not define for v0 and v2 how the blocks should be arranged into a global array.

If you are looking for ADIOS to put the blocks together for you automatically, that's called adios2::JoinedDim, but it only works with BP4 and BP5.
https://adios2.readthedocs.io/en/latest/components/components.html?highlight=Joined#shapes
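
For illustration, a minimal 1-D sketch of the joined-array approach (BP4/BP5 engines only; variable names follow the first example; updating the per-step count via SetSelection is an assumption based on the docs linked above):

// Joined array: each rank contributes a block of (possibly changing) size
// Nelems along the joined dimension; ADIOS concatenates the blocks and
// works out the global shape and offsets itself.
adios2::Variable<double> varV2 =
    io.DefineVariable<double>("v2", { adios2::JoinedDim }, {}, { Nelems });

// later, inside the step loop, when Nelems changes:
varV2.SetSelection({ {}, { Nelems } });
writer.Put<double>(varV2, v2.data());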

@pnorbert
Contributor

Did you always want a GlobalArray, that is, an array with a (single global) shape, but assumed you needed a LocalArray for writing from multiple processes, or for writing multiple blocks from a process? That is certainly not the case.

We use LocalArrays only for cases where the blocks cannot logically be organized into a global array.

And almost always there is a way to put them into a global view, maybe a sparsely filled one. The only reason not to do that is to save on the global communication needed to calculate the start offsets on each processor, and the global shape on all processors, especially for arrays that change over time. But then a joined array can still help, if the blocks only change size in one dimension.

@haykh
Author

haykh commented Jul 25, 2023

@pnorbert thanks for the detailed response! Yes, I must indeed have misunderstood the concept of local vs. global arrays. I do want a global array: imagine particle data spread across multiple MPI blocks, which can logically be combined into a single global array. The use case I have in mind basically assumes that at each timestep there is an unknown number of particles in each MPI block, $n_1, n_2, \dots$, which can nonetheless be logically concatenated into a single array of length $n_1 + n_2 + \dots$.

The solution I have in mind now basically looks like this (pseudocode):

io.DefineVariable<double>(
      "particles", { adios2::UnknownDim }, { adios2::UnknownDim }, { adios2::UnknownDim });

// let `npart` be # of particles on a given MPI task
// and `particle_array` is the array of particles (double) of size `npart`
MPI_Allgather(&npart, npart_all);
// `npart_all` is the array of `npart`-s from all MPI ranks
offset = sum(npart_all[ :mpi_rank ]); 
npart_global = sum(npart_all[ : ]);

auto var_p = io.InquireVariable<double>("particles");
var_p.SetShape({ npart_global });
var_p.SetSelection({ { offset }, { npart } });
writer.Put<double>(var_p, particle_array);

Does this sound reasonable? (We use HDF5 as the output engine most of the time, so adios2::JoinedDim is likely not an option.)
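
For concreteness, here is a fleshed-out sketch of that per-step write (assuming `npart` is a size_t, `particle_array` is a std::vector<double> of length `npart`, and `mpi_rank`/`mpi_size` are available; MPI_UNSIGNED_LONG assumes size_t maps to unsigned long):

// Per-step write of a 1-D global "particles" array whose local block size
// differs per rank and per step (sketch; names follow the pseudocode above).
std::vector<size_t> npart_all(mpi_size);
MPI_Allgather(&npart, 1, MPI_UNSIGNED_LONG,
              npart_all.data(), 1, MPI_UNSIGNED_LONG, MPI_COMM_WORLD);

// this rank's offset and the total length across all ranks
size_t offset = 0, npart_global = 0;
for (int r = 0; r < mpi_size; ++r) {
  if (r < mpi_rank) {
    offset += npart_all[r];
  }
  npart_global += npart_all[r];
}

auto var_p = io.InquireVariable<double>("particles");
var_p.SetShape({ npart_global });
var_p.SetSelection({ { offset }, { npart } });
writer.Put<double>(var_p, particle_array.data());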

@pnorbert
Contributor

pnorbert commented Jul 25, 2023 via email

@haykh
Author

haykh commented Jul 25, 2023

Yes, we have multiple different quantities, like positions, velocities, etc. (in fact, these might have different types too: e.g., velocities are double, while positions are single precision). So the plan was to simply use the routine I outlined above to write each individual array separately. I'm assuming, of course, that the arrays are not reshuffled, right? The order of indices in the array of particle velocities has to be the same as for the coordinates.

@pnorbert
Contributor

pnorbert commented Jul 25, 2023 via email

@haykh
Author

haykh commented Jul 26, 2023

Thank you for responding. I think clarifying this point in the documentation might prove very helpful.

haykh closed this as completed on Jul 26, 2023