Issue with ManagedArray::Operator[] and RAJA #184

rcarson3 · 2021-07-28T17:24:01Z

I recently ran into a case for an application I'm working on where I had the initial chai::ExecutionSpace set to chai::ExecutionSpace::GPU for a chai::ManagedArray, and then I had it run within a RAJA forall loop on the CPU to initialize the data for some unit tests. Later on when trying to access the data on the host using the ManagedArray::Operator[] I would get a segfault. My understanding talking with @davidbeckingsale offline is that this should work as the data should now be allocated on the host/cpu.

I've included a MWE representing the issue.

Also, one other issue I noted was that doing a lambda capture by reference for the host RAJA loop also resulted in a segfault. I've included that below as well as commented out section of code. I will note that I've found this sort of lambda capture works fine if the original ExecutionSpace is set to the CPU.

Also, I'm using the following hashes for CHAI, RAJA, and Umpire:
CHAI: df3e8e0
RAJA: 3047fa720132d19ee143b1fcdacaa72971f5988c (v0.13.0 tagged release)
Umpire: 447f4640eff7b8f39d3c59404f3b03629b90c021 (v4.1.2 tagged release)

Additional information:
Compiled on rzansel with gcc/7.3.1, cuda/10.1.243, and cmake/3.14.5

#include "RAJA/RAJA.hpp"

#include "umpire/strategy/DynamicPool.hpp"
#include "umpire/Allocator.hpp"
#include "umpire/ResourceManager.hpp"

#include "chai/config.hpp"
#include "chai/ExecutionSpaces.hpp"
#include "chai/ManagedArray.hpp"

int main()
{

      auto& rm = umpire::ResourceManager::getInstance();
      auto host_allocator = rm.getAllocator("HOST");
#ifdef __CUDACC__
      auto device_allocator = rm.makeAllocator<umpire::strategy::DynamicPool>
                              ("DEVICE_pool", rm.getAllocator("DEVICE"));
#endif

      const int size = 5000;

      chai::ManagedArray<double> array(size, 
      std::initializer_list<chai::ExecutionSpace>{chai::CPU
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP)
         , chai::GPU
#endif
         },
         std::initializer_list<umpire::Allocator>{host_allocator
#if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP)
         , device_allocator
#endif
      },
      chai::ExecutionSpace::GPU);

      std::cout << "Running GPU runs" << std::endl;
      // This works
      RAJA::forall<RAJA::cuda_exec<256>>(RAJA::RangeSegment(0, size),
         [=] __device__ (int i) {
            array[i] = i;
      });

      std::cout << "Running CPU runs" << std::endl;
      // This should work but fails
      // RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, size),
      //    [&] (int i) {
      //       array[i] = i;
      //    });
      // This works
      RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, size),
         [=] (int i) {
            array[i] = i;
         });
      std::cout << "Printing out data" << std::endl;
      // These work
      // std::cout << array.data(chai::ExecutionSpace::CPU)[0] << std::endl;
      // std::cout << array.data()[0] << std::endl;
      // This should work since we last ran things on the CPU but fails
      std::cout << array[0] << std::endl;
      array.free();
      return 0;
}

The text was updated successfully, but these errors were encountered:

robinson96 · 2021-07-28T17:59:32Z

Hey Robert, Chai requires lambdas to capture managed arrays by value – it’s the resulting copy constructor call that ensures data is allocated (if necessary) on the device and moved (if necessary). Any use of a [] operator outside of a RAJA loop context is essentially undefined behavior, as CHAI makes no guarantees about having a valid pointer being dereferenced by the [] in such a case, and even if it’s valid, CHAI makes no guarantee it is pointing to live, correct data anywhere except from within a RAJA context. The .data() calls and .pick() calls are a way to ensure that you are referencing valid data outside of a RAJA context, as you’ve observed. On ALE3D we use CARE::host_device_ptr (which wrap chai::managed_asrray) and some clang-query checks to provide compile-time checks to ensure we do not doe this. -Peter From: Robert Carson ***@***.***> Sent: Wednesday, July 28, 2021 10:24 AM To: LLNL/CHAI ***@***.***> Cc: Subscribed ***@***.***> Subject: [LLNL/CHAI] Issue with ManagedArray::Operator[] and RAJA (#184) @davidbeckingsale<https://urldefense.us/v3/__https:/github.com/davidbeckingsale__;!!G2kpM7uM-TzIFchu!iMbukC5nz4mzMLJ1USb60tTMbEnbG8UJ4ZSbPUTKlktdipCs9c_E4WPx5S500c58TxI$> and others I recently ran into a case for an application I'm working on where I had the initial chai::ExecutionSpace set to chai::ExecutionSpace::GPU for a chai::ManagedArray, and then I had it run within a RAJA forall loop on the CPU to initialize the data for some unit tests. Later on when trying to access the data on the host using the ManagedArray::Operator[] I would get a segfault. My understanding talking with @davidbeckingsale<https://urldefense.us/v3/__https:/github.com/davidbeckingsale__;!!G2kpM7uM-TzIFchu!iMbukC5nz4mzMLJ1USb60tTMbEnbG8UJ4ZSbPUTKlktdipCs9c_E4WPx5S500c58TxI$> offline is that this should work as the data should now be allocated on the host/cpu. I've included a MWE representing the issue. Also, one other issue I noted was that doing a lambda capture by reference for the host RAJA loop also resulted in a segfault. I've included that below as well as commented out section of code. I will note that I've found this sort of lambda capture works fine if the original ExecutionSpace is set to the CPU. Also, I'm using the following hashes for CHAI, RAJA, and Umpire: CHAI: df3e8e0<https://urldefense.us/v3/__https:/github.com/LLNL/CHAI/commit/df3e8e0f1b4350622bb9a68cfc7a23edc16e4bb9__;!!G2kpM7uM-TzIFchu!iMbukC5nz4mzMLJ1USb60tTMbEnbG8UJ4ZSbPUTKlktdipCs9c_E4WPx5S50CL7a71w$> RAJA: 3047fa720132d19ee143b1fcdacaa72971f5988c (v0.13.0 tagged release) Umpire: 447f4640eff7b8f39d3c59404f3b03629b90c021 (v4.1.2 tagged release) Additional information: Compiled on rzansel with gcc/7.3.1, cuda/10.1.243, and cmake/3.14.5 #include "RAJA/RAJA.hpp" #include "umpire/strategy/DynamicPool.hpp" #include "umpire/Allocator.hpp" #include "umpire/ResourceManager.hpp" #include "chai/config.hpp" #include "chai/ExecutionSpaces.hpp" #include "chai/ManagedArray.hpp" int main() { auto& rm = umpire::ResourceManager::getInstance(); auto host_allocator = rm.getAllocator("HOST"); #ifdef __CUDACC__ auto device_allocator = rm.makeAllocator<umpire::strategy::DynamicPool> ("DEVICE_pool", rm.getAllocator("DEVICE")); #endif const int size = 5000; chai::ManagedArray<double> array(size, std::initializer_list<chai::ExecutionSpace>{chai::CPU #if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP) , chai::GPU #endif }, std::initializer_list<umpire::Allocator>{host_allocator #if defined(CHAI_ENABLE_CUDA) || defined(CHAI_ENABLE_HIP) , device_allocator #endif }, chai::ExecutionSpace::GPU); std::cout << "Running GPU runs" << std::endl; // This works RAJA::forall<RAJA::cuda_exec<256>>(RAJA::RangeSegment(0, size), [=] __device__ (int i) { array[i] = i; }); std::cout << "Running CPU runs" << std::endl; // This should work but fails // RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, size), // [&] (int i) { // array[i] = i; // }); // This works RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, size), [=] (int i) { array[i] = i; }); std::cout << "Printing out data" << std::endl; // These work // std::cout << array.data(chai::ExecutionSpace::CPU)[0] << std::endl; // std::cout << array.data()[0] << std::endl; // This should work since we last ran things on the CPU but fails std::cout << array[0] << std::endl; array.free(); return 0; } — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub<https://urldefense.us/v3/__https:/github.com/LLNL/CHAI/issues/184__;!!G2kpM7uM-TzIFchu!iMbukC5nz4mzMLJ1USb60tTMbEnbG8UJ4ZSbPUTKlktdipCs9c_E4WPx5S507sGEPsM$>, or unsubscribe<https://urldefense.us/v3/__https:/github.com/notifications/unsubscribe-auth/ABKR6UEKJCVHSYKQEES7OJ3T2A4L3ANCNFSM5BEZOUKA__;!!G2kpM7uM-TzIFchu!iMbukC5nz4mzMLJ1USb60tTMbEnbG8UJ4ZSbPUTKlktdipCs9c_E4WPx5S505CDO0YY$>.

rcarson3 · 2021-07-28T18:32:11Z

@robinson96 thanks for the quick response. I was unaware about the lambda capture by value requirement for the internal data transfer portion of things to work. The same goes about [] operator being undefined behavior outside of RAJA loops. I will definitely be careful about in my application codes and will need to update some libraries in regards to the above lambda capture issue.

If y'all would be open to it I can open a PR with some small documentations changes noting these requirements.

robinson96 · 2021-07-28T20:43:49Z

That sounds like a good idea to me. -Peter From: Robert Carson ***@***.***> Sent: Wednesday, July 28, 2021 11:32 AM To: LLNL/CHAI ***@***.***> Cc: Robinson, Peter ***@***.***>; Mention ***@***.***> Subject: Re: [LLNL/CHAI] Issue with ManagedArray::Operator[] and RAJA (#184) @robinson96<https://urldefense.us/v3/__https:/github.com/robinson96__;!!G2kpM7uM-TzIFchu!j6_N-Jux3PWXQF_ed3QjOcIZhzWHWPgixbzf6M7t2MekJcBgVwhol5PW--xDVOm5ZJI$> thanks for the quick response. I was unaware about the lambda capture by value requirement for the internal data transfer portion of things to work. The same goes about [] operator being undefined behavior outside of RAJA loops. I will definitely be careful about in my application codes and will need to update some libraries in regards to the above lambda capture issue. If y'all would be open to it I can open a PR with some small documentations changes noting these requirements. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<https://urldefense.us/v3/__https:/github.com/LLNL/CHAI/issues/184*issuecomment-888529257__;Iw!!G2kpM7uM-TzIFchu!j6_N-Jux3PWXQF_ed3QjOcIZhzWHWPgixbzf6M7t2MekJcBgVwhol5PW--xD1OY1y8U$>, or unsubscribe<https://urldefense.us/v3/__https:/github.com/notifications/unsubscribe-auth/ABKR6UGDYCKSW2N3KQSDX3TT2BELNANCNFSM5BEZOUKA__;!!G2kpM7uM-TzIFchu!j6_N-Jux3PWXQF_ed3QjOcIZhzWHWPgixbzf6M7t2MekJcBgVwhol5PW--xDkbszxCg$>.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue with ManagedArray::Operator[] and RAJA #184

Issue with ManagedArray::Operator[] and RAJA #184

rcarson3 commented Jul 28, 2021

robinson96 commented Jul 28, 2021 via email

rcarson3 commented Jul 28, 2021

robinson96 commented Jul 28, 2021 via email

Issue with ManagedArray::Operator[] and RAJA #184

Issue with ManagedArray::Operator[] and RAJA #184

Comments

rcarson3 commented Jul 28, 2021

robinson96 commented Jul 28, 2021 via email

rcarson3 commented Jul 28, 2021

robinson96 commented Jul 28, 2021 via email