rcclay edited this page Sep 9, 2018 · 5 revisions

Gotchas with Kokkos:


Einspline SPO Evaluation (Initial Kokkos Implementation)

  1. Handling legacy C code takes some care. C-style structs that become C++ structs really should have their constructors and destructors called, so they should be created and destroyed with "new" and "delete" instead of malloc and free. I think skipping the constructors causes problems with reference counting when Kokkos data types are dropped into legacy C code. It may be worth looking into "kokkos_malloc" and "kokkos_free".
  2. Kokkos seems to require copy constructors with the same syntax as the default copy constructor. Where a copy constructor is declared "= delete", change it to "= default".
  3. Care should definitely be taken with static class members. They are delicate in CUDA, and Kokkos reflects this. For now, move MultiBSplineData into MultiBSpline to circumvent the problem.
  4. All calls to C++ standard library functions inside Kokkos parallel regions should be looked at very carefully ("looked at" = purged). They do not play well with CUDA. See "std::fill" in src/Numerics/Spline2/MultiBspline.hpp:evaluate_v(...). The code chugs along fine on the CPU and the GPU build compiles, but the resulting runtime error is frustratingly difficult to find.
  5. In einspline_spo, parallelizing evaluate_v needs the "psi" and "einspline" arrays. Making the einspline_spo class itself the functor (by overloading its operator()) gives access to these views, whereas the provided KOKKOS_LAMBDA macro assumes the lambda captures by value, NOT by reference, which is what is required here.
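Items 2, 4, and 5 above can be sketched together in plain C++, with no Kokkos or CUDA involved. Everything named here is hypothetical: ZeroRows stands in for a class used as a functor, SharedArray stands in for a reference-counted view, and dispatch_by_value is a serial stand-in for a parallel dispatch that copies its functor. It is only meant to show the shape of the pattern, not the actual einspline_spo code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a view: a cheap-to-copy handle that shares
// ownership of its data, so every copy of the functor sees the same array.
using SharedArray = std::shared_ptr<std::vector<double>>;

// Hypothetical functor. The member name echoes "psi" from the notes, but the
// layout (rows of ncols values) is invented for this sketch.
struct ZeroRows
{
  SharedArray psi;
  std::size_t ncols;

  ZeroRows(SharedArray p, std::size_t n) : psi(std::move(p)), ncols(n)
  {
    assert(ncols > 0 && psi->size() % ncols == 0);
  }

  // Item 2: the copy constructor is defaulted, not deleted, because the
  // dispatcher below takes the functor by value.
  ZeroRows(const ZeroRows&) = default;

  // Items 4 and 5: the kernel body is a member operator(), so it reaches
  // psi directly, and it uses a hand-written loop in place of std::fill.
  void operator()(std::size_t row) const
  {
    double* begin = psi->data() + row * ncols;
    for (std::size_t j = 0; j < ncols; ++j)
      begin[j] = 0.0;
  }
};

// Serial stand-in for a parallel dispatch: the functor is copied into the
// parameter f, and that copy is invoked once per index.
template <class Functor>
void dispatch_by_value(std::size_t n, Functor f)
{
  for (std::size_t i = 0; i < n; ++i)
    f(i);
}

// Fails to compile if someone switches the copy constructor back to "= delete".
static_assert(std::is_copy_constructible<ZeroRows>::value,
              "functor must be copyable to be dispatched by value");
```

Calling dispatch_by_value(nrows, ZeroRows(data, ncols)) zeroes every row of the shared array. Because the copy carries the handle with it, the caller sees the result even though the dispatcher only ever touched its own copy of the functor.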

Jastrow Evaluation

  1. For complex data types, great care must be taken when initializing Views of objects. For example, in TwoBodyJastrow.h, the following must be done to handle the functor:
  typedef Kokkos::Device<
            Kokkos::DefaultHostExecutionSpace,
            typename Kokkos::DefaultExecutionSpace::memory_space>
     F_device_type;
  Kokkos::View<FT*,F_device_type> F;

  ///Jumping to the initializer:
  F = Kokkos::View<FT*,F_device_type>("FT",NumGroups * NumGroups);
  for(int i=0; i<NumGroups*NumGroups; i++) {
    new (&F(i)) FT();
  /// continued

Specifying a device type is necessary because the placement new runs on the host, whereas the memory needs to be allocated on the device. By default, both the execution space and the memory space of a View are taken from the default execution space.
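The placement new step on its own can be sketched in plain C++, without Kokkos. This is only an illustration of the idiom: the FT struct and its members are hypothetical stand-ins for the real functor type, and std::malloc stands in for whatever allocator actually owns the memory. The point is that allocation and construction are separate steps, and that the construction runs wherever the loop runs (here, the host).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the functor type FT from the notes.
struct FT
{
  int    n_knots = 8;
  double cutoff  = 1.0;
};

// Allocate raw storage for `count` objects, then construct each one in place.
// std::malloc returns uninitialized bytes and runs no constructors.
inline FT* construct_in_place(std::size_t count)
{
  void* raw = std::malloc(count * sizeof(FT));
  if (raw == nullptr)
    throw std::bad_alloc();
  FT* items = static_cast<FT*>(raw);
  for (std::size_t i = 0; i < count; ++i)
    new (&items[i]) FT(); // placement new: runs the constructor, allocates nothing
  return items;
}

// Mirror image: run each destructor by hand, then release the storage.
inline void destroy_in_place(FT* items, std::size_t count)
{
  for (std::size_t i = 0; i < count; ++i)
    items[i].~FT();
  std::free(items);
}
```

After construct_in_place(n), every element holds the default member values, which would not be guaranteed for memory that was only malloc'ed. The same pairing of explicit construction and explicit destruction is what item 1 of the Einspline list is asking for when legacy C structs gain C++ members.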