GPU reg tests crashing on Summit #816

PaulMullowney · 2021-03-04T22:32:52Z

The issue occurs in the methods run_face_elem_algorithm, run_face_elem_par_reduce, and run_face_elem_algorithm_nosimd in include/ngp_utils/NgpLoopUtils.h.

In particular, calls like the following crash:
const int nodesPerFace = nodes_per_entity(faceDataNGP, METype::FACE);

The crash is inside that call, in include/ngp_utils/NgpMEUtils.h at line 67, i.e.
Kokkos::parallel_reduce(
1, KOKKOS_LAMBDA(int, int& n) {
n = me->nodesPerElement_;
}, npe);

However, if you replace the nodes_per_entity(faceDataNGP, METype::FACE) call with
const int nodesPerFace = nodes_per_entity(faceDataNGP);
which calls the API above under the hood, the code doesn't crash.
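
For anyone reading along without a GPU build, here is a host-only sketch of the calling shape involved. It does not use Kokkos: `reduce_once` is a hypothetical stand-in for a one-iteration `Kokkos::parallel_reduce`, and the `MasterElement` struct below is a stripped-down placeholder that models only the `nodesPerElement_` member. The point is only that the lambda captures the raw pointer `me` by value and dereferences it inside the kernel body, so on the device the pointer has to be valid there.

```cpp
#include <cassert>

// Hypothetical placeholder for sierra::nalu::MasterElement; only the member
// read by the lambda in this issue is modeled.
struct MasterElement {
  int nodesPerElement_;
};

// Host-only stand-in for a one-iteration parallel_reduce: calls the functor
// once with index 0 and a reference to a local result. This is NOT the
// Kokkos API, just the same calling shape.
template <typename Functor>
void reduce_once(Functor&& f, int& result) {
  int local = 0;
  f(0, local);
  result = local;
}

// Mirrors the snippet quoted above: the lambda captures `me` by value and
// reads one int through it.
int fetch_nodes_per_element(const MasterElement* me) {
  // Guards only against null, not against a stale or host-only address.
  assert(me != nullptr);
  int npe = 0;
  reduce_once([=](int, int& n) { n = me->nodesPerElement_; }, npe);
  return npe;
}
```

On the host this simply returns the member value, e.g. `fetch_nodes_per_element(&me)` for a `MasterElement me{4}` returns 4. It says nothing about why the optimized GPU build fails; it is only meant to make the pattern easier to talk about.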

Valgrind and cuda-memcheck do not show anything interesting. I've been looking at this for days and cannot find the issue. Perhaps another set of eyes might help.

PaulMullowney · 2021-03-05T00:47:05Z

Without Hypre: ablNeutralNGPTrilinos crashes

#0  0x0000000000000000 in ?? ()
#1  0x00000000110f552c in __nv_hdl_wrapper_t (in=..., this=<optimized out>) at nvcc_internal_extended_lambda_implementation:237
#2  CudaFunctorAdapter (f_=..., this=<optimized out>) at /ccs/proj/cfd116/shreyas/summit/exawind-2020-08/install/gcc-cuda10/trilinos-2021-03-03/include/Cuda/Kokkos_Cuda_Parallel.hpp:2699
#3  functor (functor_in=...) at /ccs/proj/cfd116/shreyas/summit/exawind-2020-08/install/gcc-cuda10/trilinos-2021-03-03/include/Cuda/Kokkos_Cuda_Parallel.hpp:2838
#4  Kokkos::Impl::ParallelReduceAdaptor<Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(sierra::nalu::ElemDataRequestsGPU const&, sierra::nalu::METype), &(int sierra::nalu::nodes_per_entity<sierra::nalu::ElemDataRequestsGPU>(sierra::nalu::ElemDataRequestsGPU const&, sierra::nalu::METype)), 1u>, void (int, int&), sierra::nalu::MasterElement*>, int>::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Cuda> const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(sierra::nalu::ElemDataRequestsGPU const&, sierra::nalu::METype), &(int sierra::nalu::nodes_per_entity<sierra::nalu::ElemDataRequestsGPU>(sierra::nalu::ElemDataRequestsGPU const&, sierra::nalu::METype)), 1u>, void (int, int&), sierra::nalu::MasterElement*> const&, int&) (label=..., policy=..., functor=..., return_value=@0x7fffdd0addf0: 0)
    at /ccs/proj/cfd116/shreyas/summit/exawind-2020-08/install/gcc-cuda10/trilinos-2021-03-03/include/Kokkos_Parallel_Reduce.hpp:868
#5  0x0000000011248c40 in parallel_reduce<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(const sierra::nalu::ElemDataRequestsGPU&, sierra::nalu::METype), sierra::nalu::nodes_per_entity<sierra::nalu::ElemDataRequestsGPU>, 1>, void(int, int&), sierra::nalu::MasterElement*>, int> (return_value=<optimized out>, functor=..., policy=<optimized out>) at /ccs/proj/cfd116/shreyas/summit/exawind-2020-08/install/gcc-cuda10/trilinos-2021-03-03/include/Kokkos_Parallel_Reduce.hpp:1030
#6  sierra::nalu::nodes_per_entity<sierra::nalu::ElemDataRequestsGPU> (dataReq=..., meType=sierra::nalu::FACE) at ../include/ngp_utils/NgpMEUtils.h:77
#7  0x0000000011258bd8 in sierra::nalu::nalu_ngp::run_face_elem_algorithm<stk::mesh::DeviceMesh, sierra::nalu::nalu_ngp::FieldManager, sierra::nalu::ElemDataRequests, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (sierra::nalu::WallFuncGeometryAlg<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::*)(), &sierra::nalu::WallFuncGeometryAlg<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::execute, 1u>, void (sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh>&), unsigned int const, unsigned int const, sierra::nalu::MasterElement*, sierra::nalu::MasterElement*, bool, double, sierra::nalu::nalu_ngp::impl::ElemFieldOp<stk::mesh::DeviceMesh, stk::mesh::DeviceField<double, stk::mesh::EmptyNgpFieldSyncDebugger>, sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh> > const, sierra::nalu::nalu_ngp::impl::NodeFieldOp<stk::mesh::DeviceMesh, stk::mesh::DeviceField<double, stk::mesh::EmptyNgpFieldSyncDebugger>, sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh> > const, sierra::nalu::nalu_ngp::impl::NodeFieldOp<stk::mesh::DeviceMesh, stk::mesh::DeviceField<double, stk::mesh::EmptyNgpFieldSyncDebugger>, sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh> > const> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, sierra::nalu::nalu_ngp::MeshInfo<stk::mesh::DeviceMesh, sierra::nalu::nalu_ngp::FieldManager> const&, sierra::nalu::ElemDataRequests const&, sierra::nalu::ElemDataRequests const&, stk::mesh::Selector const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (sierra::nalu::WallFuncGeometryAlg<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::*)(), &sierra::nalu::WallFuncGeometryAlg<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::execute, 1u>, void 
(sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh>&), unsigned int const, unsigned int const, sierra::nalu::MasterElement*, sierra::nalu::MasterElement*, bool, double, sierra::nalu::nalu_ngp::impl::ElemFieldOp<stk::mesh::DeviceMesh, stk::mesh::DeviceField<double, stk::mesh::EmptyNgpFieldSyncDebugger>, sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh> > const, sierra::nalu::nalu_ngp::impl::NodeFieldOp<stk::mesh::DeviceMesh, stk::mesh::DeviceField<double, stk::mesh::EmptyNgpFieldSyncDebugger>, sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh> > const, sierra::nalu::nalu_ngp::impl::NodeFieldOp<stk::mesh::DeviceMesh, stk::mesh::DeviceField<double, stk::mesh::EmptyNgpFieldSyncDebugger>, sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh> > const>) (algName=..., meshInfo=..., faceDataReqs=..., elemDataReqs=..., sel=..., algorithm=...) at ../include/ngp_utils/NgpLoopUtils.h:531
#8  0x0000000011268b7c in sierra::nalu::WallFuncGeometryAlg<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::execute (this=0x5ab78100) at ../src/ngp_algorithms/WallFuncGeometryAlg.C:98
#9  0x00000000113cf3fc in sierra::nalu::NgpAlgDriver::execute (this=0x5ac09d20) at ../src/ngp_algorithms/NgpAlgDriver.C:48
#10 0x000000001071bd6c in compute_geometry (this=0x6c634850) at ../src/Realm.C:2453
#11 sierra::nalu::Realm::initialize_prolog (this=0x6c634850) at ../src/Realm.C:527
#12 0x000000001072f33c in sierra::nalu::Realms::initialize_prolog (this=<optimized out>) at ../src/Realms.C:77
#13 0x000000001074c574 in sierra::nalu::Simulation::initialize (this=0x7fffdd0ecfe8) at ../src/Simulation.C:148
#14 0x00000000100ffa18 in main (argc=<optimized out>, argv=<optimized out>) at ../nalu.C:177

PaulMullowney · 2021-03-05T01:11:51Z

With Hypre: oversetRotCylNGPHypre

#0  0x0000000000000000 in ?? ()
#1  0x0000000011165fac in __nv_hdl_wrapper_t (in=..., this=<optimized out>) at nvcc_internal_extended_lambda_implementation:237
#2  CudaFunctorAdapter (f_=..., this=<optimized out>) at /ccs/proj/cfd116/shreyas/summit/exawind-2020-08/install/gcc-cuda10/trilinos-2021-03-03/include/Cuda/Kokkos_Cuda_Parallel.hpp:2699
#3  functor (functor_in=...) at /ccs/proj/cfd116/shreyas/summit/exawind-2020-08/install/gcc-cuda10/trilinos-2021-03-03/include/Cuda/Kokkos_Cuda_Parallel.hpp:2838
#4  Kokkos::Impl::ParallelReduceAdaptor<Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(sierra::nalu::ElemDataRequestsGPU const&, sierra::nalu::METype), &(int sierra::nalu::nodes_per_entity<sierra::nalu::ElemDataRequestsGPU>(sierra::nalu::ElemDataRequestsGPU const&, sierra::nalu::METype)), 1u>, void (int, int&), sierra::nalu::MasterElement*>, int>::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Cuda> const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(sierra::nalu::ElemDataRequestsGPU const&, sierra::nalu::METype), &(int sierra::nalu::nodes_per_entity<sierra::nalu::ElemDataRequestsGPU>(sierra::nalu::ElemDataRequestsGPU const&, sierra::nalu::METype)), 1u>, void (int, int&), sierra::nalu::MasterElement*> const&, int&) (label=..., policy=..., functor=..., return_value=@0x7fffc069b600: 0)
    at /ccs/proj/cfd116/shreyas/summit/exawind-2020-08/install/gcc-cuda10/trilinos-2021-03-03/include/Kokkos_Parallel_Reduce.hpp:868
#5  0x00000000113f9a70 in parallel_reduce<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<int (*)(const sierra::nalu::ElemDataRequestsGPU&, sierra::nalu::METype), sierra::nalu::nodes_per_entity<sierra::nalu::ElemDataRequestsGPU>, 1>, void(int, int&), sierra::nalu::MasterElement*>, int> (return_value=<optimized out>, functor=..., policy=<optimized out>) at /ccs/proj/cfd116/shreyas/summit/exawind-2020-08/install/gcc-cuda10/trilinos-2021-03-03/include/Kokkos_Parallel_Reduce.hpp:1030
#6  sierra::nalu::nodes_per_entity<sierra::nalu::ElemDataRequestsGPU> (dataReq=..., meType=sierra::nalu::FACE) at ../include/ngp_utils/NgpMEUtils.h:77
#7  0x000000001140a6c4 in sierra::nalu::nalu_ngp::run_face_elem_algorithm<stk::mesh::DeviceMesh, sierra::nalu::nalu_ngp::FieldManager, sierra::nalu::ElemDataRequests, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (sierra::nalu::NodalGradPOpenBoundary<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::*)(), &sierra::nalu::NodalGradPOpenBoundary<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::execute, 1u>, void (sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh>&), sierra::nalu::MasterElement*, unsigned int const, unsigned int const, unsigned int const, unsigned int const, unsigned int const, unsigned int const, bool const, bool const, sierra::nalu::MasterElement*, double const, sierra::nalu::nalu_ngp::impl::NodeFieldOp<stk::mesh::DeviceMesh, stk::mesh::DeviceField<double, stk::mesh::EmptyNgpFieldSyncDebugger>, sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh> > const> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, sierra::nalu::nalu_ngp::MeshInfo<stk::mesh::DeviceMesh, sierra::nalu::nalu_ngp::FieldManager> const&, sierra::nalu::ElemDataRequests const&, sierra::nalu::ElemDataRequests const&, stk::mesh::Selector const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (sierra::nalu::NodalGradPOpenBoundary<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::*)(), &sierra::nalu::NodalGradPOpenBoundary<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::execute, 1u>, void (sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh>&), sierra::nalu::MasterElement*, unsigned int const, unsigned int const, unsigned int const, unsigned int const, unsigned int const, unsigned int const, bool const, bool const, sierra::nalu::MasterElement*, double const, sierra::nalu::nalu_ngp::impl::NodeFieldOp<stk::mesh::DeviceMesh, 
stk::mesh::DeviceField<double, stk::mesh::EmptyNgpFieldSyncDebugger>, sierra::nalu::nalu_ngp::FaceElemSimdData<stk::mesh::DeviceMesh> > const>) (algName=..., meshInfo=..., faceDataReqs=..., elemDataReqs=..., sel=..., algorithm=...) at ../include/ngp_utils/NgpLoopUtils.h:531
#8  0x00000000114193ac in sierra::nalu::NodalGradPOpenBoundary<sierra::nalu::AlgTraitsFaceElem<sierra::nalu::AlgTraitsQuad4, sierra::nalu::AlgTraitsHex8> >::execute (this=0x4b343180) at ../src/ngp_algorithms/NodalGradPOpenBoundaryAlg.C:119
#9  0x00000000114401dc in sierra::nalu::NgpAlgDriver::execute (this=0x4b9ecec8) at ../src/ngp_algorithms/NgpAlgDriver.C:48
#10 0x00000000105f54dc in compute_projected_nodal_gradient (this=0x4b9eccd0) at ../src/LowMachEquationSystem.C:3801
#11 sierra::nalu::LowMachEquationSystem::solve_and_update (this=0x4b9ec040) at ../src/LowMachEquationSystem.C:704
#12 0x000000001053d920 in sierra::nalu::EquationSystems::solve_and_update (this=0x54f5b2b8) at ../src/EquationSystems.C:771
#13 0x0000000010727d24 in sierra::nalu::Realm::advance_time_step (this=0x54f5b060) at ../src/Realm.C:1865
#14 0x00000000107aae40 in sierra::nalu::TimeIntegrator::integrate_realm (this=0x4c0a6880) at ../src/TimeIntegrator.C:342
#15 0x00000000107572b0 in sierra::nalu::Simulation::run (this=0x7fffc06c16c8) at ../src/Simulation.C:173
#16 0x0000000010104964 in main (argc=<optimized out>, argv=<optimized out>) at ../nalu.C:178

PaulMullowney · 2021-03-05T03:08:26Z

Each of these tests passes when Trilinos and Nalu are built in Debug.

PaulMullowney · 2021-03-05T17:02:40Z

I got a little more out of cuda-memcheck with the right environment variables set:

Kokkos::Cuda::initialize ERROR: likely mismatch of architecture
[a27n02:31735] *** Process received signal ***
[a27n02:31735] Signal: Aborted (6)
[a27n02:31735] Signal code:  (-6)
[a27n02:31735] [ 0] [0x2000000504d8]
[a27n02:31735] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200027a42094]
[a27n02:31735] [ 2] /gpfs/alpine/cfd116/scratch/mullowne/nalu-wind/build_gpu_master/naluX[0x16691e18]
[a27n02:31735] [ 3] /gpfs/alpine/cfd116/scratch/mullowne/nalu-wind/build_gpu_master/naluX[0x166b23a8]
[a27n02:31735] [ 4] /gpfs/alpine/cfd116/scratch/mullowne/nalu-wind/build_gpu_master/naluX[0x1668d4b0]
[a27n02:31735] [ 5] /gpfs/alpine/cfd116/scratch/mullowne/nalu-wind/build_gpu_master/naluX[0x10103fc0]
[a27n02:31735] [ 6] /lib64/libc.so.6(+0x25200)[0x200027a25200]
[a27n02:31735] [ 7] /lib64/libc.so.6(__libc_start_main+0xc4)[0x200027a253f4]
[a27n02:31735] *** End of error message ***
========= CUDA-MEMCHECK
========= Error: process didn't terminate successfully
========= Fatal UVM GPU fault of type invalid pte due to invalid address
=========     during read access to address 0x201e6b840000
=========
========= Fatal UVM GPU fault of type invalid pte due to invalid address
=========     during read access to address 0x201e6b840000
=========
========= No CUDA-MEMCHECK results found

PaulMullowney · 2021-05-03T20:45:54Z

Just built with Trilinos master (abfd14fbe0d) and develop (c00ff3bb339). Crashes no longer happen. Not sure what to make of this as the Nalu code hasn't changed. Oh well.

sayerhs assigned alanw0 and PaulMullowney Mar 4, 2021

PaulMullowney closed this as completed May 3, 2021

PaulMullowney mentioned this issue Feb 15, 2022

Nalu-Wind seg fault on Summit regression tests #929

Closed
