GPU reg tests crashing on Summit #816
Comments
Each of these tests passes when Trilinos and Nalu are built in Debug.
A little bit more out of cuda-memcheck with the right environment variables
Just built with Trilinos master (abfd14fbe0d) and develop (c00ff3bb339). The crashes no longer happen. Not sure what to make of this, as the Nalu code hasn't changed. Oh well.
The issue occurs in the methods run_face_elem_algorithm, run_face_elem_par_reduce, and run_face_elem_algorithm_nosimd in include/ngp_utils/NgpLoopUtils.h.
In particular, calls like the following crash:
const int nodesPerFace = nodes_per_entity(faceDataNGP, METype::FACE);
in include/ngp_utils/NgpMEUtils.h at line 67, i.e.
Kokkos::parallel_reduce(
1, KOKKOS_LAMBDA(int, int& n) {
n = me->nodesPerElement_;
}, npe);
However, if you replace the nodes_per_entity(faceDataNGP, METype::FACE) call with
const int nodesPerFace = nodes_per_entity(faceDataNGP);
which calls the API above under the hood, the code doesn't crash.
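For anyone trying to follow the pattern without a GPU build, here is a minimal host-only sketch of the two call paths. Everything in it is a stand-in and an assumption: `serial_reduce` only imitates the call shape of `Kokkos::parallel_reduce`, the `ElemData` and `MasterElement` types and the `get_me` helper are hypothetical, and the choice of `METype::ELEM` as the default in the one-argument overload is a guess, not something taken from NgpMEUtils.h.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the real Nalu or Kokkos types.
enum class METype { ELEM, FACE };

struct MasterElement {
  int nodesPerElement_;
};

struct ElemData {
  const MasterElement* elem;
  const MasterElement* face;
  // Hypothetical accessor: pick the master element for the requested type.
  const MasterElement* get_me(METype t) const {
    return t == METype::FACE ? face : elem;
  }
};

// Serial, host-only imitation of the call shape
// Kokkos::parallel_reduce(n, functor, result).
template <typename Functor>
void serial_reduce(int n, Functor f, int& result) {
  int local = 0;
  for (int i = 0; i < n; ++i) {
    f(i, local);
  }
  result = local;
}

// Two-argument form: a single-iteration reduce whose only job is to read
// one member through the master element pointer and hand it back.
int nodes_per_entity(const ElemData& data, METype t) {
  const MasterElement* me = data.get_me(t);
  assert(me != nullptr);
  int npe = 0;
  serial_reduce(1, [me](int, int& n) { n = me->nodesPerElement_; }, npe);
  return npe;
}

// One-argument form forwards to the two-argument form.
// Assumption: the default METype used here is a guess for illustration.
int nodes_per_entity(const ElemData& data) {
  return nodes_per_entity(data, METype::ELEM);
}
```

In this sketch both overloads run the same reduce and differ only in which master element pointer the lambda captures. If the real code is structured similarly, that pointer may be worth inspecting on the device side.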
Neither Valgrind nor cuda-memcheck shows anything particularly interesting. I've been looking at this for days and cannot find the issue. Perhaps another set of eyes might help.