ROCm: Memory access fault by GPU node #3620
https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/-/jobs/219168
jngrad changed the title from "CI build failed for merged PR" to "ROCm: Memory access fault by GPU node" on Apr 1, 2020
Merged
kodiakhq bot added a commit that referenced this issue on Apr 3, 2020
The `ln -s /opt/rocm/bin/hcc* /opt/rocm/hip/bin/` issue has been worked around by properly setting `HCC_PATH` on the CMake side. The shutdown issue has been worked around by replacing interrupts with polling (suggested at ROCm/roctracer#22 (comment)). Something is wrong with the destruction order in our code, but I cannot easily identify what. It is not the missing `cudaStreamDestroy` call, though.

Fixes #3620 (according to `ctest -R save_checkpoint_lb.cpu-p3m.cpu-lj-therm.lb_1 --repeat-until-fail 1000`).
Fixes #3587 (according to `ctest -R ek_charged_plate --repeat-until-fail 100`).

**TODO**
- https://github.com/espressomd/docker/blob/master/docker/rocm-python3/Dockerfile-latest needs to be updated to ROCm 3.3 once this pull request is merged.
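For anyone trying to reproduce an intermittent failure like this outside of CTest, the `--repeat-until-fail` checks in the commit message can be approximated with a small script. This is only a minimal sketch: the helper name `repeat_until_fail` is hypothetical, and the thread does not say how polling was actually enabled. The ROCm runtime's `HSA_ENABLE_INTERRUPT=0` environment variable is one way to switch from interrupts to polling, and it appears in the usage note purely as an assumption.

```python
import os
import subprocess


def repeat_until_fail(cmd, max_runs, extra_env=None):
    """Run `cmd` up to `max_runs` times.

    Returns the 1-based index of the first run that exits with a non-zero
    status, or None if every run succeeded. `extra_env` (hypothetical
    parameter) is merged into a copy of the current environment.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return run  # stop at the first failure, like --repeat-until-fail
    return None
```

A call such as `repeat_until_fail(["ctest", "-R", "ek_charged_plate"], 100, {"HSA_ENABLE_INTERRUPT": "0"})` would then rerun one test with polling enabled (again, assuming that variable is the relevant switch) and report which iteration failed first, if any.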
https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/pipelines/11601