Crash (FPE) in check_hip.exe on LUMI #1003
Comments
And... this is yet another FPE crash that disappears in debug mode. I will add another volatile, most likely, but this is becoming too much.
OK, this magically fixes it.
See issue 3653 in https://github.com/microsoft/DeepSpeed/issues for HIP_CLANG_ONLY
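For readers without access to the patch, here is a minimal sketch of the kind of workaround the commit messages in this thread describe: disabling optimizations for a single `operator+=` when building with clang. It uses clang's `optnone` function attribute. The struct `StatsSketch`, its fields, the macro `SKETCH_NO_OPT` and the helper `demoMerge` are hypothetical stand-ins, not the actual EventStatistics.h code, and the guard on `__clang__` is an assumption based on the later commit that extends the workaround to any clang.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical macro (not from the thread): clang's optnone attribute, empty elsewhere.
// A HIP-only variant would test the HIP macro mentioned in the thread instead of __clang__.
#if defined( __clang__ )
#define SKETCH_NO_OPT __attribute__( ( optnone ) )
#else
#define SKETCH_NO_OPT
#endif

// Simplified, hypothetical stand-in for an event-statistics accumulator.
struct StatsSketch
{
  std::size_t nevtOK = 0;                               // number of good events
  double minWG = std::numeric_limits<double>::max();    // smallest weight seen
  double maxWG = std::numeric_limits<double>::lowest(); // largest weight seen
  double sumWG = 0.;                                    // sum of weights

  // Merge another accumulator into this one.
  // Under clang this function is compiled without optimizations.
  SKETCH_NO_OPT
  StatsSketch& operator+=( const StatsSketch& other )
  {
    nevtOK += other.nevtOK;
    minWG = std::min( minWG, other.minWG );
    maxWG = std::max( maxWG, other.maxWG );
    sumWG += other.sumWG;
    return *this;
  }
};

// Merge two small accumulators and sanity-check the event count.
inline StatsSketch demoMerge()
{
  StatsSketch a;
  a.nevtOK = 2;
  a.minWG = 1.0;
  a.maxWG = 2.0;
  a.sumWG = 3.0;
  StatsSketch b;
  b.nevtOK = 1;
  b.minWG = 0.5;
  b.maxWG = 0.5;
  b.sumWG = 0.5;
  a += b;
  assert( a.nevtOK == 3 );
  return a;
}
```

Restricting the attribute to one function keeps the rest of the translation unit optimized, which is presumably why it is less intrusive than sprinkling `volatile` or changing global build flags. Whether it hides a compiler issue or a real floating-point problem is not settled by this sketch.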
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
…nd improve error handling
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
…graph5#1003 by disabling SIMD in C++ objects for HIP builds - it does not help, will revert
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
Revert "[amd] in gg_tt.mad cudacpp.mk, try to work around the HIP crashes madgraph5#1003 by disabling SIMD in C++ objects for HIP builds - it does not help, will revert" This reverts commit 2fc102767ecc6ae2e95770f4cff18e5c08d31fc1.
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
…h5#1003 by disabling SIMD in C++ objects built with hipcc - it also does not help, will revert
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
Revert "[amd] in gg_tt.mad cudacpp.mk, try to work around HIP crashes madgraph5#1003 by disabling SIMD in C++ objects built with hipcc - it also does not help, will revert" This reverts commit 1e225fd7068eb0c67377f55c7e910af945a4d963.
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
…adgraph5#1003 by adding volatile - it does not work, will revert
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
Revert "[amd] in gg_tt.mad EventStatistics.h, try to work around HIP crashes madgraph5#1003 by adding volatile - it does not work, will revert" This reverts commit e2591da7b159b6d133a7cff7a4b583a8ad34d563.
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
…h5#1003 by printing out sum.nevtOK() - this avoids the crash but is not practical, will revert
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
Revert "[amd] in gg_tt.mad EventStatistics.h, work around HIP crashes madgraph5#1003 by printing out sum.nevtOK() - this avoids the crash but is not practical, will revert" This reverts commit 725dae88d89a61d005a0031c9462fe95f4ec6728.
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
madgraph5#1003 on hipcc by disabling optimizations for operator+=
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
…rash madgraph5#1005 on clang16 by disabling optimizations for operator+= This extends to any clang the previous workaround for madgraph5#1003 which had been defined only for HIP clang
valassi
added a commit
to valassi/madgraph4gpu
that referenced
this issue
Sep 18, 2024
…#1005 and for gcc142 cxtype_ref madgraph5#1003
This was referenced Sep 18, 2024
This is fixed in PR #1006. Closing.
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
…nd improve error handling
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
…graph5#1003 by disabling SIMD in C++ objects for HIP builds - it does not help, will revert
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
Revert "[amd] in gg_tt.mad cudacpp.mk, try to work around the HIP crashes madgraph5#1003 by disabling SIMD in C++ objects for HIP builds - it does not help, will revert" This reverts commit 2fc102767ecc6ae2e95770f4cff18e5c08d31fc1.
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
…h5#1003 by disabling SIMD in C++ objects built with hipcc - it also does not help, will revert
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
Revert "[amd] in gg_tt.mad cudacpp.mk, try to work around HIP crashes madgraph5#1003 by disabling SIMD in C++ objects built with hipcc - it also does not help, will revert" This reverts commit 1e225fd7068eb0c67377f55c7e910af945a4d963.
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
…adgraph5#1003 by adding volatile - it does not work, will revert
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
Revert "[amd] in gg_tt.mad EventStatistics.h, try to work around HIP crashes madgraph5#1003 by adding volatile - it does not work, will revert" This reverts commit e2591da7b159b6d133a7cff7a4b583a8ad34d563.
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
…h5#1003 by printing out sum.nevtOK() - this avoids the crash but is not practical, will revert
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
Revert "[amd] in gg_tt.mad EventStatistics.h, work around HIP crashes madgraph5#1003 by printing out sum.nevtOK() - this avoids the crash but is not practical, will revert" This reverts commit 725dae88d89a61d005a0031c9462fe95f4ec6728.
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
madgraph5#1003 on hipcc by disabling optimizations for operator+=
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
…rash madgraph5#1005 on clang16 by disabling optimizations for operator+= This extends to any clang the previous workaround for madgraph5#1003 which had been defined only for HIP clang
zeniheisser
pushed a commit
to zeniheisser/madgraph4gpu
that referenced
this issue
Sep 23, 2024
…#1005 and for gcc142 cxtype_ref madgraph5#1003
I am running tests of AMD GPUs on LUMI #998
I am getting a new very bizarre crash