Segfault in testxxx runTest.exe for debug builds (need separate cpu/gpu namespaces) #725
Comments
I have checked that this also happens on itscrd80. And it has always been there! It was there even in the first commit for gg_tt.sa
Using my latest fpe branch now for simplicity. This is really strange. I get different results depending on which AVX I choose. The tests that segfault are the GPU tests, so they should not even be impacted by AVX (to first approximation)? With AVX=none this test succeeds, with all other AVX they segfault
Note also, in debug mode and with AVX=none, instead, the Compare test on GPU fails...
With sse4 debug I also get a similar failure
With avx2 and 512y the test succeeds, but it fails again with 512z
On second thought, this is very weird, from above:
In other words: the GPU test is giving a segfault in a cxtype_v type that is only meant to exist for SIMD CPUs! This is a clear example of #602: we should make CUDA and C++ builds completely independent from each other (see also #680 and #674). At the very least, we should avoid having a single runTest.exe executable where we mix both types of code. While I used two different Cpu and Gpu namespaces for some parts of the code, the basic types like mgOnGpu::cxtype have different meanings in the two implementations. Mixing them is a very bad idea... Another option is to separate the namespaces (see #318)?
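To illustrate why separate namespaces help here, below is a minimal sketch, not the actual code from this repository. The namespace names `sketchCpu` and `sketchGpu`, the width `neppV = 4`, the struct layouts and the helper `sizeofCx` are all hypothetical stand-ins. The idea is that if two differently laid out types share one fully qualified name (as with `mgOnGpu::cxtype` above) and end up linked into the same executable, the One Definition Rule is violated and the behaviour is undefined, which is one plausible way to end up with a segfault like this one. Giving each implementation its own namespace makes the two definitions distinct entities that can coexist.

```cpp
// Sketch only: all names and layouts below are hypothetical illustrations,
// not the real definitions used in the project.
#include <cassert>
#include <cstddef>

// CPU/SIMD flavour: the "vector" complex type holds several values at once.
namespace sketchCpu
{
  constexpr std::size_t neppV = 4; // assumed SIMD width, for illustration only
  struct cxtype_v
  {
    double re[neppV];
    double im[neppV];
  };
  // Size of the CPU flavour of the type, in bytes.
  inline std::size_t sizeofCx() { return sizeof( cxtype_v ); }
}

// GPU flavour: the same type name, but a scalar layout.
namespace sketchGpu
{
  struct cxtype_v
  {
    double re;
    double im;
  };
  // Size of the GPU flavour of the type, in bytes.
  inline std::size_t sizeofCx() { return sizeof( cxtype_v ); }
}

// Because the two definitions live in different namespaces, they are
// different entities and the compiler can confirm that their layouts differ.
// Had both been declared under one shared namespace in two translation units,
// the linker would silently keep a single sizeofCx(), and one side would then
// use the wrong layout.
static_assert( sizeof( sketchCpu::cxtype_v ) != sizeof( sketchGpu::cxtype_v ),
               "CPU and GPU flavours are expected to differ in this sketch" );
```

In a single test executable, code built for the CPU path would then refer only to `sketchCpu::cxtype_v` and code built for the GPU path only to `sketchGpu::cxtype_v`, so neither side can accidentally pick up the other's layout.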
…nt namespaces This CONCLUDES the cleanup of namespaces in ggtt.sa: everything builds and runs ok NB: in debug mode, now runTest succeeds! As intended, this fixes the segfault in madgraph5#725
…einfo to 'debug' flags (investigate madgraph5#725)
…flags as in SubProcesses (investigate madgraph5#725)
… conflict: "[namespace] in ggtt.sa, fix testmisc.cc and testxxx.cc to use different namespaces This CONCLUDES the cleanup of namespaces in ggtt.sa: everything builds and runs ok NB: in debug mode, now runTest succeeds! As intended, this fixes the segfault in madgraph5#725"
Note, as discussed in #723, the strange failures I was also getting when mixing vector sizes are most likely due to the fact that I was mixing CPU fptype_v and GPU fptype_v in the same executable...
… conflict: "[namespace] in ggtt.sa, fix testmisc.cc and testxxx.cc to use different namespaces This CONCLUDES the cleanup of namespaces in ggtt.sa: everything builds and runs ok NB: in debug mode, now runTest succeeds! As intended, this fixes the segfault in madgraph5#725" Note: runTest.exe now succeeds in all AVX modes, both in debug and no-debug mode
…n tput and tmad for easier merging This completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively. Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701, but there are still other issues remaining (being debugged in branch nobm). Revert "[fpe] rerun 15 tmad - ggttgg tests fail again madgraph5#655 as expected" This reverts commit 9212960. Revert "[fpe] rerun 78 tput alltees, all ok" This reverts commit 9a68868.
… easier merging This ~completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively. (HOWEVER, the CI on MacOS failed for this with madgraph5#730 - still a few things to change before merging). Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701, but there are still other issues remaining (being debugged in branch nobm). Revert "[fpe] rerun 15 tmad - ggttgg tests fail again madgraph5#655 as expected" This reverts commit 9212960. Revert "[fpe] rerun 78 tput alltees, all ok" This reverts commit 9a68868.
…madgraph5#730 and madgraph5#731 This completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively. Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701, but there are still other issues remaining (being debugged in branch nobm and in madgraph5#733): IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Segfault in testxxx runTest.exe for debug builds on itscrd90
This is related to #701. I wanted to test clang #724 on my MR #723 for this issue. But on Alma8 itscrd80 I have not set up clang. So I went to Alma9 itscrd90. Before trying clang I tried the default gcc11.3. In normal builds the tests succeed. But in debug builds they fail. This was on my fpe branch for MR #723. As a cross check, I went back to upstream/master... and the segfault also happens there!
Rephrase: this is a new segfault which I found while investigating #701, but which probably has nothing to do with it. It affects upstream/master...