Failure to stop at MPIR_Breakpoint on Power systems with OpenMPI v3.0.x #5501
Comments
The fix for this might be related to #6613 (comment). That PMIx fix hasn't been ported to the PMIx 2.x branch yet, but I can see if it's possible. In the meantime, can you try the latest Open MPI v4.x release with PMIx v3.1.3rc1? |
I just retested with a build of the Open MPI v4.0.x branch (386ed07) with PMIx v3.1.3rc3 and the same issue is present. The compiler was IBM 16.1.0. |
Per a note from off list, it looks like this is not reproducible with gcc, but is with PGI and IBM XL. |
@jjhursey is this fixed with latest PMIx update? |
@gpaulsen No. This still needs to be investigated. |
@awlauria will investigate this week. |
In short: The Unfortunately, there is no one "magic bullet" to make sure it works on all compilers. I explored adding support similar to what was done here for clang: but going that approach may not be feasible. It's simple enough to add support for gcc's Also checking the compiler using pre-defined directives is a no-go. XLC defines a variety of them, including The simplest solution that will hopefully satisfy most compilers is something akin to this:
But it comes with a small annoyance that at least under XLC, gdb breaking on MPIR_Breakpoint() will show the calling function as being broken on - as if the compiler made If the above is a satisfactory solution I can go ahead and PR it. |
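For illustration, one compiler-neutral way to keep a function like MPIR_Breakpoint() from being optimized away is to give it an observable side effect and to call it through a volatile function pointer. The following is only a minimal sketch under stated assumptions, not the code proposed in this comment or merged in the PRs: the names `sketch_breakpoint_hits` and `sketch_launch_done` are hypothetical, and the `void *` return type is assumed here.

```c
#include <stddef.h>

/* Hypothetical counter (not an Open MPI symbol). A write to a volatile
 * object is an observable side effect, so the compiler cannot reduce
 * the function body to nothing. */
volatile int sketch_breakpoint_hits = 0;

/* External linkage: the symbol must be emitted so a debugger can
 * set a breakpoint on it by name. Return type is an assumption. */
void *MPIR_Breakpoint(void);

void *MPIR_Breakpoint(void)
{
    sketch_breakpoint_hits++;
    return NULL;
}

/* Hypothetical caller standing in for the launcher code path that
 * signals the debugger once the job is ready. */
void sketch_launch_done(void)
{
    /* Reading the target from a volatile pointer forces a real
     * indirect call, so the call cannot be inlined or dropped. */
    void *(*volatile fn)(void) = MPIR_Breakpoint;
    (void) fn();
}
```

This avoids compiler-specific attributes and predefined-macro checks, at the cost of one extra memory load and an indirect call per invocation. Whether it is sufficient for XL or PGI at every optimization level would still need to be tested.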
IIRC, you aren't supposed to be able to use MPIR-based debuggers on optimized code - you are supposed to compile your code without optimization. Otherwise, even if MPIR might work, gdb won't find the necessary symbols and/or be able to properly align with the source. So are you saying that even without optimization turned on, the MPIR_Breakpoint is being optimized out? |
Do you mean the user application or OMPI? In this case, it's the way OMPI was compiled which was causing the issue. |
I was actually talking about OMPI - was it compiled optimized? Or with optimization turned off? If it was optimized, then you technically can't use it for debugging, AFAIK, as the symbols required for attachment will be missing. We've seen people try all kinds of tricks over the years to work around that problem, but none of them have been successful for the general case (they sometimes work in special cases). |
(for the archives) I don't think the below is correct:
We can ship (and have for a long time shipped) an optimized MPI library that works with MPIR. The key is that the file(s) containing the MPIR symbols are compiled with debug symbols and without optimization. That's why we have a funny looking Makefile near the I think the problem here is that the compiler is overly optimizing this function out even with |
We are agreeing, Josh - IF it was optimized, THEN the symbols are gone. What you are saying is that we try NOT to optimize the files with the symbols 😄 |
Correct. I'm just pointing out that the MPI library as a whole can still be optimized as long as that file is not. |
We had a similar issue in the OMPI layer regarding the debugger message queues. The solution is similar to what has been discussed here: force the compilation of certain files with additional debugging flags. Take a look at ompi/debugger/Makefile.am to see how we forced automake to do so, and at config/orte_setup_debugger_flags.m4 to see how the debugging flags are defined. |
This is why I want to see how |
The full set of PRs have been merged:
Can we close this ticket? |
I recently tested v3.1.x (which now has the PRs merged in) with the IBM 16.1.0 and PGI 18.7 compilers and can confirm that this issue is now fixed. |
Thanks for the report, @Josh-Cottingham-Arm |
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
v3.0.0 and v3.0.2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
from source tarball with PGI 18.1 or IBM 16.1.1 Beta 2 compilers
Please describe the system on which you are running
Details of the problem
GDB is unable to stop at MPIR_Breakpoint when debugging the mpirun process using the MPIR interface on the above system with the above compilers.
Here are the steps to reproduce with the simple hello_c program from #5349: