Disable atomics in OBJ_RETAIN/RELEASE during THREAD_SINGLE #1902
There is a case for atomics in THREAD_SINGLE if async progress is enabled. The change should be to use atomics if opal_using_threads() is true.
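A minimal sketch of the change described here: gate the atomic on a runtime check so it is only paid for when threads (or async progress) are actually in use. Names like `obj_retain` and the `threads_enabled` flag are illustrative stand-ins, not Open MPI's real symbols, and the compiler builtins stand in for Open MPI's own atomics layer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for opal_using_threads(): set to true when MPI is initialized
 * with thread support or when async progress threads are enabled. */
static bool threads_enabled = false;

static inline bool opal_using_threads(void) { return threads_enabled; }

typedef struct {
    int32_t obj_reference_count;
} opal_object_t;

/* Sketch of an OBJ_RETAIN that only uses an atomic when needed. */
static inline int32_t obj_retain(opal_object_t *obj)
{
    if (opal_using_threads()) {
        /* multi-threaded (or async progress): full atomic increment */
        return __atomic_add_fetch(&obj->obj_reference_count, 1,
                                  __ATOMIC_RELAXED);
    }
    /* single-threaded: plain increment, no lock prefix or memory fence */
    return ++obj->obj_reference_count;
}
```

The branch is cheap and well predicted in the single-threaded case, which is exactly the trade-off the thread is discussing.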
@nysal There is actually a bit more cleanup that could be done. I have a little time today to take a crack at it. It will probably need a little work but I can at least give an idea of how I think things should be structured.
@hjelmn Sure, let me know how you want it structured. I was initially planning to move opal_using_threads() and its closely tied APIs (opal_set_using_threads()) to, say, opal/threads/threads_base.h, and then include this new header in opal/class/opal_object.h. This breaks the circular dependency. The second optimization was to skip reference counting for MPI_COMM_WORLD and predefined datatypes; the atomics hurt us here more than a branch would. The latter optimization should also hopefully help multithreaded message-rate benchmarks.
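One way the predefined-handle optimization could look, as a hedged sketch: a per-object flag lets RELEASE skip the count update entirely for objects that live for the whole job. The `obj_is_predefined` field is hypothetical; Open MPI's actual handling of predefined objects differs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int32_t obj_reference_count;
    bool    obj_is_predefined;   /* hypothetical flag: true for
                                    MPI_COMM_WORLD, built-in datatypes */
} opal_object_t;

/* Predefined objects exist for the lifetime of the job, so their
 * refcount can never meaningfully reach zero: skip the update. */
static inline void obj_release(opal_object_t *obj)
{
    if (obj->obj_is_predefined) {
        return;  /* a predictable branch instead of an atomic decrement */
    }
    if (--obj->obj_reference_count == 0) {
        /* the destructor would run here in the real implementation */
    }
}
```

This trades an always-taken atomic for a branch that is trivially predicted, which is the win claimed above for multithreaded message-rate benchmarks.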
I ran the phloem SQMR benchmark using the vader BTL (master branch) to measure the message rate, using 2 tasks (--bind-to core --map-by socket) on a Power8 system. I see roughly a 30% improvement in message rate for small messages with the patch to disable atomic refcounts, when the application is single-threaded.
This is the patch I used for the previously reported benchmark: nysal@7a1a9d9. The comm world and predefined datatype refcount optimization is not included in this commit; I will add it as a separate commit since it seems logically separate. @hjelmn This is not yet ready to commit, I just put it out there to get comments.
@nysal That is also what I have in mind. I also want to clean up the OPAL_THREAD macros a little bit as part of this. Should have that done soon.
👍 Nice, @nysal!
Looks like we actually did fix this with a one-off for 1.8, but I think we decided at the time to hold off on fixing master. It has been some time, so I do not remember exactly why; I was kinda busy around the time we fixed this in v1.8. Anyway, we completely lost track of this issue until @nysal managed to re-discover it. Here is the v1.8 fix: open-mpi/ompi-release@ce91307. I will open a PR for a cherry-pick of that onto master. Will leave the cleanup for a later commit.
@hjelmn Yes, that indeed looks exactly like my patch. Please do bring it over to master and v2.x. I'll open a PR for the comm world and predefined datatype refcounts today.
@nysal Will do that now. GitHub went down right after I posted that comment.
This commit adds opal_using_threads() protection around the atomic operation in OBJ_RETAIN/OBJ_RELEASE. This resolves the performance issues seen when running psm with MPI_THREAD_SINGLE. To avoid issues with header dependencies opal_using_threads() has been moved to a new header (thread_usage.h). The OPAL_THREAD_ADD* and OPAL_THREAD_CMPSET* macros have also been relocated to this header. This was cherry-picked off a fix applied to v1.8 and not master. See open-mpi/ompi#1902. (cherry picked from commit ce91307) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
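The OPAL_THREAD_ADD* macros that this commit message says were relocated to thread_usage.h follow the same conditional-atomic pattern. A simplified sketch (the real definitions use Open MPI's own atomics layer, not the compiler builtins shown here, and the `threads_enabled` flag is an illustrative stand-in):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the global thread-usage flag behind opal_using_threads(). */
static bool threads_enabled = false;
static inline bool opal_using_threads(void) { return threads_enabled; }

/* Simplified OPAL_THREAD_ADD32: atomic only when threads are in use,
 * plain read-modify-write otherwise. Returns the new value. */
#define OPAL_THREAD_ADD32(addr, delta)                              \
    (opal_using_threads()                                           \
         ? __atomic_add_fetch((addr), (delta), __ATOMIC_RELAXED)    \
         : (*(addr) += (delta)))
```

Putting the flag and these macros in one small header is what breaks the circular include dependency between the thread code and opal_object.h.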
This commit adds opal_using_threads() protection around the atomic operation in OBJ_RETAIN/OBJ_RELEASE. This resolves the performance issues seen when running psm with MPI_THREAD_SINGLE. To avoid issues with header dependencies opal_using_threads() has been moved to a new header (thread_usage.h). The OPAL_THREAD_ADD* and OPAL_THREAD_CMPSET* macros have also been relocated to this header. This commit is cherry-picked off a fix that was submitted for the v1.8 release series but never applied to master. This fixes part of the problem reported by @nysal in open-mpi#1902. (cherry picked from commit open-mpi/ompi-release@ce91307) Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Ok, merged. I have a PR open for 2.0.1 as well.
PR for v2.0.1: open-mpi/ompi-release#1287
Fixed by open-mpi/ompi-release#1287
@nysal brought up on the 2016-07-26 webex that the code still uses atomics for OBJ_RETAIN/RELEASE even in the THREAD_SINGLE case.
@hjelmn noted on the call that this is mainly for historical / code structure reasons, but it should probably be fixed (i.e., there's no need to use the atomics in the THREAD_SINGLE case). On the call, he proposed separating some header files that would obviate the need for a circular dependency (which is why the code is currently the way it is).
@nysal said he would take a whack at this. I'm tentatively marking this as v2.0.1 -- this could easily be considered a performance bug. Depending on how the fix shakes out, it may slip to a later milestone.
My question, however, is: is there a difference between thread safety for RETAIN/RELEASE with single versus multiple application threads, as opposed to single versus multiple Open MPI implementation threads? I.e., is there a case where we still need atomic-enabled RETAIN/RELEASE macros even in a THREAD_SINGLE scenario?