-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opal/mca/threads/qthreads: Fix #8036 #8053
opal/mca/threads/qthreads: Fix #8036 #8053
Conversation
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on that 👍 Some small comments inline
opal_threads_ensure_init_qthreads(); | ||
opal_thread_t *t = OBJ_NEW(opal_thread_t); | ||
t->t_thread_ret_ptr = opal_thread_get_qthreads_self(); | ||
return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return NULL; | |
return t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing. I fixed it.
/* Check if someone woke me up. */ | ||
opal_atomic_lock(&cond->m_lock); | ||
int signaled = waiter.m_signaled; | ||
opal_atomic_unlock(&cond->m_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking that lock seems unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lock (or any memory barrier that can order accesses to m_signaled
) is necessary.
m_signaled
can be updated by another thread that runs on another worker (=Pthreads), but in my understanding, the C standard does not guarantee when a reader can see m_signaled
updated by a writer without atomic operations.
Since this waiter
object is allocated in a function stack, it can cause SEGV if the writer accesses this object after m_signaled
is read by a reader. It should not happen in opal_cond_signal()
if all the memory write operations are issued in the order written in the code.
If acquire-release or seq-cst load/store can remove this lock, but it makes the code a bit more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I see what you mean. The problem is that you are issuing a lot of atomic memory operations, esp. if qthread_yield
is a no-op (i.e., if there are no other qthreads available for execution). In that short snippet, it's four atomic memory ops already. If you have multiple threads (not qthreads) waiting on this conditional variable these locks will be quite contended. Plus, opal_mutex_unlock
calls qthread_yield
again (which I am also not convinced is necessary).
Anyway, what is the lock
argument in this function good for? It doesn't seem to protect anything so it can be re-taken right before exiting the function (the cond
has its own lock). If you make waiter.m_signaled
volatile
(to prevent the compiler from optimizing away the loads) and insert a memory barrier (opal_atomic_wmb()
should do) in the signalling code before setting m_signaled
then the tail of the function could look like this:
while (1) {
qthread_yield();
/* Check if someone woke me up. */
if (waiter.m_signaled) {
break;
}
}
opal_mutex_lock(lock);
return OPAL_SUCCESS;
Yes, the signalling will become a bit more expensive due to the memory barrier but having several threads hammering on atomic locks is by far more problematic than a few more cycles spent by one thread in the signalling code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed this type of polling is probably not the best use of Qthreads. Using FEBs instead would make for an event-driven approach. Not sure if its best to leave it for now and reimplement later or go ahead and fix now. @janciesko could look at doing that as time permits. Meanwhile we are grateful to @shintaro-iwasaki for getting at least something in there.
]) | ||
|
||
AS_IF([test $opal_qthreads_happy = yes], | ||
[OPAL_CHECK_PACKAGE([opal_qthreads], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this qthread integration relies on the latest master it should probably check for the correct version/symbols being available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It relies on the latest master, but this change cannot be observed at a C level. Specifically, the latest Qthreads changes the way of "#include", which is hard to detect programmatically. Symbol/version check is promising once Qthreads has an official release that includes this update.
ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR is OK. The performance issues in the conditional variable implementation is unresolved but that is not a hill I will die on.
@jsquyres Could you please take a look at this PR? Is the issue of |
The Qthreads team can spend some time working out a better performing solution in the near future. The changes by Shintaro at least give us something correct for now. |
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
I'd like to thank @devreal and @jsquyres for reviewing this PR! Could someone merge this PR? Since this incomplete MCA/thread/Qthread has been in the main branch of Open MPI, we'd be extremely happy if this PR is merged before the upcoming 5.0 release. (Basically this PR does not affect the existing Pthreads implementation.) |
Done! Thanks for all the work on this! |
This PR fixes #8036. Specifically, this PR provides the following:
--with-qthreads
(the same as Argobots, which has been already added by rework argobots configury to be smarter #7675. I'd like to thank @hppritcha )This PR has been developed together with the Qthreads developers (@olivier-snl and @janciesko). I checked it with the latest Qthreads (you need the latest master branch).
Note that this PR does not affect the behavior of Open MPI by default; this change is effective only when one enables Qthreads.