opal/mca/threads/qthreads: Fix #8036 #8053

Merged
merged 4 commits into open-mpi:master
Oct 8, 2020

shintaro-iwasaki
Contributor

This PR fixes #8036. Specifically, this PR provides the following:

  1. Add --with-qthreads (the same as for Argobots, which was already added by "rework argobots configury to be smarter" #7675; I'd like to thank @hppritcha).
  2. Fix the Qthreads compilation issues reported by @devreal in "Argobots and Qthread configure detection need fixing/improvements" #8036.
  3. Implement several previously unimplemented features for the MCA/threads/Qthreads backend.

This PR has been developed together with the Qthreads developers (@olivier-snl and @janciesko). I checked it with the latest Qthreads (you need the latest master branch).

Note that this PR does not affect the behavior of Open MPI by default; this change is effective only when one enables Qthreads.

@ompiteam-bot

Can one of the admins verify this patch?

Contributor

@devreal devreal left a comment

Thanks for working on that 👍 Some small comments inline

opal_threads_ensure_init_qthreads();
opal_thread_t *t = OBJ_NEW(opal_thread_t);
t->t_thread_ret_ptr = opal_thread_get_qthreads_self();
return NULL;
Contributor

Suggested change:
- return NULL;
+ return t;

Contributor Author

Thank you for reviewing. I fixed it.

/* Check if someone woke me up. */
opal_atomic_lock(&cond->m_lock);
int signaled = waiter.m_signaled;
opal_atomic_unlock(&cond->m_lock);
Contributor

Taking that lock seems unnecessary

Contributor Author

This lock (or any memory barrier that orders accesses to m_signaled) is necessary.
m_signaled can be updated by another thread running on another worker (= a Pthread), and as I understand it, the C standard does not guarantee when a reader sees a writer's update to m_signaled unless atomic operations are used.

Since this waiter object is allocated on the function's stack, a SEGV can occur if the writer accesses the object after the reader has read m_signaled and returned. This should not happen in opal_cond_signal() as long as all memory writes are issued in the order written in the code.

Acquire-release or seq-cst loads/stores could remove this lock, but they would make the code a bit more complicated.

Contributor

Ahh I see what you mean. The problem is that you are issuing a lot of atomic memory operations, esp. if qthread_yield is a no-op (i.e., if there are no other qthreads available for execution). In that short snippet, it's four atomic memory ops already. If you have multiple threads (not qthreads) waiting on this condition variable, these locks will be quite contended. Plus, opal_mutex_unlock calls qthread_yield again (which I am also not convinced is necessary).

Anyway, what is the lock argument in this function good for? It doesn't seem to protect anything so it can be re-taken right before exiting the function (the cond has its own lock). If you make waiter.m_signaled volatile (to prevent the compiler from optimizing away the loads) and insert a memory barrier (opal_atomic_wmb() should do) in the signalling code before setting m_signaled then the tail of the function could look like this:

    while (1) {
        qthread_yield();
        /* Check if someone woke me up. */
        if (waiter.m_signaled) {
            break;
        }
    }
    opal_mutex_lock(lock);
    return OPAL_SUCCESS;

Yes, the signalling will become a bit more expensive due to the memory barrier but having several threads hammering on atomic locks is by far more problematic than a few more cycles spent by one thread in the signalling code.
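
A rough sketch of this flag-plus-barrier scheme, under some assumptions: it uses C11 fences in place of opal_atomic_wmb(), a relaxed atomic flag in place of a volatile field (volatile alone does not make the access race-free in C11), and sched_yield() in place of qthread_yield(). All names are hypothetical, and re-acquiring the caller's mutex is omitted:

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical waiter: a payload published by the signaller plus a wake flag. */
struct flag_waiter {
    int payload;          /* written before the flag is set */
    atomic_int signaled;  /* 0 = still waiting, 1 = woken */
};

/* Signaller: publish the payload, fence, then set the flag. */
static void flag_signal(struct flag_waiter *w, int payload)
{
    assert(w != NULL);
    w->payload = payload;
    /* Plays the role of the write barrier before setting m_signaled. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&w->signaled, 1, memory_order_relaxed);
}

/* Waiter: poll the flag without taking any lock, yielding between checks. */
static int flag_wait(struct flag_waiter *w)
{
    assert(w != NULL);
    while (!atomic_load_explicit(&w->signaled, memory_order_relaxed)) {
        sched_yield();  /* stand-in for qthread_yield() */
    }
    /* Pairs with the release fence so the payload read below is ordered. */
    atomic_thread_fence(memory_order_acquire);
    return w->payload;
}
```

The polling loop itself issues only plain loads of the flag, so several waiters do not contend on a lock; the extra cost is the single fence on the signalling side.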

@olivier-snl olivier-snl Sep 18, 2020

Indeed, this type of polling is probably not the best use of Qthreads. Using FEBs (full/empty bits) instead would make for an event-driven approach. Not sure if it's best to leave it for now and reimplement later, or go ahead and fix it now. @janciesko could look at doing that as time permits. Meanwhile, we are grateful to @shintaro-iwasaki for getting at least something in there.

])

AS_IF([test $opal_qthreads_happy = yes],
[OPAL_CHECK_PACKAGE([opal_qthreads],
Contributor

If this qthread integration relies on the latest master it should probably check for the correct version/symbols being available.

Contributor Author

Thanks. It does rely on the latest master, but the change cannot be observed at the C level. Specifically, the latest Qthreads changes how its headers are included ("#include"), which is hard to detect programmatically. A symbol/version check becomes feasible once Qthreads has an official release that includes this update.

@jsquyres
Member

ok to test

opal/mca/threads/qthreads/Makefile.am (outdated, resolved)
opal/mca/threads/qthreads/Makefile.am (outdated, resolved)
opal/mca/threads/qthreads/configure.m4 (outdated, resolved)
opal/mca/threads/qthreads/configure.m4 (outdated, resolved)
opal/mca/threads/qthreads/threads_qthreads_module.c (outdated, resolved)
@shintaro-iwasaki
Contributor Author

I have updated the commits to address problems pointed out by @devreal and @jsquyres. Could someone please review this PR again?

Contributor

@devreal devreal left a comment

I think the PR is OK. The performance issues in the condition variable implementation are unresolved, but that is not a hill I will die on.

@shintaro-iwasaki
Contributor Author

shintaro-iwasaki commented Oct 5, 2020

@jsquyres Could you please take a look at this PR? Is the issue of TPKG_* and THREAD_* blocking this PR?
If needed, I will rebase this PR.

@olivier-snl

> I think the PR is OK. The performance issues in the condition variable implementation are unresolved, but that is not a hill I will die on.

The Qthreads team can spend some time working out a better performing solution in the near future. The changes by Shintaro at least give us something correct for now.

Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
@shintaro-iwasaki
Contributor Author

I'd like to thank @devreal and @jsquyres for reviewing this PR! Could someone merge this PR?

Since the incomplete MCA/threads/Qthreads component is already in the main branch of Open MPI, we'd be extremely happy if this PR were merged before the upcoming 5.0 release. (This PR essentially does not affect the existing Pthreads implementation.)
If this PR should be rebased, tested further, or checked by additional reviewers, please let me know.

@jsquyres jsquyres merged commit 0bcef04 into open-mpi:master Oct 8, 2020
@jsquyres
Member

jsquyres commented Oct 8, 2020

Done! Thanks for all the work on this!

Development

Successfully merging this pull request may close these issues.

Argobots and Qthread configure detection need fixing/improvements
5 participants