Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v5.0.x: OSC backports #10034

Merged
merged 17 commits into from
Mar 5, 2022

Conversation

bwbarrett
Copy link
Member

Backports of #9975 and #9991.

The put over get implementation passed the wrong pointer to the
cbdata of the get call, resulting in all kinds of badness when
the get completed.  This patch passes through the expected pointer.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 8fedcc0)
@bwbarrett bwbarrett marked this pull request as draft February 22, 2022 18:38
Clean up language around btl selection phases to match rest of the
code (accelerated/alternate).

Also switch component initialization code to use opal_output_verbose
rather than OSC_RDMA_VERBOSE, so that the debugging prints can
be enabled on a non-debug build.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit b6b16a6)
See lengthy comment in the change, but this patch removes the
ability of users to specify a subset of available btls for use
by the osc rdma component.  The BTL interface was never designed
for such usage (which is why there is no similar option for the
OB1 PML) and it had clear places where it broke, so remove it.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 61aca9e)
With the change to load all btls in the alternate case and the
removal of the logic to change priority based on alternate or
accelerated btls, there's no point in running the full btl
query in osc_rdma_component_query().  Instead, just verify
that there is some number of BTLs available, which is the
extent of testing in the alternate case.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit e93f041)
Refactor wrapper calls around the btl atomic code into their own
header, removing them from the lock interface header.  Clean up
some resulting header dependency changes.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 0394543)
Add wrappers to call all BTL RDMA functions.  This commit just adds
another layer of indirection to the communication calls.  However,
a follow-on patch will add logic to either call the BTL RDMA
functions directly (for accelerated mode) or call the BTL AM RDMA
compatibility layer for alternate mode.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 6a15883)
Split the storage of btls based on whether the window is using
accelerated or alternate btls.  This makes it more obvious when
the code has made assumptions about mode that may not be true
(such as the memory registration calls throughout the code that
assumed selected_btl[0] was the one true BTL).

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 75e36e5)
ompi_osc_rdma_btl_op() already includes a check to emulate a remote
op with a remote fetch-and-op, so there's no need for a second check
in ompi_osc_rdma_acc_single_atomic().

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 222b7f3)
Use the opal_min() function instead of a custom macro.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 6e2af18)
The use_cpu_atomics member of the rdma module was only used during
initialization, so there was no reason for it to be a module
member.

Also clean up the initialization logic for the field (and therefore,
whether or not the shared state optimization is used).  Previously,
it appeared to be possible to use with alternate btls across multiple
nodes (ie, we tested the GLOB value on the btls), but the
use_cpu_atomics field was initialized to false in the multi-node case
and so they would never actually be enabled.  Make that more obvious,
rather than try to enable the optimization, to reduce the risk of
introducing new datapath bugs.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 94fed3b)
Switch from using the implicit BTL interface (where the am-rdma
interface just extends missing functionality in the BTL) to the
new explicit interface (where the OSC RDMA interface is the
only maintainer of the BTL list.

With this change, alternate BTLs do not have to support
REMOTE_COMPLETION to be selected (because the AM RDMA interface
always provides remote completion when we request it, as this
patch does).  Any BTL that supports Active Messages (ie, all of
them) should be able to support the OSC RDMA required semantics,
eliminating the problem of creating windows with no servicable
BTLs.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 4080d5d)
The RDMA OSC component is the "general" component, similar to OB1.
The OSC subsystem has gone through some priority inflation over
the years, so try to clean that up. This patch lowers the RDMA
OSC component priority from 101 to 20, leaving the priorities as:

  sm      100
  ucx     60 (if API version > 1.5.0)
  portals 20
  rdma    20

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 5246b66)
Fix an assignment of request onto itself, which was a harmless typo,
but was responsible for Coverity issues CID 1429865 and CID 1429864.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit ad9fae9)
For some reason, the SM OSC component was one of the few components
which did not have an MCA parameter to set priority.  This patch
fixes that.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 1112744)
Fix WIN_SHARED initialization and error reporting by pushing the actual
generation of MPI_ERR_RMA_SHARED errors until component selection (from
component query).  Components that don't do FLAVOR_SHARED should just
return -1 for query.  SM, since it should always work for FLAVOR_SHARED,
will return a priority for component_query() for any FLAVOR_SHARED
window, then will generate the MPI_ERR_RMA_SHARED error during
component_select.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit efec44c)
Fix the bool interpretation, which had a bug that made it impossible
to interpret the strings "false" or "no" properly.

Fix issues uncovered with handling overflow and underflow of the int
type when writing the test.

Add a test for the opal_cstring interface.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 4f42204)
In a previous commit, I accidently set the priority to always
be 0.  Which is wrong, but had no practical impact, given
the priorities and availabilities of other OSC components.
However, it was wrong and this patch fixes the issue.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit cf253bc)
@bwbarrett bwbarrett force-pushed the backports/all-the-onesided-fixes branch from f8c28fc to 3347767 Compare March 1, 2022 23:06
@open-mpi open-mpi deleted a comment from ibm-ompi Mar 1, 2022
@open-mpi open-mpi deleted a comment from ibm-ompi Mar 1, 2022
@open-mpi open-mpi deleted a comment from ibm-ompi Mar 1, 2022
@bwbarrett bwbarrett marked this pull request as ready for review March 2, 2022 17:00
@bwbarrett bwbarrett added this to the v5.0.0 milestone Mar 2, 2022
@awlauria awlauria merged commit 36b9ed0 into open-mpi:v5.0.x Mar 5, 2022
@bwbarrett bwbarrett deleted the backports/all-the-onesided-fixes branch March 25, 2022 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants