[v1.20.x] prov/efa: handshake cleanup + bugfixes #9697

Merged

a-szegel
Contributor

@a-szegel a-szegel commented Jan 4, 2024

No description provided.

Pass a pointer to the peer instead of the peer's address in
efa_rdm_ep_trigger_handshake().  This lets us avoid doing redundant peer
lookups.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
(cherry picked from commit 56564ae)
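
The effect of this signature change can be illustrated with a minimal sketch. Every name below (`demo_ep`, `demo_peer`, `demo_trigger_handshake`, and so on) is a hypothetical stand-in, not the actual libfabric EFA code; the sketch only assumes that the caller has already resolved the peer before triggering the handshake.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real libfabric types. */
typedef unsigned long demo_addr_t;

struct demo_peer {
    demo_addr_t addr;
    int handshake_sent;
};

struct demo_ep {
    struct demo_peer peers[4];
    size_t peer_count;
    size_t lookup_count; /* how many address-to-peer lookups have run */
};

/* Address-to-peer lookup: the step the commit avoids repeating. */
static struct demo_peer *demo_ep_get_peer(struct demo_ep *ep, demo_addr_t addr)
{
    ep->lookup_count++;
    for (size_t i = 0; i < ep->peer_count; i++) {
        if (ep->peers[i].addr == addr)
            return &ep->peers[i];
    }
    return NULL;
}

/* Old shape: takes an address, so it has to look the peer up again. */
static int demo_trigger_handshake_by_addr(struct demo_ep *ep, demo_addr_t addr)
{
    struct demo_peer *peer = demo_ep_get_peer(ep, addr);
    if (peer == NULL)
        return -1;
    peer->handshake_sent = 1;
    return 0;
}

/* New shape: the caller passes the peer it already holds. */
static int demo_trigger_handshake(struct demo_ep *ep, struct demo_peer *peer)
{
    (void)ep; /* unused in this sketch */
    assert(peer != NULL);
    peer->handshake_sent = 1;
    return 0;
}

/* Returns the number of lookups performed for one caller-side handshake. */
static size_t demo_send_path_lookups(int pass_pointer)
{
    struct demo_ep ep = { .peers = { { .addr = 7 } }, .peer_count = 1 };
    struct demo_peer *peer = demo_ep_get_peer(&ep, 7); /* caller's own lookup */

    if (pass_pointer)
        demo_trigger_handshake(&ep, peer);
    else
        demo_trigger_handshake_by_addr(&ep, 7);
    return ep.lookup_count;
}
```

In this sketch, `demo_send_path_lookups(0)` performs two lookups and `demo_send_path_lookups(1)` performs one, which is the redundancy the commit message describes.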
Previously, efa_rdm_txe_construct() did not initialize rma_iov_count to
0 when creating a txe. The value must be initialized to 0 because the
OPE buffer pool is shared between the fi_send() and fi_rma() paths, and
if the field holds garbage, our common recv utilities can end up going
down the wrong path.  This bug was exposed by the IntelMPI osu_mbw_mr
collective test when we switched the efa_rdm_ep_trigger_handshake()
code to use efa_rdm_ep_alloc_txe().

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
(cherry picked from commit a11c06e)
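
The failure mode described above, a stale field surviving buffer-pool reuse, might look roughly like the following sketch. The one-slot pool, `demo_ope`, `demo_txe_construct()`, and the `reset_rma` switch are all hypothetical and exist only to contrast the two behaviors; the real OPE pool and constructor are more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation entry; only the fields relevant to the bug. */
struct demo_ope {
    int in_use;
    size_t iov_count;
    size_t rma_iov_count;
};

/* One-slot "pool" shared by the send and RMA paths (assumption for brevity). */
static struct demo_ope demo_pool_slot;

static struct demo_ope *demo_pool_alloc(void)
{
    assert(!demo_pool_slot.in_use);
    demo_pool_slot.in_use = 1;
    return &demo_pool_slot; /* contents are whatever the last user left behind */
}

static void demo_pool_release(struct demo_ope *ope)
{
    ope->in_use = 0;
}

/* reset_rma = 0 models the old constructor, reset_rma = 1 the fixed one. */
static void demo_txe_construct(struct demo_ope *txe, size_t iov_count, int reset_rma)
{
    txe->iov_count = iov_count;
    if (reset_rma)
        txe->rma_iov_count = 0;
}

/* Stand-in for a shared utility that branches on rma_iov_count. */
static int demo_looks_like_rma(const struct demo_ope *ope)
{
    return ope->rma_iov_count > 0;
}

/* An RMA operation uses the slot first, then a plain send reuses it.
 * Returns 1 if the plain send is mistaken for an RMA operation. */
static int demo_send_after_rma(int reset_rma)
{
    struct demo_ope *ope = demo_pool_alloc();
    demo_txe_construct(ope, 1, 1);
    ope->rma_iov_count = 2; /* the RMA path fills this in */
    demo_pool_release(ope);

    ope = demo_pool_alloc(); /* same memory, stale rma_iov_count */
    demo_txe_construct(ope, 1, reset_rma);
    int misrouted = demo_looks_like_rma(ope);
    demo_pool_release(ope);
    return misrouted;
}
```

With the reset in place, `demo_send_after_rma(1)` returns 0; without it, `demo_send_after_rma(0)` returns 1 because the stale count from the earlier RMA operation is still visible.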
Switch the handshake code to re-use the efa utility
efa_rdm_ep_alloc_txe() to alloc the txe used in the handshake.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
(cherry picked from commit 17663a2)
Users can set FI_EFA_ENABLE_SHM_TRANSFER=0 in their environment, in
which case they do not have a SHM EP.  The unit tests currently fail
when that variable is set; this patch fixes them.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
(cherry picked from commit ada8ab4)
To optimize the send fast path, the handshake response code needs to
create a txe. With this change, every send has a txe with a valid peer,
so there is no need to do the lookup multiple times.

Fixes a bug in the error path so that the error is reported to the CQ
instead of the EQ.

Updates the EFA unit test so that it passes with these changes.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
(cherry picked from commit 4ece628)
Now that all handshake requests and responses use a txe, we can rely on
all send packets having a txe, and we no longer need to look up the peer
in the send path.  This optimization makes all EFA sends faster.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
(cherry picked from commit 3998d14)
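
As a rough sketch of the invariant this commit relies on: if every send packet points at a txe, and every txe was created with a valid peer, the send path can follow pointers instead of resolving an address. The structures below (`demo_pkt`, `demo_txe`, `demo_peer`) are hypothetical and much simpler than the real EFA provider types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct demo_peer {
    unsigned long addr;
    unsigned long packets_sent;
};

struct demo_txe {
    struct demo_peer *peer; /* resolved once, when the txe is created */
};

struct demo_pkt {
    struct demo_txe *txe; /* assumed non-NULL for every send packet */
};

/* Send path: no address-to-peer lookup, just pointer dereferences. */
static void demo_send_pkt(struct demo_pkt *pkt)
{
    assert(pkt->txe != NULL && pkt->txe->peer != NULL);
    pkt->txe->peer->packets_sent++;
}

/* Sends n packets through one txe and returns the peer's packet count. */
static unsigned long demo_send_n(unsigned long n)
{
    struct demo_peer peer = { .addr = 7, .packets_sent = 0 };
    struct demo_txe txe = { .peer = &peer };
    struct demo_pkt pkt = { .txe = &txe };

    for (unsigned long i = 0; i < n; i++)
        demo_send_pkt(&pkt);
    return peer.packets_sent;
}
```

The design choice sketched here is to pay for peer resolution once per operation, at txe creation, rather than once per packet.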
Since all EFA send operations use a txe, we can remove the peer lookup
from the critical path, which improves performance for all EFA sends.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
(cherry picked from commit ff8478a)
@a-szegel a-szegel requested a review from a team January 4, 2024 19:12
@a-szegel a-szegel merged commit 06fcf83 into ofiwg:v1.20.x Jan 4, 2024
11 checks passed