prov/efa: Queue txes when handshake is enforced but not made #10115

Merged 6 commits on Jun 27, 2024

shijin-aws
Contributor

Currently, when a request to a given peer requires a handshake that has not yet been made, we always return EAGAIN, which makes for a poor user experience during the startup stage. This PR contains a series of commits that allow such requests to be queued (unless they are inject requests) and return 0 in that situation.
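
The behavior described above can be sketched roughly as follows. This is a simplified, hypothetical model rather than the provider's code: `struct peer`, `post_request()`, `on_handshake_done()`, and the fixed-size queue are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model: these names are illustrative only and
 * are not the EFA provider's real types or functions. */
#define MAX_QUEUED 8

struct peer {
    bool handshake_done;      /* has the handshake with this peer completed? */
    int queued[MAX_QUEUED];   /* ids of requests parked until the handshake */
    size_t nqueued;
    size_t nsent;             /* requests actually put on the wire */
};

/* Returns 0 if the request was sent or queued, -EAGAIN if the caller must retry. */
static int post_request(struct peer *p, int req_id, bool is_inject)
{
    if (p->handshake_done) {
        p->nsent++;
        return 0;
    }
    /* Per the PR description, inject requests are not queued.
     * A full queue also falls back to -EAGAIN in this sketch. */
    if (is_inject || p->nqueued == MAX_QUEUED)
        return -EAGAIN;
    p->queued[p->nqueued++] = req_id;
    return 0;
}

/* Called once when the handshake completes: send everything that was parked. */
static size_t on_handshake_done(struct peer *p)
{
    size_t flushed = p->nqueued;

    assert(!p->handshake_done);
    p->handshake_done = true;
    p->nsent += flushed;
    p->nqueued = 0;
    return flushed;
}
```

In this model, a caller that posts a non-inject request before the handshake gets 0 back immediately and does not need a retry loop; only inject requests still see -EAGAIN.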

@shijin-aws requested a review from a team (June 24, 2024 23:46)
@shijin-aws force-pushed the queue_txes_before_handshake branch from 29c3a6b to e413982 (June 25, 2024 20:32)
@shijin-aws
Contributor Author

The latest push fixed the unit test failure and added new unit tests to cover the changes.

@shijin-aws force-pushed the queue_txes_before_handshake branch 2 times, most recently from 69c0ec3 to e311af6 (June 26, 2024 18:02)
@zachdworkin
Contributor

@shijin-aws This PR hit the ucx rdm_tagged_peek test failure, which was removed in #10124. If you rebase and re-push, that should fix it.

This function performs duplicate iteration over ope_queued_list.
This is an oversight from 3cfc0bb, which merged the queued_rnr_list
and the queued_ctrl_list. This patch fixes this bug by making
the function only flush these two kinds of queued opes.

Signed-off-by: Shi Jin <sjina@amazon.com>
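
As an illustration of the fix described in this commit message: once two lists have been merged into one, running one loop per kind visits every entry twice, while a single pass with a flag mask visits each entry once and skips kinds it should not touch. The flag names, `struct ope` layout, and `flush_rnr_and_ctrl()` below are hypothetical stand-ins, not the provider's definitions, and "flushing" is modeled simply as clearing the flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags and types for illustration only. */
enum {
    Q_RNR  = 1 << 0,  /* queued after a receiver-not-ready (RNR) error */
    Q_CTRL = 1 << 1,  /* queued control packet */
    Q_READ = 1 << 2,  /* another queued kind that must not be flushed here */
};

struct ope {
    unsigned queued_flags;
    int flush_count;   /* how many times this entry was flushed */
};

/* Single pass over the merged list: flush only RNR- and CTRL-queued entries. */
static size_t flush_rnr_and_ctrl(struct ope *list, size_t n)
{
    size_t flushed = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!(list[i].queued_flags & (Q_RNR | Q_CTRL)))
            continue;   /* a different kind of queued ope: leave it alone */
        list[i].queued_flags &= ~(unsigned)(Q_RNR | Q_CTRL);
        list[i].flush_count++;
        flushed++;
    }
    return flushed;
}
```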
…dshake is made

Signed-off-by: Shi Jin <sjina@amazon.com>
Introduce efa_rdm_txe_enforce_handshake() function to handle
the handshake triggering and the txe queueing.

Signed-off-by: Shi Jin <sjina@amazon.com>
With a single ope_queued_list, we can remove queued_entry
from the list idempotently by checking whether the ope has
any queued flags.

Signed-off-by: Shi Jin <sjina@amazon.com>
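
One way to read this commit message: an ope stays on the single list as long as at least one queued flag is set, and is unlinked only when the last flag is cleared, so repeating the clear is harmless. A minimal sketch under that assumption follows; the flag names, `struct ope` layout, and helper functions are hypothetical, and list membership is modeled as a boolean instead of a real linked list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for illustration; not the provider's definitions. */
enum {
    Q_RNR              = 1 << 0,
    Q_CTRL             = 1 << 1,
    Q_BEFORE_HANDSHAKE = 1 << 2,
    Q_ALL              = Q_RNR | Q_CTRL | Q_BEFORE_HANDSHAKE,
};

struct ope {
    unsigned queued_flags;
    bool on_queued_list;   /* stands in for membership in the single list */
};

/* Mark the ope as queued for one reason; link it on the first reason only. */
static void ope_set_queued(struct ope *o, unsigned flag)
{
    assert(flag & Q_ALL);
    if (!(o->queued_flags & Q_ALL))
        o->on_queued_list = true;
    o->queued_flags |= flag;
}

/* Clear one reason; unlink only when no queued flag remains.
 * Calling this again with the same flag changes nothing. */
static void ope_clear_queued(struct ope *o, unsigned flag)
{
    o->queued_flags &= ~flag;
    if (!(o->queued_flags & Q_ALL))
        o->on_queued_list = false;
}
```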
When efa_rdm_ope_post_read returns an error, we should still continue
after writing the txe/rxe error.

When a ctsdata pkt post returns FI_EAGAIN, we should still continue
instead of breaking, because opes may come from different eps.

Signed-off-by: Shi Jin <sjina@amazon.com>
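
The continue-versus-break point can be sketched with a toy progress loop. The `struct endpoint`, `struct ope`, `post()`, and `progress_queued()` names are hypothetical, and a "busy" endpoint is simply one whose post returns -EAGAIN. When the queued list mixes opes from several endpoints, breaking on the first -EAGAIN would leave postable opes on other endpoints waiting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct endpoint {
    bool busy;              /* when true, posting on this endpoint returns -EAGAIN */
};

struct ope {
    struct endpoint *ep;    /* the endpoint this operation belongs to */
    bool posted;
};

static int post(struct ope *o)
{
    if (o->ep->busy)
        return -EAGAIN;
    o->posted = true;
    return 0;
}

/* Walk the whole list; an -EAGAIN on one endpoint does not stop the walk. */
static size_t progress_queued(struct ope *list, size_t n)
{
    size_t posted = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].posted)
            continue;
        if (post(&list[i]) == -EAGAIN)
            continue;   /* a break here would starve opes on other endpoints */
        posted++;
    }
    return posted;
}
```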
@shijin-aws force-pushed the queue_txes_before_handshake branch from ec584b2 to d56053d (June 26, 2024 22:19)
@shijin-aws merged commit 71e0ef3 into ofiwg:main (Jun 27, 2024)
13 checks passed

3 participants