prov/efa: Use long CTS protocol if runting read protocol fails because of memory registration limits #9493
/* The data_offset will be non-zero when the long CTS RTM packet
 * is sent to continue a runting read transfer after the
 * receiver has run out of memory registrations */
assert((data_offset == 0 || ope->internal_flags & EFA_RDM_OPE_READ_NACK) && data_size == -1);
So for runting read, the receiver already gets some data before it tries to register the memory. If we fall back to long CTS, we restart from the data_offset. Is that correct?
Yes, that's correct
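A minimal sketch of the restart rule confirmed above, assuming a simplified sender-side view of the transfer. The names `sketch_txe`, `runt_bytes`, `SKETCH_READ_NACK`, `long_cts_data_offset` and `long_cts_bytes_left` are hypothetical stand-ins, not the EFA provider's `struct efa_rdm_ope` or its helpers; the point is only that the long CTS flow resumes at the offset the runting RTM packets already covered, and that a non-zero offset is only valid after a READ NACK, as the assert in the diff requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit standing in for EFA_RDM_OPE_READ_NACK. */
#define SKETCH_READ_NACK (1u << 0)

/* Hypothetical, simplified sender-side view of a transfer; not the
 * provider's struct efa_rdm_ope. */
struct sketch_txe {
	uint64_t total_len;      /* full message length */
	uint64_t runt_bytes;     /* bytes already delivered by runting read RTM packets */
	uint32_t internal_flags; /* SKETCH_READ_NACK once the receiver NACKed */
};

/* Offset at which the long CTS flow starts sending data. After a READ NACK
 * the receiver already holds [0, runt_bytes), so the transfer resumes there. */
static uint64_t long_cts_data_offset(const struct sketch_txe *txe)
{
	uint64_t data_offset =
		(txe->internal_flags & SKETCH_READ_NACK) ? txe->runt_bytes : 0;

	/* Same invariant as the assert in the PR: a non-zero offset is only
	 * expected when continuing a NACKed runting read. */
	assert(data_offset == 0 || (txe->internal_flags & SKETCH_READ_NACK));
	assert(data_offset <= txe->total_len);
	return data_offset;
}

/* Bytes the long CTS flow still has to move to the receiver. */
static uint64_t long_cts_bytes_left(const struct sketch_txe *txe)
{
	return txe->total_len - long_cts_data_offset(txe);
}
```

For a 1 MiB message whose runting RTM packets already delivered 64 KiB, the long CTS flow starts at offset 65536 and moves the remaining 983040 bytes; a transfer that starts directly in long CTS begins at offset 0.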
send any data with the RTM packet. This is because the runting read RTM
packets have already delivered some of the data and the long CTS RTM
packet does not have a seg_offset field */
if (txe->internal_flags & EFA_RDM_OPE_READ_NACK) {
So this is because the long CTS RTM packet currently doesn't support sending from a non-zero offset. Data packets carry an offset, so they are not affected. Do I understand correctly?
Yes, exactly
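The reasoning in this thread can be illustrated with a small sketch, assuming hypothetical names (`SKETCH_READ_NACK`, `sketch_data_hdr`, `long_cts_rtm_payload`, `next_data_pkt`) rather than the provider's real packet structs. The long CTS RTM header has no seg_offset, so any payload it carries is implicitly at offset 0; after a READ NACK that region was already delivered, so the RTM carries no data. Data packets carry their own seg_offset and can resume anywhere in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit standing in for EFA_RDM_OPE_READ_NACK. */
#define SKETCH_READ_NACK (1u << 0)

/* Hypothetical data packet header: unlike the long CTS RTM header,
 * it records where its payload belongs in the message. */
struct sketch_data_hdr {
	uint64_t seg_offset;
	uint64_t seg_length;
};

/* Payload bytes to put in the long CTS RTM packet. Its payload would be
 * placed at offset 0, which a NACKed runting read already delivered,
 * so send nothing in that case. */
static size_t long_cts_rtm_payload(uint32_t internal_flags, size_t total_len,
				   size_t max_rtm_payload)
{
	if (internal_flags & SKETCH_READ_NACK)
		return 0;
	return total_len < max_rtm_payload ? total_len : max_rtm_payload;
}

/* Header for the next data packet that starts at `offset`. */
static struct sketch_data_hdr next_data_pkt(uint64_t offset, uint64_t total_len,
					    uint64_t max_data_payload)
{
	struct sketch_data_hdr hdr;
	uint64_t left;

	assert(offset <= total_len);
	left = total_len - offset;
	hdr.seg_offset = offset;
	hdr.seg_length = left < max_data_payload ? left : max_data_payload;
	return hdr;
}
```

Because only the RTM packet lacks an offset field, the fix can be local to the RTM construction path: skip the RTM payload after a NACK and let the offset-aware data packets carry everything from the resume point onward.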
@wzamazon, could you have a look at it?
FI_ENOMR is returned when the hardware memory registration limit is reached.
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
The READ_NACK feature is checked before sending an EFA_RDM_READ_NACK_PKT packet. The EFA_RDM_READ_NACK_PKT packet is sent by a receiver when it fails to register a buffer to receive the RDMA read data in the long read or runting read protocol.
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
…ENOMR
The long read protocol could fail with ENOMR if the EFA provider is unable to register the buffer with the NIC. In that case, we should fall back to long CTS instead. This commit covers the case where the sender fails to register the source buffer: the sender switches to the long CTS protocol.
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
This change is required for the long read NACK protocol, where we get the msg_id from the ope instead of from the pke.
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
The long read protocol could fail with ENOMR if the EFA provider is unable to register the buffer with the NIC. In that case, we should fall back to the long CTS protocol. This commit covers the case where the receiver fails to register the destination memory: the receiver sends a NACK packet (packet type EFA_RDM_READ_NACK_PKT) to the sender, and the sender switches to the long CTS protocol.
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
…ENOMR
The runting read protocol could fail with ENOMR if the EFA provider is unable to register the buffer with the NIC. In that case, we should fall back to long CTS instead. This commit covers the case where the sender fails to register the source buffer: the sender switches to the long CTS protocol.
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
The runting read protocol could fail with ENOMR if the EFA provider is unable to register the buffer with the NIC. In that case, we should fall back to the long CTS protocol. This commit covers the case where the receiver fails to register the destination memory: the receiver sends a NACK packet (packet type EFA_RDM_READ_NACK_PKT) to the sender, and the sender switches to the long CTS protocol.
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
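The fallback described in these commits can be summarized as a small decision sketch. Everything here is an assumption for illustration: `SKETCH_ENOMR` is a stand-in for libfabric's FI_ENOMR with an arbitrary value, registration results are passed in as plain return codes, and the enums, `sketch_txe`, and the three functions are hypothetical, not the provider's API. The sketch also assumes that if the peer lacks the READ_NACK feature the receiver simply reports an error, which the commits do not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for libfabric's FI_ENOMR; the value here is arbitrary. */
#define SKETCH_ENOMR 1000
/* Hypothetical bit standing in for EFA_RDM_OPE_READ_NACK. */
#define SKETCH_READ_NACK (1u << 0)

enum sketch_proto { PROTO_LONG_READ, PROTO_RUNTING_READ, PROTO_LONG_CTS };
enum sketch_rx_action { RX_POST_READ, RX_SEND_READ_NACK, RX_FAIL };

struct sketch_txe {
	enum sketch_proto proto;
	uint32_t internal_flags;
};

/* Sender side: if registering the source buffer hits the MR limit,
 * pick long CTS instead of a read-based protocol. */
static int sender_pick_proto(enum sketch_proto preferred, int reg_ret,
			     enum sketch_proto *out)
{
	if (reg_ret == 0) {
		*out = preferred;
		return 0;
	}
	if (reg_ret == -SKETCH_ENOMR) {
		*out = PROTO_LONG_CTS;
		return 0;
	}
	return reg_ret; /* other registration errors are propagated */
}

/* Receiver side: on a long read or runting read RTM, either post the RDMA
 * read or, if the destination cannot be registered and the peer supports
 * the READ_NACK feature, answer with a READ NACK packet. */
static enum sketch_rx_action receiver_on_read_rtm(int reg_ret,
						  bool peer_supports_read_nack)
{
	if (reg_ret == 0)
		return RX_POST_READ;
	if (reg_ret == -SKETCH_ENOMR && peer_supports_read_nack)
		return RX_SEND_READ_NACK;
	return RX_FAIL;
}

/* Sender side: a READ NACK switches the transfer to long CTS and records
 * the NACK so later steps know data may resume from a non-zero offset. */
static void sender_handle_read_nack(struct sketch_txe *txe)
{
	assert(txe->proto != PROTO_LONG_CTS); /* NACKs only answer read protocols */
	txe->internal_flags |= SKETCH_READ_NACK;
	txe->proto = PROTO_LONG_CTS;
}
```

Keeping the sender-side and receiver-side failures separate mirrors the commit split: a local registration failure never involves the peer, while a remote failure needs the NACK packet to tell the sender to change protocols.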
The runting read protocol can fail because of memory registration limits on the hardware. This PR switches to the long CTS protocol when that happens.
The first four commits are in #9432