prov/efa: Add read nack protocol docs
Signed-off-by: Sai Sunku <sunkusa@amazon.com>
sunkuamzn committed Nov 1, 2023
1 parent 5c0756d commit 248ed7f
Showing 5 changed files with 271 additions and 0 deletions.
47 changes: 47 additions & 0 deletions prov/efa/docs/efa_rdm_protocol_v4.md
@@ -166,6 +166,7 @@ Table: 1.2 A list of packet type IDs
| 8 | `ATOMRSP` | ATOMic ReSPonse | non-REQ | emulated write/fetch/compare atomic |
| 9 | `HANDSHAKE` | Handshake | non-REQ | handshake |
| 10 | `RECEIPT` | Receipt | non-REQ | delivery complete (DC) |
| 11 | `READ_NACK` | Read Nack packet | non-REQ | Long read and runting read nack protocols |
| 64 | `EAGER_MSGRTM` | Eager non-tagged Request To Message | REQ | eager message |
| 65 | `EAGER_TAGRTM` | Eager tagged Request To Message | REQ | eager message |
| 66 | `MEDIUM_MSGRTM` | Medium non-tagged Request To Message | REQ | medium message |
@@ -320,6 +321,7 @@ Table: 2.1 a list of extra features/requests
| 3 | sender connection id in packet header | extra request | libfabric 1.14.0 | Section 4.4 |
| 4 | runting read message protocol | extra feature | libfabric 1.16.0 | Section 4.5 |
| 5 | RDMA-Write based data transfer | extra feature | libfabric 1.18.0 | Section 4.6 |
| 6 | Read nack packets | extra feature | libfabric 1.20.0 | Section 4.7 |

How does protocol v4 maintain backward compatibility when extra features/requests are introduced?

@@ -744,6 +746,8 @@ LONGCTS_RTM, CTS and CTSDATA.

A LONGCTS_RTM packet, like any REQ packet, consists of 3 parts:
LONGCTS RTM mandatory header, REQ optional header, and application data.
A LONGCTS_RTM packet sent as part of the read nack protocol (Section 4.7)
does not contain any application data.

The format of the LONGCTS_RTM mandatory header is listed in table 3.5:

@@ -1490,6 +1494,49 @@ in order to support CQ entry generation in case the sender uses
`FI_REMOTE_CQ_DATA`.


### 4.7 Long read and runting read nack protocol

The long read and runting read protocols in libfabric 1.20 and above fall back to a nack protocol
when the receiver is unable to register a memory region for the RDMA read operation.
Registration typically fails because of a hardware limitation.

Table: 4.2 Format of the READ_NACK packet

| Name | Length (bytes) | Type | C language type | Notes |
|---|---|---|---|---|
| `type` | 1 | integer | `uint8_t` | part of base header |
| `version` | 1 | integer | `uint8_t` | part of base header |
| `flags` | 2 | integer | `uint16_t` | part of base header |
| `send_id` | 4 | integer | `uint32_t` | ID of the send operation |
| `recv_id` | 4 | integer | `uint32_t` | ID of the receive operation |
| `multiuse(connid/padding)` | 4 | integer | `uint32_t` | `connid` if CONNID_HDR is set, otherwise `padding` |
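
As a rough illustration of Table 4.2, the layout could be written as a C struct. This is a sketch, not the provider's actual definition: the names `read_nack_hdr`, `READ_NACK_PKT_TYPE`, and `read_nack_hdr_init` are hypothetical and chosen only for this example. The field names, widths, and the packet type ID 11 are taken from Tables 4.2 and 1.2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packet type ID of READ_NACK, from Table 1.2. The macro name is hypothetical. */
#define READ_NACK_PKT_TYPE 11

/* Hypothetical struct name; the fields follow Table 4.2 in order. */
struct read_nack_hdr {
	uint8_t  type;     /* base header: packet type ID */
	uint8_t  version;  /* base header: protocol version */
	uint16_t flags;    /* base header: flags */
	uint32_t send_id;  /* ID of the send operation */
	uint32_t recv_id;  /* ID of the receive operation */
	union {
		uint32_t connid;   /* meaningful when CONNID_HDR is set in flags */
		uint32_t padding;  /* otherwise unused */
	};
};

/* The table adds up to 16 bytes; natural alignment needs no implicit padding. */
static_assert(sizeof(struct read_nack_hdr) == 16,
	      "READ_NACK header should be 16 bytes");

/* Fill in a header. The caller decides whether the last word is a connid. */
static inline struct read_nack_hdr
read_nack_hdr_init(uint8_t version, uint16_t flags,
		   uint32_t send_id, uint32_t recv_id, uint32_t connid)
{
	struct read_nack_hdr hdr = {0};

	hdr.type = READ_NACK_PKT_TYPE;
	hdr.version = version;
	hdr.flags = flags;
	hdr.send_id = send_id;
	hdr.recv_id = recv_id;
	hdr.connid = connid;
	return hdr;
}
```

The compile-time size check is the main point of the sketch: it confirms that the six fields in the table pack into 16 bytes without compiler-inserted gaps.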

The nack protocols work as follows:
* Sender has decided to use the long read or runting read protocol
* The receiver receives the RTM packet(s)
  - One LONGREAD_RTM packet in case of long read protocol
  - Multiple RUNTREAD_RTM packets in case of runting read protocol
* The receiver attempts to register a memory region for the RDMA operation but fails
* After all RTM packets have been processed, the receiver sends a READ_NACK packet to the sender
* The sender then switches to the long CTS protocol and sends a LONGCTS_RTM packet
* The receiver sends a CTS packet and the data transfer continues as in the long CTS protocol
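
The receiver-side decision in the steps above can be sketched as a small state check. This is a minimal illustration under stated assumptions, not the provider's code: `rx_state`, `rx_action`, and `rx_on_rtm` are hypothetical names, and the sketch assumes the receiver knows how many RTM packets to expect and records a registration failure in a flag before the last RTM packet is processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the receiver does after processing one RTM packet (hypothetical names). */
enum rx_action {
	RX_WAIT_FOR_RTM,    /* more RTM packets are still outstanding */
	RX_POST_RDMA_READ,  /* registration succeeded: normal read protocol */
	RX_SEND_READ_NACK   /* registration failed: ask the sender to fall back */
};

struct rx_state {
	size_t rtm_expected;   /* 1 for long read, N for runting read */
	size_t rtm_processed;  /* RTM packets handled so far */
	bool   mr_reg_failed;  /* set when memory registration fails */
};

/*
 * Called once per processed RTM packet. The READ_NACK decision is deferred
 * until every RTM packet has been processed, as the list above describes.
 */
static enum rx_action rx_on_rtm(struct rx_state *rx)
{
	rx->rtm_processed++;
	if (rx->rtm_processed < rx->rtm_expected)
		return RX_WAIT_FOR_RTM;
	return rx->mr_reg_failed ? RX_SEND_READ_NACK : RX_POST_RDMA_READ;
}
```

With `rtm_expected` set to 1 the function models the long read case, where the single LONGREAD_RTM packet immediately triggers either the read or the READ_NACK.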

The LONGCTS_RTM packet sent in the nack protocol does not contain any application data,
unlike a LONGCTS_RTM packet in the regular long CTS protocol.
The reason is that the LONGCTS_RTM packet does not have a `seg_offset` field.
While the LONGREAD_RTM packet does not contain any application data, the RUNTREAD_RTM
packets do. So if the LONGCTS_RTM packet were to contain application data, it would need a
non-zero `seg_offset` to account for the data already sent in the RUNTREAD_RTM packets. Instead
of introducing a `seg_offset` field to the LONGCTS_RTM packet, the nack protocol simply
doesn't send any data in the LONGCTS_RTM packet.
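
The offset reasoning can be made concrete with a little arithmetic. The sketch below is an illustration only: `fallback_plan` and `plan_fallback` are hypothetical names, and it assumes that the data packets sent after the CTS can carry their own offset, so that the transfer resumes right after the bytes already delivered by the RUNTREAD_RTM packets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what remains to be sent after a READ_NACK. */
struct fallback_plan {
	size_t longcts_rtm_payload;  /* always 0 in the nack protocol */
	size_t first_data_offset;    /* where the post-CTS data resumes */
	size_t bytes_remaining;      /* bytes still to be transferred */
};

/*
 * total_len:  length of the whole message in bytes.
 * runt_bytes: bytes already carried by RUNTREAD_RTM packets
 *             (0 for the long read protocol, whose RTM carries no data).
 */
static struct fallback_plan plan_fallback(size_t total_len, size_t runt_bytes)
{
	struct fallback_plan plan;

	assert(runt_bytes <= total_len);
	plan.longcts_rtm_payload = 0;
	plan.first_data_offset = runt_bytes;
	plan.bytes_remaining = total_len - runt_bytes;
	return plan;
}
```

For example, a 1 MiB message whose runting packets already delivered 4096 bytes would resume at offset 4096. A LONGCTS_RTM packet without a `seg_offset` field could not express that starting point, which is why it carries no payload here.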

The workflow for the long read protocol is shown below:

![long-read fallback](message_longread_fallback.png)

The workflow for the runting read protocol is shown below:

![runting-read fallback](message_runtread_fallback.png)

## 5. What's not covered?

The purpose of this document is to define the communication protocol. Therefore, it is intentionally written
103 changes: 103 additions & 0 deletions prov/efa/docs/message_longread_fallback.drawio
@@ -0,0 +1,103 @@
<mxfile host="drawio.corp.amazon.com" modified="2023-10-31T21:17:30.337Z" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/115.0" etag="jqDsJ0tKN40itrkzNSJk" version="21.7.4" type="device">
<diagram id="APAEDZxGAzosg-hluIWG" name="Page-1">
<mxGraphModel dx="899" dy="665" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="hM9hUbB8x_-XiU8bxfhU-2" value="Application call libfabric&#39;s send API" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="20" y="20" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-3" value="Receiver&#39;s progress engine" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="220" y="20" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-6" value="" style="endArrow=classic;html=1;dashed=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" parent="1" target="hM9hUbB8x_-XiU8bxfhU-9" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="79.5" y="80" as="sourcePoint" />
<mxPoint x="80" y="250" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-7" value="" style="endArrow=classic;html=1;dashed=1;" parent="1" target="hM9hUbB8x_-XiU8bxfhU-10" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="279.5" y="80" as="sourcePoint" />
<mxPoint x="280" y="259" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-8" value="" style="endArrow=classic;html=1;dashed=1;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="80" y="120" as="sourcePoint" />
<mxPoint x="280" y="160" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-9" value="Application got libfabric completion" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="20" y="510" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-10" value="Application got libfabric&#39;s completion" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="214" y="510" width="130" height="60" as="geometry" />
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-14" value="READ_RTM" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;rotation=11;" parent="1" vertex="1">
<mxGeometry x="160" y="121" width="40" height="20" as="geometry" />
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-43" value="" style="endArrow=classic;html=1;dashed=1;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="280" y="220" as="sourcePoint" />
<mxPoint x="80" y="260" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="hM9hUbB8x_-XiU8bxfhU-44" value="READ_NACK" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;rotation=349;" parent="1" vertex="1">
<mxGeometry x="160" y="221" width="40" height="20" as="geometry" />
</mxCell>
<mxCell id="ODspx_lB6T_whGsfVqag-5" value="Application call libfabric&#39;s receive API" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;" parent="1" vertex="1">
<mxGeometry x="280" y="150" width="150" height="20" as="geometry" />
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-5" value="" style="endArrow=classic;html=1;dashed=1;" edge="1" parent="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="80" y="280" as="sourcePoint" />
<mxPoint x="280" y="320" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-8" value="LONGCTS RTM" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;rotation=11;" vertex="1" parent="1">
<mxGeometry x="134.81999999999994" y="280" width="100.18" height="20" as="geometry" />
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-9" value="" style="endArrow=classic;html=1;dashed=1;" edge="1" parent="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="280" y="330" as="sourcePoint" />
<mxPoint x="80" y="370" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-10" value="CTS" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;rotation=349;" vertex="1" parent="1">
<mxGeometry x="158" y="336" width="40" height="10" as="geometry" />
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-11" value="DATA" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;rotation=11;" vertex="1" parent="1">
<mxGeometry x="125" y="380" width="110" height="20" as="geometry" />
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-12" value="" style="endArrow=classic;html=1;dashed=1;" edge="1" parent="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="80" y="380" as="sourcePoint" />
<mxPoint x="280" y="420" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-13" value="" style="endArrow=classic;html=1;dashed=1;" edge="1" parent="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="80" y="400" as="sourcePoint" />
<mxPoint x="280" y="440" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-14" value="" style="endArrow=classic;html=1;dashed=1;" edge="1" parent="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="80" y="420" as="sourcePoint" />
<mxPoint x="280" y="460" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-15" value="" style="endArrow=classic;html=1;dashed=1;" edge="1" parent="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="80" y="440" as="sourcePoint" />
<mxPoint x="280" y="480" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="7KGTspXBKm9bkcGoGxPe-47" value="Memory registration failed on the receiver" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;" vertex="1" parent="1">
<mxGeometry x="280" y="200" width="150" height="20" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
Binary file added prov/efa/docs/message_longread_fallback.png