Published: 2018-06-28
Last updated: 2018-06-28
NOTE: This document might be outdated. Please also consult Section 3.2.1 Handshake and Section 4.3 Handshake Messages of the SRT RFC.
Contents
- Overview
- Short Introduction to SRT Packet Structure
- Handshake Structure
- The "UDT Legacy" and "SRT Extended" Handshakes
- The Caller-Listener Handshake
- The Rendezvous Handshake
- The SRT Extended Handshake
SRT is a connection protocol, and as such it embraces the concepts of "connection" and "session". The UDP system protocol is used by SRT for sending data as well as special control packets, also referred to as "commands".
An SRT connection is characterized by the fact that it is:
- first engaged by a handshake process
- maintained as long as any packets are being exchanged in a timely manner
- considered closed when a party receives the appropriate close command from its peer (connection closed by the foreign host), or when it receives no packets at all for some predefined time (connection broken on timeout).
Just like its predecessor UDT, SRT supports two connection configurations:
- Caller-Listener, where one side waits for the other to initiate a connection
- Rendezvous, where both sides attempt to initiate a connection
As SRT development has evolved, two handshaking mechanisms have emerged:
- the legacy UDT handshake, with the "SRT" part of the handshake implemented as extended control messages; this is the only mechanism in SRT versions 1.2 and lower, and is known as HSv4 (where the number 4 refers to the last UDT version)
- the new integrated handshake, known as HSv5, where all the required information concerning the connection is interchanged completely in the handshake process
The version compatibility requirements are such that if one side of the connection only understands HSv4, the connection is made according to HSv4 rules. Otherwise, if both sides are at SRT version 1.3.0 or greater, HSv5 is used. As the new handshake supports several features that might be mandatory for a particular application, it is also possible to reject an HSv4-to-HSv5 connection by setting the SRTO_MINVERSION socket option. The value for this option is an integer with the version encoded in hex. For example:
int req_version = 0x00010300; // 1.3.0
srt_setsockflag(s, SRTO_MINVERSION, &req_version, sizeof(int));
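As a rough illustration of the version encoding and the compatibility rule above, the sketch below packs major, minor and patch into one byte each (consistent with 0x00010300 for 1.3.0) and picks HSv4 or HSv5. The helper names (makeVersion, chooseHandshake) are hypothetical and not part of the SRT API; the sketch also simplifies by assuming the peer's version is already known, which in a real connection is only learned during the handshake itself.

```cpp
#include <cassert>
#include <cstdint>

// Pack a version as 0x00MMmmpp, so makeVersion(1, 3, 0) == 0x00010300.
constexpr uint32_t makeVersion(uint32_t major, uint32_t minor, uint32_t patch)
{
    return (major << 16) | (minor << 8) | patch;
}

// First SRT version that supports the integrated (HSv5) handshake.
constexpr uint32_t kHsv5Version = makeVersion(1, 3, 0);

enum class HandshakeChoice { Rejected, HSv4, HSv5 };

// Hypothetical helper: which handshake applies between two peers, given the
// local minimum version (0 means "no minimum", i.e. SRTO_MINVERSION unset).
HandshakeChoice chooseHandshake(uint32_t localVersion, uint32_t peerVersion,
                                uint32_t minVersion)
{
    if (minVersion != 0 && peerVersion < minVersion)
        return HandshakeChoice::Rejected;
    if (localVersion >= kHsv5Version && peerVersion >= kHsv5Version)
        return HandshakeChoice::HSv5;
    return HandshakeChoice::HSv4;
}
```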
IMPORTANT: Your SRT application must do one of these two things:
- Be HSv4 compatible. In this case it must:
  - NOT use any new features in 1.3.0 or higher (such as bidirectional transmission or Stream ID)
  - ALWAYS set SRTO_SENDER to true on the sender side
- Require HSv5. If so, it must prevent connections to any older versions of SRT by setting the minimum version 1.3.0 as shown above.
Every UDP packet carrying SRT traffic contains an SRT header (immediately after the UDP header). In all versions, the SRT header contains four major 32-bit fields:
- PH_SEQNO
- PH_MSGNO
- PH_TIMESTAMP
- PH_ID
Their interpretation depends on the type of packet, of which there are two: control packets and data packets, defined by the first bit in the PH_SEQNO field.
Here, for example, is a representation of an SRT 1.3.0 data packet header (where the "packet type" bit = 0):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|                   Packet Sequence Number                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|FF |O|KK |R|                  Message Number                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Time Stamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Destination Socket ID                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
NOTE: Packet diagrams in this document are in network bit order.
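A minimal sketch of reading the four header words and checking the packet type bit, assuming the 16 raw bytes that follow the UDP header are available in a buffer. Because the diagrams use network bit order, bit 0 is the most significant bit of the first word. The struct and function names are illustrative, not taken from the SRT sources.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The four 32-bit header words, converted to host byte order.
struct SrtHeader
{
    uint32_t seqno;     // PH_SEQNO
    uint32_t msgno;     // PH_MSGNO
    uint32_t timestamp; // PH_TIMESTAMP
    uint32_t id;        // PH_ID
};

// Read one 32-bit word stored in network (big-endian) byte order.
uint32_t readBe32(const uint8_t* p)
{
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

SrtHeader parseHeader(const std::array<uint8_t, 16>& raw)
{
    return SrtHeader{readBe32(&raw[0]), readBe32(&raw[4]),
                     readBe32(&raw[8]), readBe32(&raw[12])};
}

// Bit 0 in network bit order is the most significant bit of PH_SEQNO.
bool isControlPacket(const SrtHeader& h) { return (h.seqno & 0x80000000u) != 0; }

// In a data packet the remaining 31 bits are the packet sequence number.
uint32_t sequenceNumber(const SrtHeader& h) { return h.seqno & 0x7FFFFFFFu; }
```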
While a complete description of a data packet is out of scope for this document, here is a description of some other header fields unique to SRT:
- FF = (2 bits) Position of packet in message, where:
  - 10b = 1st
  - 00b = middle
  - 01b = last
  - 11b = single
- O = (1 bit) Indicates whether the message should be delivered in order (1) or not (0). In File/Message mode (original UDT with UDT_DGRAM), when this bit is clear, a message that is sent later (but reassembled before an earlier message which may be incomplete due to packet loss) is allowed to be delivered immediately, without waiting for the earlier message to be completed. This is not used in Live mode, because a completely different function is used for data extraction when TSBPD mode is on.
- KK = (2 bits) Indicates whether or not data is encrypted:
  - 00b: not encrypted
  - 01b: encrypted with even key
  - 10b: encrypted with odd key
- R = (1 bit) Retransmitted packet. This flag is clear (0) when a packet is transmitted the very first time, and is set (1) if the packet is retransmitted.
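The FF, O, KK and R fields can be extracted from PH_MSGNO with shifts and masks. The sketch below derives the bit positions from the data packet diagram (FF in the two most significant bits, the Message Number in the remaining 26 bits); the names MsgNoFields, decodeMsgNo and encodeMsgNo are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Fields packed into PH_MSGNO of a data packet (bit 0 = most significant bit).
struct MsgNoFields
{
    unsigned position;      // FF: 2 = first (10b), 0 = middle, 1 = last, 3 = single
    bool inOrder;           // O
    unsigned encryption;    // KK: 0 = clear, 1 = even key, 2 = odd key
    bool retransmitted;     // R
    uint32_t messageNumber; // remaining 26 bits
};

MsgNoFields decodeMsgNo(uint32_t msgno)
{
    MsgNoFields f;
    f.position = (msgno >> 30) & 0x3;
    f.inOrder = ((msgno >> 29) & 0x1) != 0;
    f.encryption = (msgno >> 27) & 0x3;
    f.retransmitted = ((msgno >> 26) & 0x1) != 0;
    f.messageNumber = msgno & 0x03FFFFFFu;
    return f;
}

uint32_t encodeMsgNo(const MsgNoFields& f)
{
    return (uint32_t(f.position & 0x3) << 30) |
           (uint32_t(f.inOrder) << 29) |
           (uint32_t(f.encryption & 0x3) << 27) |
           (uint32_t(f.retransmitted) << 26) |
           (f.messageNumber & 0x03FFFFFFu);
}
```

For example, 0xE0000005 decodes as a single-packet message (FF = 11b), delivered in order, not encrypted, sent for the first time, with message number 5.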
In Data packets, the third and fourth fields are interpreted as follows:
- PH_TIMESTAMP: Usually the time when a packet was sent, although the real interpretation may vary depending on the type, and it's not important for the handshake
- PH_ID: The Destination Socket ID to which a packet should be dispatched, although it may have the special value 0 when the packet is a connection request
Additional details for Data packets will be discussed in the sections below covering extension flags.
An SRT control packet header ("packet type" bit = 1) has the following structure:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|        Message Type         |     Message Extended Type     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Additional Data                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Time Stamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Destination Socket ID                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
For Control packets the first two fields are interpreted respectively (using network bit order) as:
- PH_SEQNO:
  - Bit 0: packet type (set to 1 for control packet)
  - Bits 1-15: Message Type (see enum UDTMessageType)
  - Bits 16-31: Message Extended Type
- PH_MSGNO: Additional data
The type subfields (in the PH_SEQNO field) are used in two ways:
- The Message Type (SEQNO_MSGTYPE) is one of the values enumerated as UDTMessageType, except UMSG_EXT. In this case, the type is determined by this value only, and the Message Extended Type (SEQNO_EXTTYPE) value should always be 0.
- The Message Type is UMSG_EXT. In this case the actual message type is contained in the Message Extended Type.
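A sketch of splitting PH_SEQNO of a control packet into its two type subfields, and of checking for the extended-message case. The value of UMSG_EXT is passed in rather than hard-coded; the test uses 0x7FFF, which is an assumption here and should be checked against UDTMessageType in the SRT sources. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// PH_SEQNO of a control packet (network bit order): bit 0 = 1 (control),
// bits 1-15 = Message Type, bits 16-31 = Message Extended Type.
struct ControlType
{
    uint16_t messageType;  // SEQNO_MSGTYPE (15 bits)
    uint16_t extendedType; // SEQNO_EXTTYPE (16 bits)
};

ControlType decodeControlType(uint32_t seqno)
{
    return ControlType{uint16_t((seqno >> 16) & 0x7FFF), uint16_t(seqno & 0xFFFF)};
}

uint32_t encodeControlType(uint16_t messageType, uint16_t extendedType)
{
    return 0x80000000u | (uint32_t(messageType & 0x7FFF) << 16) | extendedType;
}

// Extended messages carry their real type in the Extended Type; every other
// message must have Extended Type 0. 'umsgExt' is the value of UMSG_EXT.
bool isWellFormed(const ControlType& t, uint16_t umsgExt)
{
    return t.messageType == umsgExt || t.extendedType == 0;
}

bool isExtended(const ControlType& t, uint16_t umsgExt) { return t.messageType == umsgExt; }
```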
The Extended Message mechanism is theoretically open for further extensions. SRT uses some of them for its own purposes. This will be referred to later in the section on the SRT Extended Handshake.
The Additional Data field (PH_MSGNO) is used in some control messages as extra space for data. Its interpretation depends on the particular message type. Handshake messages don't use it.
The handshake portion of a control packet, which comes immediately after the SRT control packet header, consists of the following 32-bit fields in order:
Field | Description |
---|---|
Version | Contains the number 4 in this version. |
Type | In SRT versions up to 1.2.0 (HSv4), must be the value of UDT_DGRAM, which is 2. For usage in later versions of SRT, see the "Type field" section below. |
ISN | Initial Sequence Number; the sequence number for the first data packet |
MSS | Maximum Segment Size, which is typically 1500, but can be less |
FlightFlagSize | Maximum number of buffers allowed to be "in flight" (sent and not ACK-ed) |
ReqType | Request type (see below) |
ID | The SOURCE socket ID from which the message is issued (target is in SRT header) |
Cookie | Cookie used for various processing (see below) |
PeerIP | Placeholder for the sender's IPv4 or IPv6 IP address, consisting of four 32-bit fields |
Here is a representation of the HSv4 handshake structure (which follows immediately after the SRT control packet header):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        UDT Version {4}                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Socket Type                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Initial Packet Sequence Number                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Maximum Packet Size                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Maximum Flow Window Size                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Connection Type                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Socket ID                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          SYN Cookie                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Peer IP Address                        |
|                                                               |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
And here is the equivalent portion of the HSv5 handshake structure (to simplify the comparison here, the extended portion of the HSv5 handshake structure is not shown. See the "UDT Legacy" and "SRT Extended" Handshakes section for details):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        UDT Version {5}                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Encryption Flags        |        Extension Flags        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Initial Packet Sequence Number                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Maximum Packet Size                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Maximum Flow Window Size                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Connection Type                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Socket ID                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          SYN Cookie                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Peer IP Address                        |
|                                                               |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
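A sketch of the common handshake body (twelve 32-bit words, the last four being PeerIP, so 48 bytes) serialized in network byte order. Field names follow the table above; the encoding of the address inside PeerIP and any HSv5 extension blocks that may follow are deliberately not modeled, and the struct and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// The handshake body shared by HSv4 and HSv5 (HSv5 extensions not modeled).
struct Handshake
{
    uint32_t version = 4;
    uint32_t type = 2;                // UDT_DGRAM in the legacy interpretation
    uint32_t isn = 0;                 // Initial Sequence Number
    uint32_t mss = 1500;              // Maximum Segment Size
    uint32_t flightFlagSize = 0;      // max buffers "in flight"
    int32_t reqType = 0;              // request type (may be negative)
    uint32_t id = 0;                  // source socket ID
    uint32_t cookie = 0;
    std::array<uint32_t, 4> peerIp{}; // treated here as four opaque words
};

constexpr std::size_t kWords = 12;
constexpr std::size_t kHandshakeSize = kWords * 4; // 48 bytes

static void putBe32(uint8_t* p, uint32_t v)
{
    p[0] = uint8_t(v >> 24);
    p[1] = uint8_t(v >> 16);
    p[2] = uint8_t(v >> 8);
    p[3] = uint8_t(v);
}

static uint32_t getBe32(const uint8_t* p)
{
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

std::array<uint8_t, kHandshakeSize> serializeHandshake(const Handshake& hs)
{
    const uint32_t words[kWords] = {hs.version, hs.type, hs.isn, hs.mss,
                                    hs.flightFlagSize, uint32_t(hs.reqType),
                                    hs.id, hs.cookie, hs.peerIp[0], hs.peerIp[1],
                                    hs.peerIp[2], hs.peerIp[3]};
    std::array<uint8_t, kHandshakeSize> out{};
    for (std::size_t i = 0; i < kWords; ++i)
        putBe32(&out[i * 4], words[i]);
    return out;
}

Handshake parseHandshake(const std::array<uint8_t, kHandshakeSize>& in)
{
    uint32_t w[kWords];
    for (std::size_t i = 0; i < kWords; ++i)
        w[i] = getBe32(&in[i * 4]);
    Handshake hs;
    hs.version = w[0];
    hs.type = w[1];
    hs.isn = w[2];
    hs.mss = w[3];
    hs.flightFlagSize = w[4];
    hs.reqType = int32_t(w[5]);
    hs.id = w[6];
    hs.cookie = w[7];
    for (std::size_t i = 0; i < 4; ++i)
        hs.peerIp[i] = w[8 + i];
    return hs;
}
```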
The HSv4 (UDT-legacy based) handshake is based on two rules:
- The complete handshake process, which establishes the connection, is the same as the UDT handshake.
- The required SRT data interchange is done after the connection is established, using SRT Extended Messages with the following Extended Types:
  - SRT_CMD_HSREQ/SRT_CMD_HSRSP, which exchange special SRT flags as well as a latency value
  - SRT_CMD_KMREQ/SRT_CMD_KMRSP (optional), which exchange the wrapped stream encryption key used with encryption (KMRSP is used only for confirmation or error reporting)
IMPORTANT: There are two rules in the UDT code that continue to apply to SRT version 1.2.0 and earlier, and therefore affect the prerequisites for any future versions of the protocol:
- The initial handshake response message coming from the Listener side DOES NOT REWRITE the Version field (it's simply blindly copied from the handshake request message received).
- The size of the handshake message must be exactly equal to the legacy UDT handshake structure, otherwise the message is silently rejected.
As of SRT version 1.3.0 with HSv5 the handshake must only satisfy the minimum size. However, the code cannot rely on this until each peer is certain about the SRT version of the other.
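The size rule can be expressed as two predicates. This sketch assumes the legacy structure is the 48 bytes shown in the diagrams above; the function names are hypothetical, and the point is only that a sender must stick to the exact legacy size until it knows the peer is HSv5.

```cpp
#include <cassert>
#include <cstddef>

// Size of the legacy UDT handshake structure: twelve 32-bit words.
constexpr std::size_t kLegacyHandshakeSize = 48;

// HSv4 (UDT rule): anything but the exact legacy size is silently dropped.
bool hsv4AcceptsSize(std::size_t size) { return size == kLegacyHandshakeSize; }

// HSv5: only the minimum size is required, so extensions may follow.
bool hsv5AcceptsSize(std::size_t size) { return size >= kLegacyHandshakeSize; }

// How large a handshake a sender may emit: with extensions only once the peer
// is known to be HSv5, otherwise exactly the legacy size.
std::size_t safeHandshakeSize(bool peerKnownHsv5, std::size_t sizeWithExtensions)
{
    return peerKnownHsv5 ? sizeWithExtensions : kLegacyHandshakeSize;
}
```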
Even in HSv5, the Caller must first set two fields in the initial handshake message:
- Version = 4
- Type = UDT_DGRAM
The version recognition relies on the fact that the Listener returns a version of 5 (or potentially higher) if it is capable, but the Caller must set the Version to 4 so that an HSv4 Listener copies this value back, which is how an HSv4 party is recognized. This allows SRT to handle the following combinations:
- HSv5 Caller vs. HSv4 Listener: The Listener returns version 4 to the Caller, so the Caller knows it should use HSv4, and then continues the handshake the old way.
- HSv4 Caller vs. HSv5 Listener: The Caller sends version 4 and the Listener returns version 5. The Caller ignores this value, however, and sends the second phase of the handshake still using version 4. This is how the Listener recognizes the HSv4 client.
- Both HSv5: The Listener responds with version 5 (or potentially higher in future) and the HSv5 Caller recognizes this value as HSv5 (or higher). The Caller then initiates the second phase of the handshake according to HSv5 rules.
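The three combinations can be modeled as a small exchange: what the Listener puts into its induction response, what the Caller puts into its conclusion request, and how the Version received is read. This is a simplified sketch with hypothetical function names that tracks only the Version field.

```cpp
#include <cassert>
#include <cstdint>

enum class Hs { V4, V5 };

// Any Version of 5 or higher means the sender understands HSv5.
Hs readVersion(uint32_t version) { return version >= 5 ? Hs::V5 : Hs::V4; }

// Listener's induction response: HSv4 blindly copies the request's Version,
// HSv5 answers 5.
uint32_t inductionResponseVersion(bool listenerIsHsv5, uint32_t requestVersion)
{
    return listenerIsHsv5 ? 5 : requestVersion;
}

// Caller's conclusion request: an HSv4 Caller ignores the response and keeps
// sending 4; an HSv5 Caller sends 5 only if the Listener answered 5 or higher.
uint32_t conclusionRequestVersion(bool callerIsHsv5, uint32_t responseVersion)
{
    if (!callerIsHsv5)
        return 4;
    return readVersion(responseVersion) == Hs::V5 ? 5 : 4;
}

// The handshake version that ends up being used for the connection.
Hs negotiatedHandshake(bool callerIsHsv5, bool listenerIsHsv5)
{
    const uint32_t request = 4; // the Caller always starts with Version = 4
    const uint32_t response = inductionResponseVersion(listenerIsHsv5, request);
    return readVersion(conclusionRequestVersion(callerIsHsv5, response));
}
```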
With Rendezvous there's no problem because both sides try to connect to one another, so there's no copying of the handshake data. Each side crafts its own handshake individually. If the value of the Version field is 5 from the very beginning, and if there are any extension flags set in the Type field (see note below), the rules of HSv5 apply. But if one party is using version 4, the handshake continues as HSv4.
NOTE: Previously, the Type field contained only the extension flags, but now it also contains the encryption flag. So for HSv5 rules to apply, the extension flag needs to be expressly set.
The first versions of SRT did not change anything in the UDT handshake mechanisms, which are identified as HSv4. Here the connection process is the same as it was in UDT, and any extended SRT handshake operations are done after the HSv4 handshake is established.
The HSv5 handshake was first introduced in SRT version 1.3.0. It includes all the extended SRT handshake operations in the overall handshake process (known as "integrated handshake"), which means that these data are considered exchanged and agreed upon at the moment when the connection is established.
The addition of a new handshake mechanism necessitates the introduction of two new roles: "Initiator" and "Responder":
- Initiator: Starts the extended SRT handshake process and sends appropriate SRT extended handshake requests
- Responder: Expects the SRT extended handshake requests to be sent by the Initiator and sends SRT extended handshake responses back
There are two basic types of SRT handshake extensions that are exchanged in both handshake versions (HSv5 introduces some more extensions):
- SRT_CMD_HSREQ: Exchanges the basic SRT information
- SRT_CMD_KMREQ: Exchanges the wrapped stream encryption key (used only if encryption is requested)
The Initiator and Responder roles are assigned differently in HSv4 and HSv5.
For an HSv4 handshake the assignments are simple:
- Initiator is the sender, which is the party that has set the SRTO_SENDER socket option to true.
- Responder is the receiver, which is the party that has set SRTO_SENDER to false (default).
Note that these roles are independent of the connection mode (Caller/Listener/Rendezvous), and that the behavior is undefined if SRTO_SENDER has the same value on both parties.
For an HSv5 handshake, the roles are dependent on the connection mode:
- For Caller-Listener connections:
  - the Caller is the Initiator
  - the Listener is the Responder
- For Rendezvous connections:
  - The Initiator and Responder roles are assigned based on the initial data interchange during the handshake (see The Rendezvous Handshake below)
Note that if the handshake can be done as HSv5, the connection is always considered bidirectional and the SRTO_SENDER flag is unused.
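The role rules above reduce to one small function per handshake version. In this sketch the outcome of the Rendezvous cookie contest is passed in as a boolean (the draw case is ignored), and all names are hypothetical.

```cpp
#include <cassert>

enum class Role { Initiator, Responder };
enum class Mode { Caller, Listener, Rendezvous };

// HSv4: the role depends only on SRTO_SENDER. The peer is expected to have the
// opposite setting; equal settings on both sides lead to undefined behavior.
Role hsv4Role(bool srtoSender)
{
    return srtoSender ? Role::Initiator : Role::Responder;
}

// HSv5: the role depends on the connection mode. For Rendezvous it is decided
// by the cookie contest, whose (non-draw) outcome is passed in.
Role hsv5Role(Mode mode, bool wonCookieContest)
{
    if (mode == Mode::Rendezvous)
        return wonCookieContest ? Role::Initiator : Role::Responder;
    return mode == Mode::Caller ? Role::Initiator : Role::Responder;
}
```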
The ReqType field in the Handshake Structure (see above) indicates the handshake message type.
Caller-Listener Request Types:
- Caller to Listener: URQ_INDUCTION
- Listener to Caller: URQ_INDUCTION (reports cookie)
- Caller to Listener: URQ_CONCLUSION (uses previously returned cookie)
- Listener to Caller: URQ_CONCLUSION (confirms connection established)
Rendezvous Request Types:
- After starting the connection: URQ_WAVEAHAND
- After receiving the above message from the peer: URQ_CONCLUSION
- After receiving the above message from the peer: URQ_AGREEMENT
Note that the Rendezvous process is different in HSv4 and HSv5, as the latter is based on a state machine.
If the connection process fails when the party is about to send the URQ_CONCLUSION handshake, this field contains an appropriate error value. This value is URQ_FAILURE_TYPES (1000, see UDTRequestType in handshake.h) plus the value of the rejection reason (see SRT_REJECT_REASON in srt.h).
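A sketch of the ReqType values and of packing a rejection reason into ReqType. Only URQ_FAILURE_TYPES = 1000 is stated above; the other numeric values are assumptions based on the UDT-derived enum, so check them against UDTRequestType in handshake.h. The rejection reason used in the test is an arbitrary example, not a real SRT_REJECT_REASON value.

```cpp
#include <cassert>
#include <cstdint>

// Assumed UDT-derived values; verify against UDTRequestType in handshake.h.
enum UdtRequestType : int32_t
{
    URQ_WAVEAHAND = 0,        // Rendezvous: first message
    URQ_INDUCTION = 1,        // Caller-Listener: first exchange
    URQ_CONCLUSION = -1,      // both modes: second phase
    URQ_AGREEMENT = -2,       // Rendezvous: final confirmation
    URQ_FAILURE_TYPES = 1000, // start of the error range
};

// ReqType carried when the connection fails instead of sending URQ_CONCLUSION.
int32_t rejectionReqType(int32_t rejectReason)
{
    return URQ_FAILURE_TYPES + rejectReason;
}

bool isRejection(int32_t reqType) { return reqType >= URQ_FAILURE_TYPES; }

int32_t rejectReasonOf(int32_t reqType) { return reqType - URQ_FAILURE_TYPES; }
```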
There are two possible interpretations of the Type field. The first is the legacy UDT "socket type", of which there are two: UDT_STREAM and UDT_DGRAM (in SRT only UDT_DGRAM is allowed). This legacy interpretation is applied in the following circumstances:
- in a URQ_INDUCTION message sent initially by the Caller
- in a URQ_INDUCTION message sent back by the HSv4 Listener
- in a URQ_CONCLUSION message, if the other party was detected as HSv4
For more information on Induction and Conclusion see the Caller-Listener Handshake section below.
UDT interpreted the Type field as either a Stream or Message type, and rejected the connection if the parties each used a different type. Since SRT only uses the Message type, HSv5 uses only the UDT_DGRAM value for this field in cases where the message is going to be sent to an HSv4 party (which follows the UDT interpretation).
In all other cases Type follows the HSv5 interpretation and consists of the following:
- an upper 16-bit field (bits 0-15) reserved for encryption flags
- a lower 16-bit field (bits 16-31) reserved for extension flags
The extension flags field should have the following value:
- in a URQ_CONCLUSION message, it should contain a combination of extension flags (with the HS_EXT_ prefix)
- in a URQ_INDUCTION message sent back by the Listener, it should contain SrtHSRequest::SRT_MAGIC_CODE (0x4A17)
- in all other cases it should be 0.
The encryption flags currently occupy only 3 out of 16 bits, which are used to advertise a value for PBKEYLEN (packet based key length). This value is taken from the SRTO_PBKEYLEN option, divided by 8, giving possible values of:
- 2 (AES-128)
- 3 (AES-192)
- 4 (AES-256)
- 0 (PBKEYLEN not advertised)
The PBKEYLEN advertisement is required because, while the Sender should decide the PBKEYLEN, in HSv5 the Sender might be the Responder. Therefore PBKEYLEN is advertised to the Initiator so that it gets this value before it starts creating the SEK on its side, to be then sent to the Responder.
REMINDER: Initiator and Responder roles are assigned differently in HSv4 and HSv5. See the Initiator and Responder section above.
PBKEYLEN is decided by the Sender. When the transmission is bidirectional, this value must be agreed upon at the outset because when both are set, the Responder wins. For Caller-Listener connections it is reasonable to set this value on the Listener only. In the case of Rendezvous the only reasonable approach is to decide upon the correct value from the different sources and to set it on both parties (note that AES-128 is the default).
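The Type field layout and the PBKEYLEN advertisement can be sketched as plain bit operations. resolvePbkeylen is a hypothetical illustration of the precedence rule described above (a value advertised by the Responder wins, with AES-128 as the fallback), not SRT's actual code path.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kSrtMagicCode = 0x4A17; // SrtHSRequest::SRT_MAGIC_CODE

// HSv5 Type: encryption flags in the upper 16 bits, extension flags in the lower 16.
uint32_t packType(uint16_t enc, uint16_t ext) { return (uint32_t(enc) << 16) | ext; }
uint16_t encryptionFlags(uint32_t type) { return uint16_t(type >> 16); }
uint16_t extensionFlags(uint32_t type) { return uint16_t(type & 0xFFFF); }

// SRTO_PBKEYLEN is in bytes (16, 24 or 32, or 0 when unset); the advertised
// value is that divided by 8: 2 = AES-128, 3 = AES-192, 4 = AES-256, 0 = none.
uint16_t advertisePbkeylen(int pbkeylenBytes) { return uint16_t(pbkeylenBytes / 8); }

// Only the lowest 3 bits of the encryption flags are currently used.
int pbkeylenFromAdvert(uint16_t enc) { return int(enc & 0x7) * 8; }

// Hypothetical precedence rule on the Initiator: a Responder's advertised value
// wins, then the local setting, then the AES-128 default (16 bytes).
int resolvePbkeylen(int localBytes, uint16_t responderAdvert)
{
    if (responderAdvert != 0)
        return pbkeylenFromAdvert(responderAdvert);
    return localBytes != 0 ? localBytes : 16;
}
```

For example, an HSv5 Listener advertising AES-256 in its induction response would send Type = packType(4, 0x4A17) = 0x00044A17.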
This section describes the handshaking process where a Listener is waiting for an incoming packet on a bound UDP port, which should be an SRT handshake command (UMSG_HANDSHAKE) from a Caller. The process has two phases: induction and conclusion.
The Caller begins by sending an "induction" message, which contains the following (significant) fields:
- Version: must always be 4
- Type: UDT_DGRAM (2)
- ReqType: URQ_INDUCTION
- ID: Socket ID of the Caller
- Cookie: 0
The Destination Socket ID (in the SRT header) in this message is 0, which is interpreted as a connection request.
NOTE: This phase serves only to set a cookie on the Listener so that it doesn't allocate resources, thus mitigating a potential DoS attack that might be perpetrated by flooding the Listener with handshake commands.
An HSv4 Listener responds with exactly the same values, except:
- ID: Socket ID of the HSv4 Listener
- SYN Cookie: a cookie that is crafted based on host, port and current time with 1 minute accuracy
An HSv5 Listener responds with the following:
- Version: 5
- Type:
  - Extension Field (lower 16 bits): SrtHSRequest::SRT_MAGIC_CODE
  - Encryption Field (upper 16 bits): Advertised PBKEYLEN
- ReqType: (UDT Connection Type) URQ_INDUCTION
- ID: Socket ID of the HSv5 Listener
- SYN Cookie: a cookie that is crafted based on host, port and current time with 1 minute accuracy
NOTE: The HSv5 Listener still doesn't know the version of the Caller, and it responds with the same set of values regardless of whether the Caller is version 4 or 5.
The important differences between HSv4 and HSv5 in this respect are:
- The HSv4 party completely ignores the values reported in Version and Type. It is, however, interested in the Cookie value, as this must be passed to the next phase. It does interpret these fields, but only in the "conclusion" message.
- The HSv5 party does interpret the values in Version and Type. If it receives the value 5 in Version, it understands that it comes from an HSv5 party, so it knows that it should prepare the proper HSv5 messages in the next phase. It also checks the following in the Type field:
  - whether the lower 16-bit field (extension flags) contains the magic value (see the Type Field section above); otherwise the connection is rejected. This is a contingency for the case where someone who, in attempting to extend UDT independently, increases the Version value to 5 and tries to test it against SRT.
  - whether the upper 16-bit field (encryption flags) contains a non-zero value, which is interpreted as an advertised PBKEYLEN (in which case it is written into the value of the SRTO_PBKEYLEN option).
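The induction exchange can be sketched in three steps: the Caller's request, the Listener's response (HSv4 or HSv5), and the HSv5 Caller's check of that response. The cookie function is a stand-in that hashes host, port and the current minute with FNV-1a; SRT uses its own MD5-based scheme, so only the inputs and the one-minute granularity are meant to match. All names, and the assumed URQ_INDUCTION value of 1, are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Only the handshake fields that matter for the induction phase.
struct Handshake
{
    uint32_t version = 0;
    uint32_t type = 0;
    int32_t reqType = 0;
    uint32_t id = 0;
    uint32_t cookie = 0;
};

constexpr uint32_t kUdtDgram = 2;
constexpr int32_t kUrqInduction = 1;       // assumed value of URQ_INDUCTION
constexpr uint16_t kSrtMagicCode = 0x4A17; // SrtHSRequest::SRT_MAGIC_CODE

// Stand-in cookie (FNV-1a), NOT SRT's algorithm: the same host, port and
// minute always give the same cookie.
uint32_t bakeCookie(const std::string& host, uint16_t port, uint64_t unixSeconds)
{
    uint32_t h = 2166136261u;
    auto mix = [&h](uint8_t b) { h ^= b; h *= 16777619u; };
    for (char c : host)
        mix(uint8_t(c));
    mix(uint8_t(port >> 8));
    mix(uint8_t(port & 0xFF));
    const uint64_t minute = unixSeconds / 60; // 1 minute accuracy
    for (int i = 0; i < 8; ++i)
        mix(uint8_t(minute >> (8 * i)));
    return h;
}

Handshake callerInductionRequest(uint32_t callerId)
{
    Handshake hs;
    hs.version = 4; // always 4, even for an HSv5 Caller
    hs.type = kUdtDgram;
    hs.reqType = kUrqInduction;
    hs.id = callerId;
    hs.cookie = 0;
    return hs;
}

Handshake listenerInductionResponse(const Handshake& req, bool listenerIsHsv5,
                                    uint32_t listenerId, uint32_t cookie,
                                    uint16_t advertisedPbkeylen)
{
    Handshake rsp = req; // HSv4 echoes everything except ID and cookie
    rsp.id = listenerId;
    rsp.cookie = cookie;
    if (listenerIsHsv5)
    {
        rsp.version = 5;
        rsp.type = (uint32_t(advertisedPbkeylen) << 16) | kSrtMagicCode;
    }
    return rsp;
}

struct InductionResult
{
    bool accept;
    bool peerIsHsv5;
    uint16_t pbkeylenAdvert; // 0 if none was advertised
    uint32_t cookie;         // to be echoed in the conclusion request
};

// How an HSv5 Caller might read the Listener's induction response.
InductionResult callerReadsInduction(const Handshake& rsp)
{
    if (rsp.version < 5)
        return {true, false, 0, rsp.cookie}; // HSv4 Listener
    if ((rsp.type & 0xFFFF) != kSrtMagicCode)
        return {false, true, 0, rsp.cookie}; // not SRT: reject
    return {true, true, uint16_t(rsp.type >> 16), rsp.cookie};
}
```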
Once the Caller gets its cookie, it sends a URQ_CONCLUSION handshake message to the Listener.
The following values are set by an HSv4 Caller. Note that the same values must be used by an HSv5 Caller when the Listener has returned Version 4 in its URQ_INDUCTION response:
- Version: 4
- Type: UDT_DGRAM (SRT must have this legacy UDT socket type only)
- ReqType: URQ_CONCLUSION
- ID: Socket ID of the Caller
- Cookie: the cookie previously received in the induction phase
If an HSv5 Caller receives a confirmation from a Listener that it can use the version 5 handshake, it fills in the following values:
- Version: 5
- Type: appropriate Extension Flags and Encryption Flags (see below)
- ReqType: URQ_CONCLUSION
- ID: Socket ID of the Caller
- Cookie: the cookie previously received in the induction phase
The Destination Socket ID (in the SRT header, PH_ID field) in this message is the socket ID that was previously received in the induction phase in the ID field in the handshake structure.
The Type field contains:
- Encryption Flags: advertised PBKEYLEN (see above)
- Extension Flags: the HS_EXT_ prefixed flags defined in CHandShake (see the SRT Extended Handshake section below)
The Listener responds with the same values shown above, without the cookie (which isn't needed here), as well as the extensions for HSv5 (which will probably be exactly the same).
IMPORTANT: There isn't any "negotiation" here. If the values passed in the handshake are in any way not acceptable to the other side, the connection will be rejected. The only case when the Listener can have precedence over the Caller is the advertised PBKEYLEN in the Encryption Flags field of the Type field.
The value for latency is always agreed to be the greater of those reported by each party.
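A sketch of how a Caller might fill in its conclusion request depending on whether the Listener turned out to be HSv5, together with the latency rule. URQ_CONCLUSION = -1 is an assumed value, the extension flags are passed in opaquely, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr uint32_t kUdtDgram = 2;
constexpr int32_t kUrqConclusion = -1; // assumed value of URQ_CONCLUSION

struct ConclusionRequest
{
    uint32_t version;
    uint32_t type;
    int32_t reqType;
    uint32_t id;                  // Caller's socket ID
    uint32_t cookie;              // echoed from the induction response
    uint32_t destinationSocketId; // PH_ID in the SRT header
};

// 'extFlags' stands for whatever HS_EXT_ flags the Caller sends.
ConclusionRequest buildConclusion(bool listenerIsHsv5, uint32_t callerId,
                                  uint32_t listenerId, uint32_t cookie,
                                  uint16_t pbkeylenAdvert, uint16_t extFlags)
{
    ConclusionRequest c{};
    c.reqType = kUrqConclusion;
    c.id = callerId;
    c.cookie = cookie;
    c.destinationSocketId = listenerId; // ID received in the induction phase
    if (listenerIsHsv5)
    {
        c.version = 5;
        c.type = (uint32_t(pbkeylenAdvert) << 16) | extFlags;
    }
    else
    {
        c.version = 4;      // the Listener returned Version 4
        c.type = kUdtDgram; // legacy socket type only
    }
    return c;
}

// No negotiation: both sides use the greater of the two reported latencies.
uint32_t agreedLatency(uint32_t local, uint32_t peer) { return std::max(local, peer); }
```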
When two parties attempt to connect in Rendezvous mode, they are considered to be equivalent: Both are connecting, but neither is listening, and they expect to be contacted (over the same port number for both parties) specifically by the same party with which they are trying to connect. Therefore, it's perfectly safe to assume that, at some point, each party will have agreed upon the connection, and that no induction-conclusion phase split is required. Even so, the Rendezvous handshake process is more complicated.
The basics of a Rendezvous handshake are the same in HSv4 and HSv5 - the description of the HSv4 process is a good introduction for HSv5. However, HSv5 has more data to exchange and more conditions to be taken into account.
Initially, each party sends an SRT control message of type UMSG_HANDSHAKE to the other, with the following fields:
- Version: 4 (HSv4 only)
- Type: UDT_DGRAM (HSv4 only)
- ReqType: URQ_WAVEAHAND
- ID: Socket ID of the party sending this message
- Cookie: 0
When the srt_connect() function is first called by an application, each party sends this message to its peer, and then tries to read a packet from its underlying UDP socket to see if the other party is alive. Upon reception of a UMSG_HANDSHAKE message, each party initiates the second (conclusion) phase by sending this message:
- Version: 4
- Type: UDT_DGRAM
- ReqType: URQ_CONCLUSION
- ID: Socket ID of the party sending this message
- Cookie: 0
At this point, they are considered to be connected. When either party receives this message from its peer again, it sends another message with the ReqType field set as URQ_AGREEMENT. This is a formal conclusion to the handshake process, required to inform the peer that it can stop sending conclusion messages (note that this is UDP, so neither party can assume that the message has reached its peer).
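The HSv4 Rendezvous exchange can be modeled as a reaction to each incoming request type. This simplified sketch (hypothetical names, no timers or retransmission logic) shows why URQ_AGREEMENT exists: it is what lets a party stop repeating its conclusion messages.

```cpp
#include <cassert>
#include <optional>

enum class Req { WaveAHand, Conclusion, Agreement };

struct Party
{
    bool connected = false;           // considered connected in the conclusion phase
    bool repeatingConclusion = false; // keeps re-sending URQ_CONCLUSION over UDP
};

// React to one incoming HSv4 Rendezvous handshake; returns what to send back.
std::optional<Req> onHandshake(Party& p, Req incoming)
{
    switch (incoming)
    {
    case Req::WaveAHand:
        // The peer is alive: start the conclusion phase.
        p.connected = true;
        p.repeatingConclusion = true;
        return Req::Conclusion;
    case Req::Conclusion:
        // The peer is in the conclusion phase: confirm so it can stop repeating.
        p.connected = true;
        return Req::Agreement;
    case Req::Agreement:
        // Formal end of the handshake: no need to repeat conclusions any more.
        p.repeatingConclusion = false;
        return std::nullopt;
    }
    return std::nullopt;
}
```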
With HSv4 there's no debate about who is the Initiator and who is the Responder because this transaction is unidirectional, so the party that has set the SRTO_SENDER flag is the Initiator and the other is the Responder (as is usual with HSv4).
The HSv5 Rendezvous process introduces a state machine, and therefore is slightly different from HSv4, although it is still based on the same message request types.
Both parties start with URQ_WAVEAHAND and use a Version value of 5. The version recognition is easy: the HSv4 client does not look at the Version value, whereas HSv5 clients can quickly recognize the version from the Version field.
The parties only continue with the HSv5 Rendezvous process when Version = 5 for both. Otherwise the process continues exclusively according to HSv4 rules.
With HSv5 Rendezvous, both parties create a cookie for a process called a "cookie contest". This is necessary for the assignment of Initiator and Responder roles. Each party generates a cookie value (a 32-bit number) based on the host, port, and current time with 1 minute accuracy. This value is scrambled using an MD5 sum calculation. The cookie values are then compared with one another.
Since you can't have two sockets on the same machine bound to the same device and port and operating independently, it's virtually impossible that the parties will generate identical cookies. However, this situation may occur if an application tries to "connect to itself" - that is, either connects to a local IP address, when the socket is bound to INADDR_ANY, or to the same IP address to which the socket was bound. If the cookies are identical (for any reason), the connection will not be made until new, unique cookies are generated (after a delay of up to one minute). In the case of an application "connecting to itself", the cookies will always be identical, and so the connection will never be made.
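#include <cstdint>

enum HandshakeSide { HSD_DRAW, HSD_INITIATOR, HSD_RESPONDER };

// local_cookie: the cookie this party sent in its connection request (m_ConnReq.m_iCookie).
// peer_cookie: the cookie the peer sent in its connection request (m_ConnRes.m_iCookie).
HandshakeSide cookieContest(int32_t local_cookie, int32_t peer_cookie)
{
    const int64_t contest = int64_t(local_cookie) - int64_t(peer_cookie);
    // Equal cookies (for example a socket "connecting to itself"): no winner.
    if ((contest & 0xFFFFFFFF) == 0)
    {
        return HSD_DRAW;
    }
    // Bit 31 set: the 32-bit difference is negative, so the peer's cookie is greater.
    if (contest & 0x80000000)
    {
        return HSD_RESPONDER;
    }
    return HSD_INITIATOR;
}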
When one party's cookie value is greater than its peer's (based on 32-bit subtraction of both with potential overflow), it wins the cookie contest and becomes Initiator (the other party becomes the Responder).
At this point there are two "handshake flows" possible (at least theoretically): serial and parallel.
In the serial handshake flow, one party is always first, and the other follows. That is, while both parties are repeatedly sending URQ_WAVEAHAND messages, at some point one party - let's say Alice - will find she has received a URQ_WAVEAHAND message before she can send her next one, so she sends a URQ_CONCLUSION message in response. Meantime, Bob (Alice's peer) has missed her URQ_WAVEAHAND messages, and so Alice's URQ_CONCLUSION is the first message Bob has received from her.
This process can be described easily as a series of exchanges between the first and following parties (Alice and Bob, respectively):
- Initially, both parties are in the waving state. Alice sends a handshake message to Bob:
  - Version: 5
  - Type: Extension field: 0, Encryption field: advertised PBKEYLEN
  - ReqType: URQ_WAVEAHAND
  - ID: Alice's socket ID
  - Cookie: Created based on host/port and current time
  Keep in mind that while Alice doesn't yet know if she is sending this message to an HSv4 or HSv5 peer, the values from these fields would not be interpreted by an HSv4 peer when the ReqType is URQ_WAVEAHAND.
- Bob receives Alice's URQ_WAVEAHAND message and switches to the attention state. Since Bob now knows Alice's cookie, he performs a "cookie contest" (compares both cookie values). If Bob's cookie is greater than Alice's, he will become the Initiator. Otherwise, he will become the Responder.
  IMPORTANT: The resolution of the Handshake Role (Initiator or Responder) is essential to further processing.
  Then Bob responds:
  - Version: 5
  - Type:
    - Extension field: appropriate flags if Initiator, otherwise 0
    - Encryption field: advertised PBKEYLEN
  - ReqType: URQ_CONCLUSION
  NOTE: If Bob is the Initiator and encryption is on, he will use either his own PBKEYLEN or the one received from Alice (if she has advertised PBKEYLEN).
- Alice receives Bob's URQ_CONCLUSION message. While at this point she also performs the "cookie contest", the outcome will be the same. She switches to the fine state, and sends:
  - Version: 5
  - Type: Appropriate extension flags and encryption flags
  - ReqType: URQ_CONCLUSION
  NOTE: Both parties always send extension flags at this point, which will contain SRT_CMD_HSREQ if the message comes from an Initiator, or SRT_CMD_HSRSP if it comes from a Responder. If the Initiator has received a previous message from the Responder containing an advertised PBKEYLEN in the encryption flags field (in the Type field), it will be used as the key length for key generation sent next in the SRT_CMD_KMREQ block.
- Bob receives Alice's URQ_CONCLUSION message, and then does one of the following (depending on Bob's role):
  - If Bob is the Initiator (Alice's message contains SRT_CMD_HSRSP), he:
    - switches to the connected state
    - sends Alice a message with ReqType = URQ_AGREEMENT, but containing no SRT extensions (Extension flags in Type should be 0)
  - If Bob is the Responder (Alice's message contains SRT_CMD_HSREQ), he:
    - switches to the initiated state
    - sends Alice a message with ReqType = URQ_CONCLUSION that also contains extensions with SRT_CMD_HSRSP
    - awaits a confirmation from Alice that she is also connected (preferably by a URQ_AGREEMENT message)
- Alice receives the above message, enters into the connected state, and then does one of the following (depending on Alice's role):
  - If Alice is the Initiator (received URQ_CONCLUSION with SRT_CMD_HSRSP), she sends Bob a message with ReqType = URQ_AGREEMENT.
  - If Alice is the Responder, the received message has ReqType = URQ_AGREEMENT and in response she does nothing.
- At this point, if Bob was the Initiator, he is connected already. If he was the Responder, he should receive the above URQ_AGREEMENT message, after which he switches to the connected state. In the case where the UDP packet with the agreement message gets lost, Bob will still enter the connected state once he receives anything else from Alice. If Bob is going to send, however, he has to continue sending the same URQ_CONCLUSION until he gets the confirmation from Alice.
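The serial flow above can be condensed into a transition function. This sketch assumes the roles have already been fixed by the cookie contest, models only the request type and the HSREQ/HSRSP extension, and ignores non-handshake traffic (which, as noted above, can also confirm the connection for a Responder in the initiated state). State, Role, react and serialFlowConnects are hypothetical names.

```cpp
#include <cassert>
#include <optional>

enum class State { Waving, Attention, Fine, Initiated, Connected };
enum class Role { Initiator, Responder };
enum class Req { WaveAHand, Conclusion, Agreement };
enum class Ext { None, HsReq, HsRsp }; // which SRT extension is attached

struct Msg { Req req; Ext ext; };
struct Step { State next; std::optional<Msg> reply; };

// One transition of the serial HSv5 Rendezvous flow for a party whose role
// was already decided by the cookie contest.
Step react(State s, Role role, Msg in)
{
    const bool initiator = role == Role::Initiator;
    // Bob: the first message he sees is URQ_WAVEAHAND.
    if (s == State::Waving && in.req == Req::WaveAHand)
        return {State::Attention, Msg{Req::Conclusion, initiator ? Ext::HsReq : Ext::None}};
    // Alice: the first message she sees is Bob's URQ_CONCLUSION.
    if (s == State::Waving && in.req == Req::Conclusion)
        return {State::Fine, Msg{Req::Conclusion, initiator ? Ext::HsReq : Ext::HsRsp}};
    // Bob receives Alice's URQ_CONCLUSION.
    if (s == State::Attention && in.req == Req::Conclusion && in.ext == Ext::HsRsp)
        return {State::Connected, Msg{Req::Agreement, Ext::None}};   // Bob is Initiator
    if (s == State::Attention && in.req == Req::Conclusion && in.ext == Ext::HsReq)
        return {State::Initiated, Msg{Req::Conclusion, Ext::HsRsp}}; // Bob is Responder
    // Alice as Initiator receives URQ_CONCLUSION + HSRSP.
    if (s == State::Fine && in.req == Req::Conclusion && in.ext == Ext::HsRsp)
        return {State::Connected, Msg{Req::Agreement, Ext::None}};
    // A Responder (Alice in fine, Bob in initiated) receives URQ_AGREEMENT.
    if ((s == State::Fine || s == State::Initiated) && in.req == Req::Agreement)
        return {State::Connected, std::nullopt};
    // A Responder still waiting for confirmation keeps repeating its conclusion.
    if (s == State::Initiated && in.req == Req::Conclusion)
        return {s, Msg{Req::Conclusion, Ext::HsRsp}};
    return {s, std::nullopt};
}

// Drive the serial flow: Alice waves first, Bob missed her earlier messages.
bool serialFlowConnects(Role aliceRole)
{
    const Role bobRole = aliceRole == Role::Initiator ? Role::Responder : Role::Initiator;
    State alice = State::Waving;
    State bob = State::Waving;
    std::optional<Msg> toBob = Msg{Req::WaveAHand, Ext::None};
    std::optional<Msg> toAlice;
    for (int i = 0; i < 8 && (toBob || toAlice); ++i)
    {
        if (toBob)
        {
            const Step st = react(bob, bobRole, *toBob);
            bob = st.next;
            toAlice = st.reply;
            toBob.reset();
        }
        else
        {
            const Step st = react(alice, aliceRole, *toAlice);
            alice = st.next;
            toBob = st.reply;
            toAlice.reset();
        }
    }
    return alice == State::Connected && bob == State::Connected;
}
```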
The serial handshake flow described above happens in almost every case. There is, however, a very rare (but still possible) parallel flow that only occurs if the messages with URQ_WAVEAHAND are sent and received by both peers at precisely the same time. This might happen in one of these situations:
- if both Alice and Bob start sending URQ_WAVEAHAND messages perfectly simultaneously,
- or if Bob starts later but sends his URQ_WAVEAHAND message during the gap between the moment when Alice had earlier sent her message, and the moment when that message is received (that is, if each party receives the message from its peer immediately after having sent its own),
- or if, at the beginning of srt_connect, Alice receives the first message from Bob exactly during the very short gap between the time Alice is adding a socket to the connector list and when she sends her first URQ_WAVEAHAND message
The resulting flow is very much like Bob's behaviour in the serial handshake flow, but for both parties. Alice and Bob will go through the same state transitions:
Waving -> Attention -> Initiated -> Connected
In the Attention state they know each other's cookies, so they can assign roles. It is important to understand that, in contrast to serial flows, which are mostly based on request-response cycles, here everything happens completely asynchronously: the state switches upon reception of a particular handshake message with appropriate contents (the Initiator must attach the HSREQ extension, and the Responder must attach the HSRSP extension).
Here's how the parallel handshake flow works, based on roles:
Initiator:
- Waving
  - Receives URQ_WAVEAHAND message
  - Switches to Attention
  - Sends URQ_CONCLUSION + HSREQ
- Attention
  - Receives URQ_CONCLUSION message, which:
    - contains no extensions:
      - switches to Initiated, still sends URQ_CONCLUSION + HSREQ
    - contains HSRSP extension:
      - switches to Connected, sends URQ_AGREEMENT
- Initiated
  - Receives URQ_CONCLUSION message, which:
    - contains no extensions:
      - REMAINS IN THIS STATE, still sends URQ_CONCLUSION + HSREQ
    - contains HSRSP extension:
      - switches to Connected, sends URQ_AGREEMENT
- Connected
  - May receive URQ_CONCLUSION and respond with URQ_AGREEMENT, but normally by now it should already have received payload packets.
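The Initiator's transitions above can be sketched as a small state machine. This is a minimal C++ sketch, not the SRT library's implementation: the `State`, `ReqType`, `HsMessage` and `Step` types and the `initiatorStep` function are hypothetical names introduced only to mirror the URQ_* request types and the HSREQ/HSRSP extensions, and the periodic retransmission of the current message is not modelled.

```cpp
#include <cassert>

// Hypothetical types for illustration; they are not SRT's internal classes.
enum class State { Waving, Attention, Initiated, Connected };
enum class ReqType { NoReply, WaveAHand, Conclusion, Agreement };

struct HsMessage {
    ReqType type = ReqType::NoReply;
    bool hsreq = false;  // carries the HSREQ extension
    bool hsrsp = false;  // carries the HSRSP extension
};

struct Step {
    State next;
    HsMessage reply;  // ReqType::NoReply means "send nothing"
};

// Initiator reaction to one incoming handshake message in the parallel flow.
Step initiatorStep(State s, const HsMessage& in) {
    const HsMessage conclusionWithHsreq{ReqType::Conclusion, true, false};
    const HsMessage agreement{ReqType::Agreement, false, false};
    switch (s) {
    case State::Waving:
        if (in.type == ReqType::WaveAHand)
            return {State::Attention, conclusionWithHsreq};
        break;
    case State::Attention:
    case State::Initiated:
        if (in.type == ReqType::Conclusion) {
            if (in.hsrsp)
                return {State::Connected, agreement};
            // No extensions: Attention moves to Initiated, Initiated stays;
            // either way the Initiator keeps sending URQ_CONCLUSION + HSREQ.
            return {State::Initiated, conclusionWithHsreq};
        }
        break;
    case State::Connected:
        // A late URQ_CONCLUSION is answered with URQ_AGREEMENT.
        if (in.type == ReqType::Conclusion)
            return {State::Connected, agreement};
        break;
    }
    return {s, HsMessage{}};  // anything else: no transition, no reply
}
```

Attention and Initiated share one branch in this sketch because they differ only in whether a conclusion without extensions causes a state change.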
Responder:
- Waving
  - Receives URQ_WAVEAHAND message
  - Switches to Attention
  - Sends URQ_CONCLUSION message (with no extensions)
- Attention
  - Receives URQ_CONCLUSION message with HSREQ
    NOTE: This message might contain no extensions, in which case the party shall simply send the empty URQ_CONCLUSION message, as before, and remain in this state.
  - Switches to Initiated and sends URQ_CONCLUSION message with HSRSP
- Initiated
  - Receives:
    - URQ_CONCLUSION message with HSREQ
      - responds with URQ_CONCLUSION with HSRSP and remains in this state
    - URQ_AGREEMENT message
      - responds with URQ_AGREEMENT and switches to Connected
    - Payload packet
      - responds with URQ_AGREEMENT and switches to Connected
- Connected
  - Is not expecting to receive any handshake messages anymore. The URQ_AGREEMENT message is always sent only once or per every final URQ_CONCLUSION message.
Note that any of these packets may be missing, and the sending party will never become aware. The missing packet problem is resolved this way:
- If the Responder misses the URQ_CONCLUSION + HSREQ message, it simply continues sending empty URQ_CONCLUSION messages. Only upon reception of URQ_CONCLUSION + HSREQ does it respond with URQ_CONCLUSION + HSRSP.
- If the Initiator misses the URQ_CONCLUSION + HSRSP response from the Responder, it continues sending URQ_CONCLUSION + HSREQ. The Responder must always respond with URQ_CONCLUSION + HSRSP when the Initiator sends URQ_CONCLUSION + HSREQ, even if it has already received and interpreted it.
- When the Initiator switches to the Connected state it responds with a URQ_AGREEMENT message, which may be missed by the Responder. Nonetheless, the Initiator may start sending data packets because it considers itself connected; it doesn't know that the Responder has not yet switched to the Connected state. Therefore, as an exception, a Responder that is in the Initiated state and receives a data packet (or any control packet that is normally sent only between connected parties) over this connection may switch to the Connected state just as if it had received a URQ_AGREEMENT message.
- If the Initiator has already switched to the Connected state, it will not bother the Responder with any more handshake messages. But the Responder may be completely unaware of that (having missed the URQ_AGREEMENT message from the Initiator). Therefore it doesn't exit the connecting state (it still blocks on srt_connect or doesn't signal connection readiness), which means that it continues sending URQ_CONCLUSION + HSRSP messages until it receives any packet that will make it switch to the Connected state (normally URQ_AGREEMENT). Only then does it exit the connecting state and the application can start transmission.
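The Responder's side, including the rule that a payload packet may stand in for a lost URQ_AGREEMENT, can be sketched the same way. Again this is a hypothetical C++ sketch: `Incoming`, `In`, `Out` and `responderStep` are illustrative names, not SRT API, and the timer-driven repetition of URQ_CONCLUSION + HSRSP while waiting is left out.

```cpp
#include <cassert>

// Hypothetical types for illustration; they are not SRT's internal classes.
enum class State { Waving, Attention, Initiated, Connected };
enum class Incoming { WaveAHand, Conclusion, Agreement, Payload };

struct In {
    Incoming kind;
    bool hsreq = false;  // a URQ_CONCLUSION carrying the HSREQ extension
};

struct Out {
    State next;
    bool sendConclusion;  // reply with URQ_CONCLUSION
    bool withHsrsp;       // ...carrying the HSRSP extension
    bool sendAgreement;   // reply with URQ_AGREEMENT
};

// Responder reaction to one incoming packet in the parallel flow.
Out responderStep(State s, const In& in) {
    switch (s) {
    case State::Waving:
        if (in.kind == Incoming::WaveAHand)
            return {State::Attention, true, false, false};  // empty conclusion
        break;
    case State::Attention:
        if (in.kind == Incoming::Conclusion)
            return in.hsreq ? Out{State::Initiated, true, true, false}
                            : Out{State::Attention, true, false, false};
        break;
    case State::Initiated:
        // HSREQ is always answered again, even if it was already processed.
        if (in.kind == Incoming::Conclusion && in.hsreq)
            return {State::Initiated, true, true, false};
        // URQ_AGREEMENT, or a payload packet standing in for a lost one.
        if (in.kind == Incoming::Agreement || in.kind == Incoming::Payload)
            return {State::Connected, false, false, true};
        break;
    case State::Connected:
        break;  // no more handshake messages are expected
    }
    return {s, false, false, false};
}
```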
When one of the parties in a handshake supports HSv5 and the other only HSv4, the handshake is conducted according to the rules described in the HSv4 Rendezvous Process section above.
Note, though, that in the first phase the URQ_WAVEAHAND request type sent by the HSv5 party contains the m_iVersion and m_iType fields filled in as required for version 5. This happens only for the "waving" phase, and fortunately HSv4 clients ignore these fields. When switching to the conclusion phase, the HSv5 client is already aware that the peer is HSv4 and fills the fields of the conclusion handshake message according to the rules of HSv4.
The HSv4 extended handshake process starts after the connection is considered established. Whatever problems may occur after this point will only affect data transmission.
Here is a representation of the HSv4 extended handshake packet structure (including the first four 32-bit segments of the SRT header):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|         Type=0x7fff         |    Ext {HSREQ(1),HSRSP(2)}    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Additional Info = undefined                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Time Stamp (µsec)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Destination Socket ID                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     SRT Version {<10300h}                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           SRT Flags                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        TsbPd Resv = 0         |     TsbPdDelay {20..8000}     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Reserved = 0                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
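As an illustration of the layout above, the following C++ sketch builds the eight 32-bit words of an HSv4 extended handshake packet in host byte order. The constants come from the diagram (Type 0x7fff, HSREQ = 1, HSRSP = 2); `extHeaderWord` and `buildHsv4Ext` are hypothetical names, and writing 0 into the undefined Additional Info word is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t kUmsgExt = 0x7fff;  // major control type (UMSG_EXT)
constexpr uint16_t kSrtCmdHsReq = 1;   // Ext {HSREQ(1)}
constexpr uint16_t kSrtCmdHsRsp = 2;   // Ext {HSRSP(2)}

// First header word: control bit set, 15-bit type, 16-bit extended type.
constexpr uint32_t extHeaderWord(uint16_t extType) {
    return (1u << 31) | (kUmsgExt << 16) | extType;
}

// Returns the packet as host-order words; converting each word to network
// byte order before sending is left to the caller.
std::array<uint32_t, 8> buildHsv4Ext(uint16_t extType, uint32_t timestampUs,
                                     uint32_t dstSocketId, uint32_t srtVersion,
                                     uint32_t srtFlags, uint16_t tsbpdDelayMs) {
    assert(srtVersion < 0x10300);  // HSv4 parties are below SRT 1.3.0
    return {
        extHeaderWord(extType),
        0,             // Additional Info: undefined, zero in this sketch
        timestampUs,
        dstSocketId,
        srtVersion,
        srtFlags,
        tsbpdDelayMs,  // TsbPd Resv (upper 16 bits) stays 0
        0,             // Reserved
    };
}
```

For example, an HSREQ header word comes out as 0xFFFF0001 and an HSRSP header word as 0xFFFF0002.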
The HSv4 extended handshake is performed with the use of the aforementioned "SRT Extended Messages", using control messages with major type UMSG_EXT.
Note that these command messages, although sent over an established connection, are still simply UDP packets. As such they are subject to all the problematic UDP protocol phenomena, such as packet loss (packet recovery applies exclusively to the payload packets). Therefore messages are sent "stubbornly" (with a slight delay between subsequent retries) until the peer responds, with some maximum number of retries before giving up. It's very important to understand that the first message from an Initiator is sent at the same moment when the application requests transmission of the first data packet. This data packet is not held back until the extended SRT handshake is finished. The first command message is sent, followed by the first data packet, and the rest of the transmission continues without having the extended SRT handshake yet agreed upon.
This means that the initial few data packets might be sent without having the appropriate SRT settings already working, which may raise two concerns:
- There is a delay in the application of latency to received packets. At first, packets are being delivered immediately. It is only when the SRT_CMD_HSREQ message is processed that latency is applied to the received packets. The time stamp based packet delivery mechanism (TSBPD) isn't working until then.
- There is a delay in the application of encryption (if used) to received packets. Packets can't be decrypted until the SRT_CMD_KMREQ is processed and the keys installed. The data packets are still encrypted, but the receiver can't decrypt them and will drop them.
The codes for commands used are the same in HSv4 and HSv5 processes. In HSv4 these are minor message type codes used with the UMSG_EXT command, whereas in HSv5 they are in the "command" part of the extension block. The messages that are sent as "REQ" parts will be repeatedly sent until they get a corresponding "RSP" part, up to some timeout, after which they give up and stay with a pure UDT connection.
Here is a representation of the HSv5 integrated handshake packet structure (without SRT header):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
|                        UDT Version {5}                        |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|       Encryption Flags        |        Extension Flags        |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|                Initial Packet Sequence Number                 |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|                      Maximum Packet Size                      |  H
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  A
|                   Maximum Flow Window Size                    |  N
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  D
|                        Connection Type                        |  S
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  H
|                           Socket ID                           |  A
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  K
|                          SYN Cookie                           |  E
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|                        Peer IP Address                        |  |
|                                                               |  |
|                                                               |  |
|                                                               |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
|   Ext Type=SRT_CMD_HSREQ(1)   |         Ext Size {3}          |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  H
|                    SRT Version {>=10300h}                     |  S
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  R
|                           SRT Flags                           |  E
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  Q
|   RcvTsbPdDelay {20..8000}    |   SndTsbPdDelay {20..8000}    |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
|   Ext Type=SRT_CMD_KMREQ(3)   |      Ext Size (bytes/4)       |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|0|V{1} | PT{2} |         Sign {2029h}          | Resv {0}  |KK |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|                           KEKI {0}                            |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|  Cipher {2}   |   Auth {0}    |    SE {2}     |   Resv1 {0}   |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|           Resv2 {0}           | Slen(bytes)/4 | Klen(bytes)/4 |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|                          Salt[Slen]                           |  |
|                                                               |  |
|                                                               |  K
|                                                               |  M
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  R
|                  Wrap[((KK+1)/2 * Klen) + 8]                  |  E
|                                                               |  Q
|                                                               |  |
|                                                               |  |
|                                                               |  |
|                                                               |  |
|                                                               |  |
|                                                               |  |
|                                                               |  |
|                                                               |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
The Extension Flags subfield in the Type field in a conclusion handshake message may contain the following flags:
- HS_EXT_HSREQ: defines SRT characteristic data; always present
- HS_EXT_KMREQ: if using encryption, defines encryption block
- HS_EXT_CONFIG: informs about having extra configuration data attached
The above schema shows the HSv5 packet structure, which can be split into three parts:
- The Handshake data part (up to "Peer IP Address" field)
- The HSREQ extension
- The KMREQ extension
Note that extensions are added only in certain situations (as described above), so sometimes there are no extensions at all. When extensions are added, the HSREQ extension is always present. The KMREQ extension is added only if encryption is requested (the passphrase is set by the SRTO_PASSPHRASE socket option). There might also be other extensions placed after HSREQ and KMREQ.
Every extension block has the following structure:
(1) a 16-bit command symbol
(2) a 16-bit block size (number of 32-bit words following this field)
(3) a number of 32-bit fields, as specified in (2) above
What is contained in a block depends on the extension command code.
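A receiver has to walk these blocks one by one. Below is a minimal C++ sketch, assuming the extension area has already been converted from network to host byte order and split into 32-bit words; `ExtBlock` and `parseExtensions` are hypothetical names. As in the HSv5 diagram, the command code is taken from the upper 16 bits and the size from the lower 16 bits of each block header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One parsed extension block: command code plus its payload words.
struct ExtBlock {
    uint16_t cmd;
    std::vector<uint32_t> words;
};

// Walks the extension area word by word. Returns false if a block claims
// more payload words than remain in the area.
bool parseExtensions(const std::vector<uint32_t>& area, std::vector<ExtBlock>& out) {
    std::size_t i = 0;
    while (i < area.size()) {
        const uint16_t cmd = static_cast<uint16_t>(area[i] >> 16);  // (1) command
        const std::size_t size = area[i] & 0xFFFF;                   // (2) size in words
        ++i;
        if (size > area.size() - i)
            return false;  // truncated block
        // (3) the payload words themselves
        out.push_back({cmd, std::vector<uint32_t>(area.data() + i, area.data() + i + size)});
        i += size;
    }
    return true;
}
```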
The data being received in the extension blocks in the conclusion message undergo further verification. If the values are not acceptable, the connection will be rejected. This may happen in the following situations:
- The Version field contains 0. This means that the peer rejected the handshake.
- The Version field was higher than 4, but no extensions were added (no extension flags set), while the rules state that they should be present. This is considered an error in the case of a URQ_CONCLUSION message sent by the Initiator to the Responder (there can be an initial conclusion message without extensions sent by the Responder to the Initiator in Rendezvous connections).
- Processing of any of the extension data has failed (also due to an internal error).
- Each side declares a transmission type that is not compatible with the other. This will be described further, along with other new HSv5 features; the HSv4 client supports only and exclusively one transmission type, which is Live. This is indicated in the Type field in the HSv4 handshake, which must be equal to UDT_DGRAM (2), and in HSv5 by the extra Congestion Controller (formerly "Smoother") block declaration (see below). In any case, when there's no such block declared, Live is assumed. Otherwise the declared type must be exactly the same on both sides.
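Some of these checks can be expressed as a simple predicate. The C++ sketch below is hypothetical (the `Conclusion` struct and `acceptConclusion` are illustrative names) and covers only the Version, missing-extension and transmission-type rules; the HSv4 UDT_DGRAM check and failures inside extension processing are not modelled. Following the text, an empty congestion string stands for "no block declared" and is treated as "live".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical summary of a received conclusion handshake.
struct Conclusion {
    uint32_t version;        // handshake Version field
    uint16_t extFlags;       // Extension Flags subfield (0 = no extensions)
    bool fromInitiator;      // sent by the Initiator to the Responder
    std::string congestion;  // declared congestion type; empty = none declared
};

// Returns true if the conclusion passes the checks sketched here.
bool acceptConclusion(const Conclusion& c, const std::string& localCongestion) {
    if (c.version == 0)
        return false;  // the peer rejected the handshake
    if (c.version > 4 && c.extFlags == 0 && c.fromInitiator)
        return false;  // an HSv5 Initiator must attach extensions
    // No declaration means "live"; otherwise both sides must match exactly.
    auto effective = [](const std::string& s) { return s.empty() ? std::string("live") : s; };
    return effective(c.congestion) == effective(localCongestion);
}
```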
NOTE: The TsbPd Resv and TsbPdDelay fields both refer to latency, but the use is different in HSv4 and HSv5.
In HSv4, only the lower 16 bits (TsbPdDelay) are used. The upper 16 bits (TsbPd Resv) are simply unused. There's only one direction, so HSREQ is sent by the Sender, HSRSP by the Receiver. HSREQ contains only the Sender latency, and HSRSP contains only the Receiver latency.
This is different from HSv5, in which the latency value for the sending direction is placed in the lower 16 bits (SndTsbPdDelay, bits 16 - 31 in network order) and the value for the receiving direction in the upper 16 bits (RcvTsbPdDelay, bits 0 - 15). The communication is bidirectional, so there are two latency values, one per direction. Therefore both HSREQ and HSRSP messages contain both the Sender and Receiver latency values.
The SRT_CMD_HSREQ message contains three 32-bit fields designated as:
- SRT_HS_VERSION: a 32-bit value (0x00XXYYZZ) representing SRT version XX.YY.ZZ
- SRT_HS_FLAGS: the SRT flags (see below)
- SRT_HS_LATENCY: the latency specification
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    SRT Version {>=10300h}                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           SRT Flags                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|(HSv4) TsbPd Resv = 0          |     TsbPdDelay {20..8000}     |
|(HSv5) RcvTsbPdDelay {20..8000}|   SndTsbPdDelay {20..8000}    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
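The SRT_HS_VERSION encoding is plain byte packing of XX.YY.ZZ into 0x00XXYYZZ. A short C++ sketch; `srtVersion`, `splitVersion` and `supportsHsv5` are hypothetical helper names, and the 1.3.0 threshold is the {>=10300h} bound from the diagram.

```cpp
#include <cassert>
#include <cstdint>

// Packs version XX.YY.ZZ as 0x00XXYYZZ.
constexpr uint32_t srtVersion(uint32_t xx, uint32_t yy, uint32_t zz) {
    return (xx << 16) | (yy << 8) | zz;
}

struct VersionParts { uint32_t xx, yy, zz; };

// Unpacks 0x00XXYYZZ back into its three components.
constexpr VersionParts splitVersion(uint32_t v) {
    return {(v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF};
}

// HSv5 parties declare at least SRT 1.3.0 (10300h).
constexpr bool supportsHsv5(uint32_t v) { return v >= srtVersion(1, 3, 0); }
```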
The flags (SRT Flags field) are the following bits, in order:
(0) SRT_OPT_TSBPDSND: The party will be sending in TSBPD (Time Stamp Based Packet Delivery) mode.
This is used by the Sender party to specify that it will use TSBPD mode. The Responder should respond with its setting for TSBPD reception; if it isn't using TSBPD for reception, it responds with its reception TSBPD flag not set. In HSv4, this is only used by the Initiator.
(1) SRT_OPT_TSBPDRCV: The party expects to receive in TSBPD mode.
This is used by a party to specify that it expects to receive in TSBPD mode. The Responder should respond to this setting with TSBPD sending mode (HSv5 only) and set the sending TSBPD flag appropriately. In HSv4 this is only used by the Responder party.
(2) SRT_OPT_HAICRYPT: The party includes haicrypt (legacy flag).
This special legacy compatibility flag should always be set. See below for more details.
(3) SRT_OPT_TLPKTDROP: The party will do TLPKTDROP.
Declares the SRTO_TLPKTDROP flag of the party. This is important because both parties must cooperate in this process. In HSv5, if both directions are TSBPD, both use this setting. While it is not always necessary to set this flag in live mode, it is the default and most recommended setting.
(4) SRT_OPT_NAKREPORT: The party will do periodic NAK reporting.
Declares the SRTO_NAKREPORT flag of the party. This flag means that periodic NAK reports will be sent (repeated UMSG_LOSSREPORT message when the sender seems to linger with retransmission).
(5) SRT_OPT_REXMITFLG: The party uses the REXMIT flag.
This special legacy compatibility flag should always be set. See below for more details.
(6) SRT_OPT_STREAM: The party uses stream type transmission.
This is introduced in HSv5 only. When set, the party is using a stream type transmission (file transmission with no boundaries). In HSv4 this flag does not exist, and therefore it's always clear, which corresponds to the fact that HSv4 supports Live mode only.
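The bit numbers above translate into masks as follows. This C++ sketch assumes that bit (0) is the least significant bit of the SRT Flags field (as in the SRT reference implementation); the `k...` constants are hypothetical names that only mirror the SRT_OPT_* flags, and `liveModeFlags` shows one plausible HSv5 live-mode combination rather than what any particular SRT version sends.

```cpp
#include <cassert>
#include <cstdint>

// Masks for the SRT Flags field; flag (n) above is assumed to be 1u << n.
constexpr uint32_t kTsbpdSnd  = 1u << 0;  // SRT_OPT_TSBPDSND
constexpr uint32_t kTsbpdRcv  = 1u << 1;  // SRT_OPT_TSBPDRCV
constexpr uint32_t kHaiCrypt  = 1u << 2;  // SRT_OPT_HAICRYPT (legacy, always set)
constexpr uint32_t kTlPktDrop = 1u << 3;  // SRT_OPT_TLPKTDROP
constexpr uint32_t kNakReport = 1u << 4;  // SRT_OPT_NAKREPORT
constexpr uint32_t kRexmitFlg = 1u << 5;  // SRT_OPT_REXMITFLG (legacy, always set)
constexpr uint32_t kStream    = 1u << 6;  // SRT_OPT_STREAM (HSv5 only)

// TSBPD in both directions, both legacy compatibility flags, and
// TLPKTDROP (the recommended default); NAK reporting is optional.
constexpr uint32_t liveModeFlags(bool nakReport) {
    return kTsbpdSnd | kTsbpdRcv | kHaiCrypt | kTlPktDrop | kRexmitFlg |
           (nakReport ? kNakReport : 0u);
}

constexpr bool hasFlag(uint32_t flags, uint32_t mask) { return (flags & mask) != 0; }
```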
Special Legacy Compatibility Flags
The SRT_OPT_HAICRYPT and SRT_OPT_REXMITFLG flags define special cases for the interpretation of the contents in the SRT header for payload packets.
The SRT header contains an unusual field designated as PH_MSGNO, which starts with some extra flags that occupy the most significant bits in this field (the rest are assigned to the Message Number). Some of these extra flags were already in UDT, but SRT added some more by stealing bits from the Message Number subfield:
- Encryption Key flags (2 bits). Controlled by SRT_OPT_HAICRYPT, this field contains a value that declares whether the payload is encrypted and with which key.
- Retransmission flag (1 bit). Controlled by SRT_OPT_REXMITFLG, this flag is 0 when a packet is sent the first time, and 1 when it is retransmitted (i.e. requested in a loss report). When the incoming packet is late (one with a sequence number older than the newest received so far), this flag allows the Receiver to distinguish between a retransmitted packet and a reordered packet. This is used by the "reorder tolerance" feature described in the API documentation under the SRTO_LOSSMAXTTL socket option.
As of version 1.2.0 both these fields are in use, and therefore both these flags must always be set. In theory, there might still exist some SRT versions older than 1.2.0 where these flags are not used, and these extra bits remain part of the "Message Number" subfield.
In practice there are no versions around that do not use encryption bits, although there might be some old SRT versions still in use that do not include the Retransmission field, which was introduced in version 1.2.0. In practice both these flags must be set in any version that defines them. They might be reused in the future for something else, once all versions below 1.2.0 are decommissioned, but the default is for them to be set.
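The following C++ sketch decodes PH_MSGNO with and without the retransmission flag. The exact bit positions (counting from the least significant bit: packet boundary in bits 31-30, in-order flag in bit 29, encryption key flags in bits 28-27, retransmission flag in bit 26, message number below) are an assumption taken from the SRT reference implementation; the text above only gives the field widths. `MsgNoField` and `decodeMsgNo` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct MsgNoField {
    uint32_t boundary;  // 2 bits, inherited from UDT
    bool inOrder;       // 1 bit, inherited from UDT
    uint32_t keyFlags;  // 2 bits, encryption key (SRT_OPT_HAICRYPT)
    bool rexmit;        // 1 bit, retransmission (SRT_OPT_REXMITFLG)
    uint32_t msgno;     // remaining bits: Message Number
};

// w is PH_MSGNO in host byte order.
MsgNoField decodeMsgNo(uint32_t w, bool peerUsesRexmitFlag) {
    MsgNoField f{};
    f.boundary = (w >> 30) & 0x3;
    f.inOrder = ((w >> 29) & 0x1) != 0;
    f.keyFlags = (w >> 27) & 0x3;
    if (peerUsesRexmitFlag) {
        f.rexmit = ((w >> 26) & 0x1) != 0;
        f.msgno = w & 0x03FFFFFF;  // 26-bit message number
    } else {
        f.rexmit = false;          // bit 26 still belongs to the message number
        f.msgno = w & 0x07FFFFFF;  // 27-bit message number
    }
    return f;
}
```

The same bit is read as a retransmission marker or as part of the message number depending only on whether the peer declared SRT_OPT_REXMITFLG, which is why the flag has to be exchanged in the handshake.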
The SRT_HS_LATENCY field defines Sender/Receiver latency. It is split into two 16-bit parts. The usage differs in HSv4 and HSv5.
In HSv4 only the lower part (bits 16 - 31) is used. The upper part (bits 0 - 15) is always 0. The interpretation of this field is as follows:
- Receiver party: Receiver latency
- Sender party: Sender latency
In HSv5 both 16-bit parts of the field are used, and interpreted as follows:
- Upper 16 bits (0 - 15): Receiver latency
- Lower 16 bits (16 - 31): Sender latency
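Packing and unpacking this field is a matter of 16-bit shifts. A minimal C++ sketch with hypothetical helper names, treating values as milliseconds in host byte order ("upper" and "lower" as defined above):

```cpp
#include <cassert>
#include <cstdint>

// HSv5: Receiver latency in the upper 16 bits, Sender latency in the lower 16.
constexpr uint32_t packLatencyV5(uint16_t rcvMs, uint16_t sndMs) {
    return (static_cast<uint32_t>(rcvMs) << 16) | sndMs;
}
constexpr uint16_t rcvLatency(uint32_t field) { return static_cast<uint16_t>(field >> 16); }
constexpr uint16_t sndLatency(uint32_t field) { return static_cast<uint16_t>(field & 0xFFFF); }

// HSv4: only the lower 16 bits carry a value; the upper part stays 0.
constexpr uint32_t packLatencyV4(uint16_t latencyMs) { return latencyMs; }
```

For instance, a party with 550 ms Receiver latency and 250 ms Sender latency sends 0x022600FA.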
The characteristics of Sender and Receiver latency are the following:
-
Sender latency is the minimum latency that the Sender wants the Receiver to use.
-
Receiver latency is the (minimum) value that the Receiver wishes to apply to the stream that it will be receiving.
Once these values are exchanged via the extended handshake, an effective latency is established, which is always the maximum of the two. Note that latency is defined in a specified direction. In HSv5, a connection is bidirectional, and a separate latency is defined for each direction.
The Initiator sends an HSREQ message, which declares the values on its side. The Responder calculates the maximum values between what it receives in the HSREQ and its own values, then sends an HSRSP with the effective latencies.
Here is an example of an HSv5 bidirectional transmission between Alice and Bob, where Alice is Initiator:
- Alice and Bob set the following latency values:
  - Alice: SRTO_PEERLATENCY = 250 ms, SRTO_RCVLATENCY = 550 ms
  - Bob: SRTO_PEERLATENCY = 500 ms, SRTO_RCVLATENCY = 300 ms
- Alice defines the latency field in the HSREQ message:
    hs[SRT_HS_LATENCY] = { 250, 550 }; // { Lower, Upper }
- Bob receives it, sets his options, and responds with HSRSP:
    SRTO_RCVLATENCY = max(300, 250); //<-- 250:Alice's PEERLATENCY
    SRTO_PEERLATENCY = max(500, 550); //<-- 550:Alice's RCVLATENCY
    hs[SRT_HS_LATENCY] = { 550, 300 };
- Alice receives this HSRSP and sets:
    SRTO_RCVLATENCY = 550;
    SRTO_PEERLATENCY = 300;
We now have the effective latency values:
- For transmissions from Alice to Bob: 300ms
- For transmissions from Bob to Alice: 550ms
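The exchange above can be replayed in code. In this C++ sketch the `Latency` and `LatencyField` structs and the three functions are hypothetical; `rcv` and `peer` stand for SRTO_RCVLATENCY and SRTO_PEERLATENCY, and `LatencyField` holds the { Lower, Upper } halves of SRT_HS_LATENCY as in the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Per-party latency options in milliseconds.
struct Latency {
    uint16_t rcv;   // SRTO_RCVLATENCY
    uint16_t peer;  // SRTO_PEERLATENCY
};

// SRT_HS_LATENCY halves: lower = Sender latency, upper = Receiver latency.
struct LatencyField {
    uint16_t lower;
    uint16_t upper;
};

// Initiator: advertise its own values in HSREQ.
LatencyField makeHsReq(const Latency& initiator) {
    return {initiator.peer, initiator.rcv};
}

// Responder: take the maximum per direction, update its options, answer with HSRSP.
LatencyField answerHsReq(Latency& responder, const LatencyField& req) {
    responder.rcv = std::max(responder.rcv, req.lower);    // vs. Initiator's sender latency
    responder.peer = std::max(responder.peer, req.upper);  // vs. Initiator's receiver latency
    return {responder.peer, responder.rcv};
}

// Initiator: adopt the effective values from HSRSP.
void applyHsRsp(Latency& initiator, const LatencyField& rsp) {
    initiator.rcv = rsp.lower;   // Responder -> Initiator direction
    initiator.peer = rsp.upper;  // Initiator -> Responder direction
}
```

Feeding in Alice's { rcv 550, peer 250 } and Bob's { rcv 300, peer 500 } reproduces the effective values listed above: 300 ms from Alice to Bob and 550 ms from Bob to Alice.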
Here is an example of an HSv4 exchange, which is simpler because there's only one direction. We'll refer to Alice and Bob again to be consistent with the Initiator/Responder roles in the HSv5 example:
- Alice sets SRTO_LATENCY to 250 ms
- Bob sets SRTO_LATENCY to 300 ms
- Alice sends hs[SRT_HS_LATENCY] = { 250, 0 }; to Bob
- Bob does SRTO_LATENCY = max(300, 250);
- Bob sends hs[SRT_HS_LATENCY] = { 300, 0 }; to Alice
- Alice sets SRTO_LATENCY to 300
Note that the SRTO_LATENCY option in HSv5 sets both SRTO_RCVLATENCY and SRTO_PEERLATENCY to the same value, although when reading, SRTO_LATENCY is an alias to SRTO_RCVLATENCY.
Why is the Sender latency updated to the effective latency for that direction? Because the TLPKTDROP mechanism, which is used by default in Live mode, may cause the Sender to decide to stop retransmitting packets that are known to be too late to retransmit. This latency value is one of the factors taken into account to calculate the time threshold for TLPKTDROP.
KMREQ and KMRSP contain the KMX (key material exchange) message used for encryption. The most important part of this message is the AES-wrapped key (see SRT Encryption for details). If the encryption process on the Responder side was successful, the response contains the same message for confirmation. Otherwise it's one single 32-bit value that contains the value of SRT_KMSTATE type, as an error status.
Note that when the encryption settings are different at each end, the connection is still allowed, but with the following restrictions:
- If the Initiator declares encryption, but the Responder does not, then the Responder responds with SRT_KM_S_NOSECRET status. This means that the Responder will not be able to decrypt data sent by the Initiator, but the Responder can still send unencrypted data to the Initiator.
- If the Initiator did not declare encryption, but the Responder did, then the Responder will attach SRT_CMD_KMRSP (despite the fact that the Initiator did not send SRT_CMD_KMREQ) with SRT_KM_S_UNSECURED status. The Responder won't be able to send data to the Initiator (more precisely, it will send scrambled data that cannot be decrypted), but the Initiator will still be able to send unencrypted data to the Responder.
- If both have declared encryption, but have set different passwords, the Responder will send a KMRSP block with an SRT_KM_S_BADSECRET value. The transmission in both directions will be "scrambled" (encrypted and not decryptable).
The value of the encryption status can be retrieved from the SRTO_SNDKMSTATE and SRTO_RCVKMSTATE options. The legacy (or unidirectional) option SRTO_KMSTATE resolves to SRTO_RCVKMSTATE by default, unless the SRTO_SENDER option is set to true, in which case it resolves to SRTO_SNDKMSTATE.
The values retrieved from these options depend on the result of the KMX process:
- If only one party declares encryption, the KM state will be one of the following:
  - For the party that declares no encryption:
      RCVKMSTATE: NOSECRET
      SNDKMSTATE: UNSECURED
    Result: This party can send payloads unencrypted, but it can't decrypt packets received from its peer.
  - For the party that declares encryption:
      RCVKMSTATE: UNSECURED
      SNDKMSTATE: NOSECRET
    Result: This party can receive unencrypted payloads from its peer, and will be able to send encrypted payloads to the peer, but the peer won't decrypt them.
- If both declare encryption, but they have different passwords, then both states are SRT_KM_S_BADSECRET. In such a situation both sides may send payloads, but the other party won't decrypt them.
- If both declare encryption and the password is the same on both sides, then both states are SRT_KM_S_SECURED. The transmission will be correctly performed with encryption in both directions.
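The outcomes above can be summarised as a mapping from the two parties' settings to the local (RCVKMSTATE, SNDKMSTATE) pair. This C++ sketch is a simplification: `KmState`, `KmStates` and `kmStates` are hypothetical names that only mirror the SRT_KM_S_* states, comparing passphrases directly stands in for the Responder's failure to unwrap the key, and the case where neither party declares encryption (both states UNSECURED) is an assumption not covered in the text.

```cpp
#include <cassert>
#include <string>

// Mirrors the SRT_KM_S_* state names; no numeric values are implied.
enum class KmState { Unsecured, Secured, NoSecret, BadSecret };

struct KmStates {
    KmState rcv;  // SRTO_RCVKMSTATE
    KmState snd;  // SRTO_SNDKMSTATE
};

// States seen by one party, given its own and its peer's passphrase
// (an empty string means "no encryption declared").
KmStates kmStates(const std::string& mine, const std::string& peer) {
    if (mine.empty() && peer.empty())
        return {KmState::Unsecured, KmState::Unsecured};  // assumption: no encryption at all
    if (mine.empty())  // only the peer encrypts
        return {KmState::NoSecret, KmState::Unsecured};
    if (peer.empty())  // only this party encrypts
        return {KmState::Unsecured, KmState::NoSecret};
    if (mine != peer)
        return {KmState::BadSecret, KmState::BadSecret};
    return {KmState::Secured, KmState::Secured};
}
```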
Note that due to the introduction of the bidirectional feature in HSv5 (and therefore the Initiator and Responder roles), the old HSv4 method of initializing the crypto objects used for security is used only in one of the directions. This is now called "forward KMX":
- The Initiator initializes its Sender Crypto (TXC) with preconfigured values. The SEK and SALT values are randomly generated.
- The Initiator sends a KMX message to the Receiver.
- The Receiver deploys the KMX message into its Receiver Crypto (RXC).
This is the general process of Security Association done for the "forward direction", that is, when done by the Sender. However, as there's only one KMX process in the handshake, in HSv5 this must also initialize the crypto in the opposite direction. This is accomplished by "reverse KMX":
- The Initiator initializes its Sender Crypto (TXC), like above, and then clones it to the Receiver Crypto.
- The Initiator sends a KMX message to the Responder.
- The Responder deploys the KMX message into its Receiver Crypto (RXC).
- The Responder initializes its Sender Crypto by cloning the Receiver Crypto, that is, by extracting the SEK and SALT from the Receiver Crypto and using them to initialize the Sender Crypto (clone the keys).
This way the Sender (being a Responder) has the Sender Crypto initialized in a manner very similar to that of the Initiator. The only difference is that the SEK and SALT parameters in the crypto:
- are randomly generated on the Initiator side
- are extracted (on the Responder side) from the Receiver Crypto, which was configured by the incoming KMX message
The extra operations defined as "reverse KMX" happen exclusively in the HSv5 handshake.
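The difference between forward and reverse KMX can be illustrated with a toy model in C++. Everything here is hypothetical: `Crypto` keeps only a SEK and a SALT, `Kmx` carries them in the clear (a real KMX message carries the SEK AES-wrapped), 16-byte key and salt sizes are an assumption, and `std::mt19937` stands in for the cryptographically secure random source a real implementation needs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

// Toy crypto context: only the SEK and SALT matter for this illustration.
struct Crypto {
    std::array<uint8_t, 16> sek{};
    std::array<uint8_t, 16> salt{};
    bool ready = false;
};

// Stand-in for a KMX message (NOT wrapped, unlike the real one).
struct Kmx {
    std::array<uint8_t, 16> sek;
    std::array<uint8_t, 16> salt;
};

struct Party {
    Crypto txc;  // Sender Crypto
    Crypto rxc;  // Receiver Crypto
};

// Not cryptographically secure; for illustration only.
std::array<uint8_t, 16> randomBytes(std::mt19937& rng) {
    std::array<uint8_t, 16> out{};
    for (auto& b : out) b = static_cast<uint8_t>(rng() & 0xFF);
    return out;
}

// Initiator: random SEK/SALT into TXC; in HSv5 also clone TXC into RXC.
Kmx initiatorStart(Party& ini, std::mt19937& rng, bool hsv5) {
    ini.txc = Crypto{randomBytes(rng), randomBytes(rng), true};
    if (hsv5) ini.rxc = ini.txc;  // reverse KMX, Initiator side
    return Kmx{ini.txc.sek, ini.txc.salt};
}

// Responder: deploy the KMX into RXC (forward KMX); in HSv5 clone RXC into TXC.
void responderDeploy(Party& rsp, const Kmx& msg, bool hsv5) {
    rsp.rxc = Crypto{msg.sek, msg.salt, true};
    if (hsv5) rsp.txc = rsp.rxc;  // reverse KMX, Responder side
}
```

With `hsv5` set to false only the forward direction gets keys, which matches the HSv4 behaviour described above.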
The encryption key (SEK) is normally configured to be refreshed after a predefined number of packets has been sent. To ensure the "soft handoff" to the new key, this process consists of three activities performed in order:
- Pre-announcing of the key (SEK is sent by Sender to Receiver)
- Switching the key (at some point packets are encrypted with the new key)
- Decommissioning the key (removing the old, unused key)
Pre-announcing is done using an SRT Extended Message with the SRT_CMD_KMREQ extended type, where only the "forward KMX" part is done. When the transmission is bidirectional, the key refreshing process happens completely independently for each direction, and it's always initiated by the sending side, independently of Initiator and Responder roles (actually, these roles are significant only up to the moment when the connection is considered established).
The decision as to when exactly to perform particular activities belonging to the key refreshing process is made when the number of sent packets (counted from the moment of the connection or the previous refresh) exceeds a certain value, which is controlled by the SRTO_KMREFRESHRATE and SRTO_KMPREANNOUNCE options:
- Pre-announce: when # of sent packets > SRTO_KMREFRESHRATE - SRTO_KMPREANNOUNCE
- Key switch: when # of sent packets > SRTO_KMREFRESHRATE
- Decommission: when # of sent packets > SRTO_KMREFRESHRATE + SRTO_KMPREANNOUNCE
In other words, SRTO_KMREFRESHRATE is the exact number of transmitted packets after which a key switch happens. The Pre-announce happens SRTO_KMPREANNOUNCE packets earlier, and Decommission happens SRTO_KMPREANNOUNCE packets later.
The SRTO_KMPREANNOUNCE value serves as an intermediate delay to make sure that from the moment of switching the keys the new key is deployed on the Receiver, and that the old key is not decommissioned until the last packet encrypted with that key is received.
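The three thresholds can be expressed as a function of the packet counter. This C++ sketch uses hypothetical names (`KeyPhase`, `keyPhase`) and assumes the counter is reset at connection time and after each completed refresh, as described above; the 1000/100 values in the example are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

enum class KeyPhase { OldKeyOnly, NewKeyAnnounced, NewKeyInUse, OldKeyDecommissioned };

// sent: packets sent since the connection or the previous refresh.
KeyPhase keyPhase(uint64_t sent, uint64_t refreshRate, uint64_t preAnnounce) {
    assert(preAnnounce <= refreshRate);  // keeps refreshRate - preAnnounce from wrapping
    if (sent > refreshRate + preAnnounce) return KeyPhase::OldKeyDecommissioned;
    if (sent > refreshRate) return KeyPhase::NewKeyInUse;                 // key switch
    if (sent > refreshRate - preAnnounce) return KeyPhase::NewKeyAnnounced;  // pre-announce
    return KeyPhase::OldKeyOnly;
}
```

With a refresh rate of 1000 packets and a pre-announce of 100, the new key is announced after packet 900, used from packet 1001, and the old key is dropped after packet 1100.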
The following activities occur when keys are refreshed:
- Pre-announce: The new key is generated and sent to the Receiver using the SRT Extended Message SRT_CMD_KMREQ. The received key is deployed into the Receiver Crypto. The Receiver sends back the same message through SRT_CMD_KMRSP as a confirmation that the refresh was successful (if it wasn't, the message contains an error code).
- Key Switch: The Encryption Flags in the PH_MSGNO field get toggled between EK_EVEN and EK_ODD. From this moment on, the opposite (newly generated) key is used.
- Decommission: The old key (the key that was used with the previous flag state) is decommissioned on both the Sender and Receiver sides. The place for the key remains open for future key refreshing.
NOTE: The handlers for KMREQ and KMRSP are the same for handling the request coming through an SRT Extended Message and through the handshake extension blocks, except that in the case of the SRT Extended Message only one direction (forward KMX) is updated. HSv4 relies only on these messages, so there's no difference between initial and refreshed KM exchange. In HSv5 the initial KM exchange is done within the handshake in both directions, and then the key refresh process is started by the Sender and it updates the key for one direction only.
This is a feature supported by HSv5 only. It adds functionality that existed in UDT as the "Congestion control class", but is implemented with SRT workflows and requirements in mind. In SRT, the congestion control mechanism must be set the same on both sides and is identified by a character string. The extension type is set to SRT_CMD_CONGESTION.
The extension block contains the length of the content in 4-byte words. The content is encoded as a string extended to full 4-byte chunks with padding NUL characters if needed, and then the byte order is inverted within each 4-byte chunk. For example, a "STREAM" string would be extended to STREAM@@ and then inverted into ERTS@@MA (where @ marks the NUL character).
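The encoding can be sketched in C++ as follows; `encodeExtString`, `extStringWords` and `decodeExtString` are hypothetical names. The sketch pads with NUL characters to a multiple of 4 bytes and then reverses the bytes inside each 4-byte chunk, exactly as in the "STREAM" example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Pad with NULs to whole 4-byte chunks, then reverse each chunk.
std::string encodeExtString(const std::string& s) {
    std::string out = s;
    out.resize((s.size() + 3) / 4 * 4, '\0');
    for (std::size_t i = 0; i < out.size(); i += 4)
        std::reverse(out.begin() + i, out.begin() + i + 4);
    return out;
}

// Value for the block size field: number of 4-byte words.
std::size_t extStringWords(const std::string& s) { return (s.size() + 3) / 4; }

// Inverse: undo the per-chunk reversal and strip the trailing NUL padding.
std::string decodeExtString(const std::string& wire) {
    std::string out = wire;
    for (std::size_t i = 0; i + 4 <= out.size(); i += 4)
        std::reverse(out.begin() + i, out.begin() + i + 4);
    while (!out.empty() && out.back() == '\0')
        out.pop_back();
    return out;
}
```

A name whose length is already a multiple of 4 needs no padding, so "live" simply becomes "evil" on the wire. The decoder in this sketch cannot round-trip a string that itself ends in NUL characters. Per the Stream ID description in this document, the SRT_CMD_SID extension uses the same encoding.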
The value is a string with the name of the SRT Congestion Controller type. The default one is called "live". The SRT 1.3.0 version contains an additional optional Congestion Controller type called "file". Within the "file" Congestion Controller it is possible to designate a stream mode and a message mode (the "live" one may only use the message mode, with one message per packet).
This extension is optional and when not present the "live" Congestion Controller is assumed. For an HSv4 party, which doesn't support this feature, it is always the case.
The "file" type reintroduces the old UDT features for stream transmission (together with the SRT_OPT_STREAM flag) and messages that can span multiple UDP packets. The Congestion Controller controls the way the transmission is handled, how various transmission settings are applied, and how to handle any special phenomena that happen during transmission.
The "file" Congestion Controller is based completely on the original CUDTCC class from UDT, and the rules for congestion control are completely copied from there. However, it contains many changes and allows the selection of the original UDT code in places that have been modified in SRT to support live transmission.
This feature is supported by HSv5 only. Its value is a string of the user's choice that can be passed from the Caller to the Listener. The symbol for this extension is SRT_CMD_SID.
The extension block for this extension is encoded the same way as described for the Congestion Controller above.
The Stream ID is a string of up to 512 characters that a Caller can pass to a Listener (it's actually passed from an Initiator to a Responder in general, but in Rendezvous mode this feature doesn't make sense). To use this feature, an application should set it on a Caller socket using the SRTO_STREAMID option.
Upon connection, the accepted socket on the Listener side will have exactly the same value set, and it can be retrieved using the same option. For more details about the prospective use of this option, please refer to the SRT API Socket Options and SRT Access Control (Stream ID) Guidelines.