From 727f8b1c5c7faccae792c6401a2dbe32540cec26 Mon Sep 17 00:00:00 2001 From: vyzo Date: Wed, 29 May 2019 16:31:26 +0300 Subject: [PATCH 01/28] initial DCUtR draft --- relay/DCUtR.md | 104 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100644 relay/DCUtR.md diff --git a/relay/DCUtR.md b/relay/DCUtR.md new file mode 100644 index 000000000..cb1bdff2b --- /dev/null +++ b/relay/DCUtR.md @@ -0,0 +1,104 @@ +# Direct Connection Upgrade through Relay + +| Lifecycle Stage | Maturity | Status | Latest Revision | +|-----------------|---------------|--------|--------------------| +| 1A | Working Draft | Active | DRAFT, 2019-05-29 | + +Authors: vyzo + +Interest Group: raulk, stebalien, whyrusleeping + +## Introduction + +NAT traversal is a quintessential problem in peer-to-peer networks. + +We currently utilize relays, which allow us to traverse NATs by using +a third party as proxy. Relays are a reliable fallback, that can +connect peers behind NAT albeit with a high-latency, low-bandwidth +connection. Unfortunately, they are expensive to scale and maintain +if they have to carry all the NATed node traffic in the network. + +It is often possible for two peers behind NAT to communicate directly +by utilizing a technique called _hole punching_[1]. The technique +relies on the two peers synchronizing and simultaneously opening +connections to each other to their predicted external address. It +works well for UDP, with an estimated 80% success rate, and reasonably +well for TCP, with an estimated 60% success rate. + +The problem in hole punching, apart from not working all the time, is +the need for rendezvous and synchronization. This is usually +accomplished using dedicated signaling servers [2]. However, this +introduces yet another piece of infrastructure, while still requiring +the use of relays as a fallback for the cases where a direct +connection is not possible. 
+ +In this draft, we describe a synchronization protocol for direct +connectivity with hole punching that eschews signaling servers and +utilizes existing relay connections instead. That is, peers start +with a relay connection and synchronize directly, without the use of a +signaling server. If the hole punching attempt is successful, the +peers _upgrade_ their connection to a direct connection and they can +close the relay connection. If the hole punching attempt fails, they +can keep using the relay connection as they were. + +## The Protocol + +Consider two peers, `A` and `B`. `A` wants to connect to `B`, which is +behind a NAT and advertises relay addresses. `A` may itself be behind +a NAT or be a public node. + +The protocol starts with the completion of a relay connection from `A` +to `B`. Upon observing the new connection, the inbound peer (here `B`) +checks the addresses advertised by `A` via identify. If that set +includes public addresses, then `A` _may_ be reachable by a direct +connection, in which case `B` attempts a unilateral connection upgrade +by initiating a direct connection to `A`. + +If the unilateral connection upgrade attempt fails or if `A` is itself a NATed peer that +doesn't advertise public address, then `B` initiates the direct connection +upgrade protocol as follows: +1. `B` opens a stream to `A` using the `/libp2p/connect` protocol +2. `B` sends to `A` a `Connect` message containing its observed (and possibly predicted) + addresses from identify and starts a timer to measure RTT of the relay connection. +3. Upon receving the `Connect`, `A` responds back with a `Connect` message containing + its observed (and possibly predicted) addresses. +4. Upon receiving the `Connect`, `B` sends a `Sync` message and starts a timer for + half the RTT measured from the time between sending the initial `Connect` and receiving + the response. +5. 
Simultaneous Connect + - Upon receiving the `Sync`, `A` immediately starts a direct dial to B using the addresses + obtained from the `Connect` message. + - Upon expiry of the timer, `B` starts a direct dial to `A` using the addresses obtained + from the `Connect` message. +6. If the connection is successful, then it is prioritized over the relay connection, which + can now be closed, possibly after a grace period. + +The purpose of the `Sync` message and `B`'s timer is to allow the two peers to synchronize +so that they perform a simultaneous open that allows hole punching to succeed. + +### Protobuf + +TBD + +## Implementation Considerations + +There are some difficulties regarding implementing the protocol, at least in `go-libp2p`: +- the swarm currently has no mechanism for direct dials in the presence of existing connections, + as required by the upgrade protocol. +- the swarm has no logic for prioritizing direct connections over relay connections +- the current multistream select protocol is an interactive protocol that requires a single + initiator, which breaks with simultaneous connect as it can result in both peers having outbound + connections to each other. + +All of these will have to be addressed in order to implement the protocol. The first two +are perhaps simple implementation details, but the multistream problem is hard to resolve. +Perhaps we will have to upgrade to `multistream-select/2.0`, which has explicit mechanisms +for handling simultaneous connect, before we can deploy the protocol. + + +## References + +1. Peer-to-Peer Communication Across Network Address Translators. B. Ford and P. Srisuresh. + https://pdos.csail.mit.edu/papers/p2pnat.pdf +2. Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols. IETF RFC 5245. 
+ https://tools.ietf.org/html/rfc5245 From 9db77f09a560088f79962440b8e3414e62536a11 Mon Sep 17 00:00:00 2001 From: vyzo Date: Wed, 29 May 2019 21:08:03 +0300 Subject: [PATCH 02/28] add paragraph about stream migration --- relay/DCUtR.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index cb1bdff2b..e56fbd3c2 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -70,12 +70,21 @@ upgrade protocol as follows: obtained from the `Connect` message. - Upon expiry of the timer, `B` starts a direct dial to `A` using the addresses obtained from the `Connect` message. -6. If the connection is successful, then it is prioritized over the relay connection, which - can now be closed, possibly after a grace period. The purpose of the `Sync` message and `B`'s timer is to allow the two peers to synchronize so that they perform a simultaneous open that allows hole punching to succeed. +If the direct connection is successful, then the peers should migrate +to it by prioritizing over the existing relay connection. All new +streams should be opened in the direct connection, while the relay +connection should be closed after a grace period. Existing indefinite +duration streams will have to be recreated in the new connection once +the relay connection is closed. This can be accomplised by observing +network notifications: the new direct connection will emit a new +`Connected` notification, while closing the relay connection will +sever existing streams and emit `Disconnected` notification. 
+ + ### Protobuf TBD From fee2b99f4a702bbd2db913e560efa499cbe85243 Mon Sep 17 00:00:00 2001 From: vyzo Date: Wed, 29 May 2019 21:11:50 +0300 Subject: [PATCH 03/28] add boilerplate --- relay/DCUtR.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index e56fbd3c2..8f0ba8911 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -8,6 +8,14 @@ Authors: vyzo Interest Group: raulk, stebalien, whyrusleeping +[vyzo]: https://github.com/vyzo +[raulk]: https://github.com/raulk +[steablien]: https://github.com/stebalien +[whyrusleeping]: https://github.com/whyrusleeping + +See the [lifecycle document](https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md) +for context about maturity level and spec status. + ## Introduction NAT traversal is a quintessential problem in peer-to-peer networks. From 97e5d61c4fe3304af5fdd8353e138b9df436a8ac Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Wed, 29 May 2019 19:15:23 +0100 Subject: [PATCH 04/28] fix formatting. 
--- relay/DCUtR.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 8f0ba8911..e62be7838 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -4,14 +4,14 @@ |-----------------|---------------|--------|--------------------| | 1A | Working Draft | Active | DRAFT, 2019-05-29 | -Authors: vyzo +Authors: [@vyzo] -Interest Group: raulk, stebalien, whyrusleeping +Interest Group: [@raulk], [@stebalien], [@whyrusleeping] -[vyzo]: https://github.com/vyzo -[raulk]: https://github.com/raulk -[steablien]: https://github.com/stebalien -[whyrusleeping]: https://github.com/whyrusleeping +[@vyzo]: https://github.com/vyzo +[@raulk]: https://github.com/raulk +[@stebalien]: https://github.com/stebalien +[@whyrusleeping]: https://github.com/whyrusleeping See the [lifecycle document](https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md) for context about maturity level and spec status. From 4ccccf5e777a6bea6fa289ff5946afdc0f112468 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 11 Aug 2021 19:03:54 +0200 Subject: [PATCH 05/28] relay/DCUtR: Copy Protocol Buffer schema from Golang impl --- relay/DCUtR.md | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index e62be7838..5b89aa396 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -95,7 +95,27 @@ sever existing streams and emit `Disconnected` notification. ### Protobuf -TBD +```proto +syntax = "proto2"; + +package holepunch.pb; + +message HolePunch { + enum Type { + CONNECT = 100; + SYNC = 300; + } + + optional Type type=1; + + // For hole punching, we'll send some additional observed addresses to the remote peer + // that could have been filtered by the Host address factory (for example: AutoRelay removes all public addresses if peer has private reachability). + // This is a hack! + // We plan to have a better address discovery and advertisement mechanism in the future. 
+ // See https://github.com/libp2p/go-libp2p-autonat/pull/98 + repeated bytes ObsAddrs = 2; +} +``` ## Implementation Considerations From 4b9549a7d15948931bb222d2900f26fd2a45bdcc Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 11 Aug 2021 19:17:51 +0200 Subject: [PATCH 06/28] relay/DCUtR: Add table of contents --- relay/DCUtR.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 5b89aa396..8158c2c86 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -16,6 +16,15 @@ Interest Group: [@raulk], [@stebalien], [@whyrusleeping] See the [lifecycle document](https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md) for context about maturity level and spec status. +## Table of Contents + +- [Direct Connection Upgrade through Relay](#direct-connection-upgrade-through-relay) + - [Introduction](#introduction) + - [The Protocol](#the-protocol) + - [Protobuf](#protobuf) + - [Implementation Considerations](#implementation-considerations) + - [References](#references) + ## Introduction NAT traversal is a quintessential problem in peer-to-peer networks. From 73064f9aba1db5ee1b521ad9a78bf41d92e3e3eb Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 11 Aug 2021 19:18:52 +0200 Subject: [PATCH 07/28] relay/DCUtR: Add note on protocol id --- relay/DCUtR.md | 1 + 1 file changed, 1 insertion(+) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 8158c2c86..37f937aeb 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -74,6 +74,7 @@ by initiating a direct connection to `A`. If the unilateral connection upgrade attempt fails or if `A` is itself a NATed peer that doesn't advertise public address, then `B` initiates the direct connection upgrade protocol as follows: + 1. `B` opens a stream to `A` using the `/libp2p/connect` protocol 2. `B` sends to `A` a `Connect` message containing its observed (and possibly predicted) addresses from identify and starts a timer to measure RTT of the relay connection. 
From dfc988c55fb49288e556c614ec334da75a2a3ac4 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 11 Aug 2021 19:22:47 +0200 Subject: [PATCH 08/28] relay/DCUtR: Remove go specific implementation considerations Considerations specific to a single implementation should be tracked on the implementation repository. Also, this removes the outdated multistream-select issue. Outdated since multistream-select simultaneous-open has been merged. --- relay/DCUtR.md | 16 ---------------- 1 file changed, 16 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 37f937aeb..a8ee2262f 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -127,22 +127,6 @@ message HolePunch { } ``` -## Implementation Considerations - -There are some difficulties regarding implementing the protocol, at least in `go-libp2p`: -- the swarm currently has no mechanism for direct dials in the presence of existing connections, - as required by the upgrade protocol. -- the swarm has no logic for prioritizing direct connections over relay connections -- the current multistream select protocol is an interactive protocol that requires a single - initiator, which breaks with simultaneous connect as it can result in both peers having outbound - connections to each other. - -All of these will have to be addressed in order to implement the protocol. The first two -are perhaps simple implementation details, but the multistream problem is hard to resolve. -Perhaps we will have to upgrade to `multistream-select/2.0`, which has explicit mechanisms -for handling simultaneous connect, before we can deploy the protocol. - - ## References 1. Peer-to-Peer Communication Across Network Address Translators. B. Ford and P. Srisuresh. 
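Note on the unilateral upgrade check described in "The Protocol" (PATCH 01/28): `B` inspects the addresses `A` advertises via identify and only attempts a plain direct dial if some of them are public. A minimal sketch of that decision follows, assuming the addresses have already been reduced to bare IP strings. The helper names `isPublicIP` and `hasPublicAddr` are hypothetical and are not taken from the spec or from go-libp2p; the classification is a rough heuristic built on Go's standard `net` package (it does not, for example, treat carrier-grade NAT ranges as private).

```go
package main

import (
	"fmt"
	"net"
)

// isPublicIP reports whether ip looks globally routable.
// Heuristic only: it excludes private, loopback, link-local and
// unspecified addresses, nothing more.
func isPublicIP(ip net.IP) bool {
	if ip == nil {
		return false
	}
	return !(ip.IsPrivate() ||
		ip.IsLoopback() ||
		ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() ||
		ip.IsUnspecified())
}

// hasPublicAddr reports whether at least one advertised address is public.
// In the protocol, a true result means B may try a unilateral upgrade
// before falling back to the coordinated hole punch.
func hasPublicAddr(addrs []string) bool {
	for _, a := range addrs {
		if isPublicIP(net.ParseIP(a)) {
			return true
		}
	}
	return false
}

func main() {
	natOnly := []string{"192.168.1.10", "10.0.0.1"}
	mixed := []string{"192.168.1.10", "8.8.8.8"}
	fmt.Println(hasPublicAddr(natOnly), hasPublicAddr(mixed))
}
```

A real implementation would work on multiaddrs rather than IP strings; the sketch only illustrates the branch between the unilateral attempt and the coordinated upgrade.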
From 46bd410b67e0040facfa6d16a01d000050e6747e Mon Sep 17 00:00:00 2001 From: Max Inden Date: Wed, 11 Aug 2021 19:34:28 +0200 Subject: [PATCH 09/28] relay/DCUtR: Add TODO for retry logic --- relay/DCUtR.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index a8ee2262f..97fec33f8 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -89,6 +89,8 @@ upgrade protocol as follows: - Upon expiry of the timer, `B` starts a direct dial to `A` using the addresses obtained from the `Connect` message. + + The purpose of the `Sync` message and `B`'s timer is to allow the two peers to synchronize so that they perform a simultaneous open that allows hole punching to succeed. From 4e94481ff2560010fe60c2c3d24146e4605f8ee8 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 13 Aug 2021 21:00:32 +0200 Subject: [PATCH 10/28] relay/DCUtR: Document message length prefixing --- relay/DCUtR.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 97fec33f8..8a9b758d4 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -105,7 +105,13 @@ network notifications: the new direct connection will emit a new sever existing streams and emit `Disconnected` notification. -### Protobuf +### RPC messages + +All RPC messages sent over a stream are prefixed with the message length in +bytes, encoded as an unsigned variable length integer as defined by the +[multiformats unsigned-varint spec][uvarint-spec]. + +RPC messages conform to the following protobuf: ```proto syntax = "proto2"; @@ -135,3 +141,5 @@ message HolePunch { https://pdos.csail.mit.edu/papers/p2pnat.pdf 2. Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols. IETF RFC 5245. 
https://tools.ietf.org/html/rfc5245 + +[uvarint-spec]: https://github.com/multiformats/unsigned-varint From 9d42524bad1c37f66d1bca50b103c49d72ad47da Mon Sep 17 00:00:00 2001 From: Max Inden Date: Fri, 13 Aug 2021 21:07:02 +0200 Subject: [PATCH 11/28] relay/DCUtR: Document maximum message size --- relay/DCUtR.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 8a9b758d4..e160982b0 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -111,7 +111,10 @@ All RPC messages sent over a stream are prefixed with the message length in bytes, encoded as an unsigned variable length integer as defined by the [multiformats unsigned-varint spec][uvarint-spec]. -RPC messages conform to the following protobuf: +Implemntations SHOULD refuse encoded RPC messages (length prefix excluded) +larger than 4 KiB. + +RPC messages conform to the following protobuf schema: ```proto syntax = "proto2"; From 9958df289480723d29ff01b23f7edb3008de7bdb Mon Sep 17 00:00:00 2001 From: Max Inden Date: Tue, 17 Aug 2021 14:26:55 +0200 Subject: [PATCH 12/28] relay/DCUtR: Wrap at 80 chars --- relay/DCUtR.md | 41 ++++++++++++++++++++++------------------- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index e160982b0..9188870be 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -71,28 +71,30 @@ includes public addresses, then `A` _may_ be reachable by a direct connection, in which case `B` attempts a unilateral connection upgrade by initiating a direct connection to `A`. -If the unilateral connection upgrade attempt fails or if `A` is itself a NATed peer that -doesn't advertise public address, then `B` initiates the direct connection -upgrade protocol as follows: +If the unilateral connection upgrade attempt fails or if `A` is itself a NATed +peer that doesn't advertise public address, then `B` initiates the direct +connection upgrade protocol as follows: 1. 
`B` opens a stream to `A` using the `/libp2p/connect` protocol -2. `B` sends to `A` a `Connect` message containing its observed (and possibly predicted) - addresses from identify and starts a timer to measure RTT of the relay connection. -3. Upon receving the `Connect`, `A` responds back with a `Connect` message containing - its observed (and possibly predicted) addresses. -4. Upon receiving the `Connect`, `B` sends a `Sync` message and starts a timer for - half the RTT measured from the time between sending the initial `Connect` and receiving - the response. +2. `B` sends to `A` a `Connect` message containing its observed (and possibly + predicted) addresses from identify and starts a timer to measure RTT of the + relay connection. +3. Upon receving the `Connect`, `A` responds back with a `Connect` message + containing its observed (and possibly predicted) addresses. +4. Upon receiving the `Connect`, `B` sends a `Sync` message and starts a timer + for half the RTT measured from the time between sending the initial `Connect` + and receiving the response. 5. Simultaneous Connect - - Upon receiving the `Sync`, `A` immediately starts a direct dial to B using the addresses - obtained from the `Connect` message. - - Upon expiry of the timer, `B` starts a direct dial to `A` using the addresses obtained - from the `Connect` message. + - Upon receiving the `Sync`, `A` immediately starts a direct dial to B using + the addresses obtained from the `Connect` message. + - Upon expiry of the timer, `B` starts a direct dial to `A` using the + addresses obtained from the `Connect` message. -The purpose of the `Sync` message and `B`'s timer is to allow the two peers to synchronize -so that they perform a simultaneous open that allows hole punching to succeed. +The purpose of the `Sync` message and `B`'s timer is to allow the two peers to +synchronize so that they perform a simultaneous open that allows hole punching +to succeed. 
If the direct connection is successful, then the peers should migrate to it by prioritizing over the existing relay connection. All new @@ -140,9 +142,10 @@ message HolePunch { ## References -1. Peer-to-Peer Communication Across Network Address Translators. B. Ford and P. Srisuresh. - https://pdos.csail.mit.edu/papers/p2pnat.pdf -2. Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols. IETF RFC 5245. +1. Peer-to-Peer Communication Across Network Address Translators. B. Ford and P. + Srisuresh. https://pdos.csail.mit.edu/papers/p2pnat.pdf +2. Interactive Connectivity Establishment (ICE): A Protocol for Network Address + Translator (NAT) Traversal for Offer/Answer Protocols. IETF RFC 5245. https://tools.ietf.org/html/rfc5245 [uvarint-spec]: https://github.com/multiformats/unsigned-varint From 4b7c1ce1e07b6d7803f0cc8b487b33887a23e894 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Tue, 17 Aug 2021 14:28:34 +0200 Subject: [PATCH 13/28] relay/DCUtR: Add mxinden to interest group --- relay/DCUtR.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 9188870be..f89d0de42 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -6,12 +6,13 @@ Authors: [@vyzo] -Interest Group: [@raulk], [@stebalien], [@whyrusleeping] +Interest Group: [@raulk], [@stebalien], [@whyrusleeping], [@mxinden] [@vyzo]: https://github.com/vyzo [@raulk]: https://github.com/raulk [@stebalien]: https://github.com/stebalien [@whyrusleeping]: https://github.com/whyrusleeping +[@mxinden]: https://github.com/mxinden See the [lifecycle document](https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md) for context about maturity level and spec status. 
From fe64a21ddd9cae8b64f9b6160eb5787da5947bec Mon Sep 17 00:00:00 2001 From: Max Inden Date: Tue, 17 Aug 2021 18:27:11 +0200 Subject: [PATCH 14/28] relay/DCUtR: Document retry logic --- relay/DCUtR.md | 42 ++++++++++++++++++++++++++++-------------- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index f89d0de42..076f8987f 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -20,10 +20,11 @@ for context about maturity level and spec status. ## Table of Contents - [Direct Connection Upgrade through Relay](#direct-connection-upgrade-through-relay) + - [Table of Contents](#table-of-contents) - [Introduction](#introduction) - [The Protocol](#the-protocol) - - [Protobuf](#protobuf) - - [Implementation Considerations](#implementation-considerations) + - [RPC messages](#rpc-messages) + - [FAQ](#faq) - [References](#references) ## Introduction @@ -50,14 +51,13 @@ introduces yet another piece of infrastructure, while still requiring the use of relays as a fallback for the cases where a direct connection is not possible. -In this draft, we describe a synchronization protocol for direct -connectivity with hole punching that eschews signaling servers and -utilizes existing relay connections instead. That is, peers start -with a relay connection and synchronize directly, without the use of a -signaling server. If the hole punching attempt is successful, the -peers _upgrade_ their connection to a direct connection and they can -close the relay connection. If the hole punching attempt fails, they -can keep using the relay connection as they were. +In this specification, we describe a synchronization protocol for direct +connectivity with hole punching that eschews signaling servers and utilizes +existing relay connections instead. That is, peers start with a relay connection +and synchronize directly, without the use of a signaling server. 
If the hole +punching attempt is successful, the peers _upgrade_ their connection to a direct +connection and they can close the relay connection. If the hole punching attempt +fails, they can keep using the relay connection as they were. ## The Protocol @@ -75,8 +75,7 @@ by initiating a direct connection to `A`. If the unilateral connection upgrade attempt fails or if `A` is itself a NATed peer that doesn't advertise public address, then `B` initiates the direct connection upgrade protocol as follows: - -1. `B` opens a stream to `A` using the `/libp2p/connect` protocol +1. `B` opens a stream to `A` using the `/libp2p/dcutr` protocol. 2. `B` sends to `A` a `Connect` message containing its observed (and possibly predicted) addresses from identify and starts a timer to measure RTT of the relay connection. @@ -90,8 +89,9 @@ connection upgrade protocol as follows: the addresses obtained from the `Connect` message. - Upon expiry of the timer, `B` starts a direct dial to `A` using the addresses obtained from the `Connect` message. - - +6. On failure go back to step (2), reusing the same stream opened in (1). + Inbound peers (here `B`) SHOULD retry twice (thus a total of 3 attempts) + before considering the upgrade as failed. The purpose of the `Sync` message and `B`'s timer is to allow the two peers to synchronize so that they perform a simultaneous open that allows hole punching @@ -141,6 +141,20 @@ message HolePunch { } ``` +## FAQ + +- *Why exchange `CONNECT` and `SYNC` messages once more on each retry?* + + Doing an additional CONNECT and SYNC for each retry prevents a flawed RTT + measurement on the first attempt to distort all following retry attempts. + +- *Why reuse the same stream for retries?* + + Stream opening and stream protocol negotiation might distort the measured + round-trip-time. Reusing the stream from the first attempt allows cutting out + these distortions, allowing a more precise round-trip-time measurement on the + second and third attempt. 
+ ## References 1. Peer-to-Peer Communication Across Network Address Translators. B. Ford and P. From db9475efc34d2db67ef2c289063ec3e98410201e Mon Sep 17 00:00:00 2001 From: Max Inden Date: Tue, 17 Aug 2021 18:29:14 +0200 Subject: [PATCH 15/28] relay/DCUtR: Remove implementation specific event emission --- relay/DCUtR.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 076f8987f..d5a37e819 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -102,10 +102,7 @@ to it by prioritizing over the existing relay connection. All new streams should be opened in the direct connection, while the relay connection should be closed after a grace period. Existing indefinite duration streams will have to be recreated in the new connection once -the relay connection is closed. This can be accomplised by observing -network notifications: the new direct connection will emit a new -`Connected` notification, while closing the relay connection will -sever existing streams and emit `Disconnected` notification. +the relay connection is closed. ### RPC messages From 2d8b38f7b2c6ce02091ed9c0b214472c63283d31 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Tue, 17 Aug 2021 18:32:03 +0200 Subject: [PATCH 16/28] relay/DCUtR: Remove note on obs address sending --- relay/DCUtR.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index d5a37e819..b2ac2a58c 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -129,11 +129,6 @@ message HolePunch { optional Type type=1; - // For hole punching, we'll send some additional observed addresses to the remote peer - // that could have been filtered by the Host address factory (for example: AutoRelay removes all public addresses if peer has private reachability). - // This is a hack! - // We plan to have a better address discovery and advertisement mechanism in the future. 
- // See https://github.com/libp2p/go-libp2p-autonat/pull/98 repeated bytes ObsAddrs = 2; } ``` From 6530d45e8426009bfdc7c884adc3735081fbcb92 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Tue, 17 Aug 2021 18:38:54 +0200 Subject: [PATCH 17/28] relay/DCUtR: Update date --- relay/DCUtR.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index b2ac2a58c..267804625 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -2,7 +2,7 @@ | Lifecycle Stage | Maturity | Status | Latest Revision | |-----------------|---------------|--------|--------------------| -| 1A | Working Draft | Active | DRAFT, 2019-05-29 | +| 1A | Working Draft | Active | r0, 2021-08-17 | Authors: [@vyzo] From 0076c69f5d0bda956f244bebbd2261f3dc453076 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Sun, 22 Aug 2021 21:55:59 +0200 Subject: [PATCH 18/28] relay/DCUtR: Add Marten to interest group Co-authored-by: Marten Seemann --- relay/DCUtR.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 267804625..bd0830672 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -6,13 +6,14 @@ Authors: [@vyzo] -Interest Group: [@raulk], [@stebalien], [@whyrusleeping], [@mxinden] +Interest Group: [@raulk], [@stebalien], [@whyrusleeping], [@mxinden], [@marten-seemann] [@vyzo]: https://github.com/vyzo [@raulk]: https://github.com/raulk [@stebalien]: https://github.com/stebalien [@whyrusleeping]: https://github.com/whyrusleeping [@mxinden]: https://github.com/mxinden +[@marten-seemann]: https://github.com/marten-seemann See the [lifecycle document](https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md) for context about maturity level and spec status. 
From b420064884b99b6fe170f5965380b09d38dc2a28 Mon Sep 17 00:00:00 2001 From: Marten Seemann Date: Mon, 23 Aug 2021 11:34:33 +0200 Subject: [PATCH 19/28] relay/DCUtR: Assign roles and describe hole punching on QUIC (#361) --- relay/DCUtR.md | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index bd0830672..49ddae94b 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -85,11 +85,23 @@ connection upgrade protocol as follows: 4. Upon receiving the `Connect`, `B` sends a `Sync` message and starts a timer for half the RTT measured from the time between sending the initial `Connect` and receiving the response. -5. Simultaneous Connect - - Upon receiving the `Sync`, `A` immediately starts a direct dial to B using - the addresses obtained from the `Connect` message. - - Upon expiry of the timer, `B` starts a direct dial to `A` using the - addresses obtained from the `Connect` message. +5. Simultaneous Connect. This depends on the transport in use: + - For TCP: + - Upon receiving the `Sync`, `A` immediately starts a direct dial to B using + the addresses obtained from the `Connect` message. + - Upon expiry of the timer, `B` starts a direct dial to `A` using the + addresses obtained from the `Connect` message. + - This will result in a TCP Simultaneous Connect. For the purpose of all + protocols run on top of this TCP connection, `A` is assumed to be the + client and `B` the server. + - For QUIC: + - Upon receiving the `Sync`, `A` immediately starts a direct dial to B using + the addresses obtained from the `Connect` message. + - Upon expiry of the timer, `B` starts to send UDP packets filled with + random bytes to the addresses obtained from the `Connect` message. + Packets should be sent in random intervals between 10 and 200 ms. + - This will result in a QUIC connection where `A` is the client and `B` + is the server. 6. On failure go back to step (2), reusing the same stream opened in (1). 
Inbound peers (here `B`) SHOULD retry twice (thus a total of 3 attempts) before considering the upgrade as failed. From af0b9bbc13629b8a28cc0843e336af91fe7cd795 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 11:46:58 +0200 Subject: [PATCH 20/28] relay/DCUtR: Stress that one should connect to all addresses --- relay/DCUtR.md | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 49ddae94b..df96114bb 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -87,21 +87,25 @@ connection upgrade protocol as follows: and receiving the response. 5. Simultaneous Connect. This depends on the transport in use: - For TCP: - - Upon receiving the `Sync`, `A` immediately starts a direct dial to B using - the addresses obtained from the `Connect` message. - - Upon expiry of the timer, `B` starts a direct dial to `A` using the - addresses obtained from the `Connect` message. + - Upon receiving the `Sync`, `A` immediately attempts to directly connect + to `B`. `A` does so by dialling the addresses obtained from the + `Connect` message in parallel. + - Upon expiry of the timer, `B` attempts to directly connect to `A`. `B` + does so by dialing the addresses obtained from the `Connect` message in + parallel. - This will result in a TCP Simultaneous Connect. For the purpose of all - protocols run on top of this TCP connection, `A` is assumed to be the - client and `B` the server. + protocols run on top of this TCP connection, `A` is assumed to be the + client and `B` the server. - For QUIC: - - Upon receiving the `Sync`, `A` immediately starts a direct dial to B using - the addresses obtained from the `Connect` message. + - Upon receiving the `Sync`, `A` immediately attempts to directly connect + to `B`. `A` does so by dialing the addresses obtained from the `Connect` + message in parallel. 
- Upon expiry of the timer, `B` starts to send UDP packets filled with - random bytes to the addresses obtained from the `Connect` message. - Packets should be sent in random intervals between 10 and 200 ms. - - This will result in a QUIC connection where `A` is the client and `B` - is the server. + random bytes to the addresses obtained from the `Connect` message. + Packets should be sent in random intervals between 10 and 200 ms to each + address. + - This will result in a QUIC connection where `A` is the client and `B` is + the server. 6. On failure go back to step (2), reusing the same stream opened in (1). Inbound peers (here `B`) SHOULD retry twice (thus a total of 3 attempts) before considering the upgrade as failed. From 6f475dee2df3456cfe2f156e6b5a9bdde1b81fb3 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 11:48:15 +0200 Subject: [PATCH 21/28] relay/DCUtR: Fix typo --- relay/DCUtR.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index df96114bb..2573aca3d 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -128,7 +128,7 @@ All RPC messages sent over a stream are prefixed with the message length in bytes, encoded as an unsigned variable length integer as defined by the [multiformats unsigned-varint spec][uvarint-spec]. -Implemntations SHOULD refuse encoded RPC messages (length prefix excluded) +Implementations SHOULD refuse encoded RPC messages (length prefix excluded) larger than 4 KiB. 
RPC messages conform to the following protobuf schema: From 17f627532d9b4ddd54537d41ce38943e25d19bef Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 11:54:46 +0200 Subject: [PATCH 22/28] relay/DCUtR: Mention addressing specification for ObsAddrs field --- relay/DCUtR.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 2573aca3d..4fc61c59b 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -150,6 +150,9 @@ message HolePunch { } ``` +`ObsAddrs` is a list of multiaddrs encoded in the binary multiaddr +representation. See [Addressing specification] for details. + ## FAQ - *Why exchange `CONNECT` and `SYNC` messages once more on each retry?* @@ -173,3 +176,4 @@ message HolePunch { https://tools.ietf.org/html/rfc5245 [uvarint-spec]: https://github.com/multiformats/unsigned-varint +[Addressing specification]: ../addressing/README.md From 5943d3b34a3df937d56c8fa8308a2b5e01922795 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 12:02:30 +0200 Subject: [PATCH 23/28] relay/DCUtR: Do not reuse same stream on retry --- relay/DCUtR.md | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 4fc61c59b..e5f4abe6f 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -106,9 +106,8 @@ connection upgrade protocol as follows: address. - This will result in a QUIC connection where `A` is the client and `B` is the server. -6. On failure go back to step (2), reusing the same stream opened in (1). - Inbound peers (here `B`) SHOULD retry twice (thus a total of 3 attempts) - before considering the upgrade as failed. +6. On failure go back to step (1). Inbound peers (here `B`) SHOULD retry twice + (thus a total of 3 attempts) before considering the upgrade as failed. The purpose of the `Sync` message and `B`'s timer is to allow the two peers to synchronize so that they perform a simultaneous open that allows hole punching @@ -160,13 +159,6 @@ representation. 
See [Addressing specification] for details. Doing an additional CONNECT and SYNC for each retry prevents a flawed RTT measurement on the first attempt to distort all following retry attempts. -- *Why reuse the same stream for retries?* - - Stream opening and stream protocol negotiation might distort the measured - round-trip-time. Reusing the stream from the first attempt allows cutting out - these distortions, allowing a more precise round-trip-time measurement on the - second and third attempt. - ## References 1. Peer-to-Peer Communication Across Network Address Translators. B. Ford and P. From 6f558f13aa4cbfe8458a3207764bd52e3f61a573 Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 12:36:41 +0200 Subject: [PATCH 24/28] relay/DCUtR: Reword steps for each address --- relay/DCUtR.md | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index e5f4abe6f..a11554252 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -85,25 +85,19 @@ connection upgrade protocol as follows: 4. Upon receiving the `Connect`, `B` sends a `Sync` message and starts a timer for half the RTT measured from the time between sending the initial `Connect` and receiving the response. -5. Simultaneous Connect. This depends on the transport in use: - - For TCP: - - Upon receiving the `Sync`, `A` immediately attempts to directly connect - to `B`. `A` does so by dialling the addresses obtained from the - `Connect` message in parallel. - - Upon expiry of the timer, `B` attempts to directly connect to `A`. `B` - does so by dialing the addresses obtained from the `Connect` message in - parallel. +5. Simultaneous Connect. The two nodes follow the steps below for every address + obtained from the `Connect` message in parallel: + - For a TCP address: + - Upon receiving the `Sync`, `A` immediately dials the address to `B`. + - Upon expiry of the timer, `B` dials the address to `A`. 
- This will result in a TCP Simultaneous Connect. For the purpose of all protocols run on top of this TCP connection, `A` is assumed to be the client and `B` the server. - - For QUIC: - - Upon receiving the `Sync`, `A` immediately attempts to directly connect - to `B`. `A` does so by dialing the addresses obtained from the `Connect` - message in parallel. + - For a QUIC address: + - Upon receiving the `Sync`, `A` immediately dials the address to `B`. - Upon expiry of the timer, `B` starts to send UDP packets filled with - random bytes to the addresses obtained from the `Connect` message. - Packets should be sent in random intervals between 10 and 200 ms to each - address. + random bytes to `A`'s address. Packets should be sent repeatedly in + random intervals between 10 and 200 ms. - This will result in a QUIC connection where `A` is the client and `B` is the server. 6. On failure go back to step (1). Inbound peers (here `B`) SHOULD retry twice From f7b43df4fb516bc6d1c7d56077f6076da13a177d Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 13:12:27 +0200 Subject: [PATCH 25/28] relay/DCUtR: Detail on success case --- relay/DCUtR.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index a11554252..f2135c2be 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -85,8 +85,8 @@ connection upgrade protocol as follows: 4. Upon receiving the `Connect`, `B` sends a `Sync` message and starts a timer for half the RTT measured from the time between sending the initial `Connect` and receiving the response. -5. Simultaneous Connect. The two nodes follow the steps below for every address - obtained from the `Connect` message in parallel: +5. Simultaneous Connect. The two nodes follow the steps below in parallel for + every address obtained from the `Connect` message: - For a TCP address: - Upon receiving the `Sync`, `A` immediately dials the address to `B`. 
- Upon expiry of the timer, `B` dials the address to `A`. @@ -100,8 +100,10 @@ connection upgrade protocol as follows: random intervals between 10 and 200 ms. - This will result in a QUIC connection where `A` is the client and `B` is the server. -6. On failure go back to step (1). Inbound peers (here `B`) SHOULD retry twice - (thus a total of 3 attempts) before considering the upgrade as failed. +6. On successful establishment of a single connection does `A` cancel all + outstanding connection attempts. On failure of all connection attempts go + back to step (1). Inbound peers (here `B`) SHOULD retry twice (thus a total + of 3 attempts) before considering the upgrade as failed. The purpose of the `Sync` message and `B`'s timer is to allow the two peers to synchronize so that they perform a simultaneous open that allows hole punching From 85f567d3ed7ed31cf0858d09e75c3704a0dce08f Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 13:16:27 +0200 Subject: [PATCH 26/28] relay/DCUtR: Inline `Sync` reasoning --- relay/DCUtR.md | 29 +++++++++++++---------------- 1 file changed, 13 insertions(+), 16 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index f2135c2be..34931cd01 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -84,7 +84,9 @@ connection upgrade protocol as follows: containing its observed (and possibly predicted) addresses. 4. Upon receiving the `Connect`, `B` sends a `Sync` message and starts a timer for half the RTT measured from the time between sending the initial `Connect` - and receiving the response. + and receiving the response. The purpose of the `Sync` message and `B`'s timer + is to allow the two peers to synchronize so that they perform a simultaneous + open that allows hole punching to succeed. 5. Simultaneous Connect. 
The two nodes follow the steps below in parallel for every address obtained from the `Connect` message: - For a TCP address: @@ -101,21 +103,16 @@ connection upgrade protocol as follows: - This will result in a QUIC connection where `A` is the client and `B` is the server. 6. On successful establishment of a single connection does `A` cancel all - outstanding connection attempts. On failure of all connection attempts go - back to step (1). Inbound peers (here `B`) SHOULD retry twice (thus a total - of 3 attempts) before considering the upgrade as failed. - -The purpose of the `Sync` message and `B`'s timer is to allow the two peers to -synchronize so that they perform a simultaneous open that allows hole punching -to succeed. - -If the direct connection is successful, then the peers should migrate -to it by prioritizing over the existing relay connection. All new -streams should be opened in the direct connection, while the relay -connection should be closed after a grace period. Existing indefinite -duration streams will have to be recreated in the new connection once -the relay connection is closed. - + outstanding connection attempts. The peers should migrate to the established + connection by prioritizing over the existing relay connection. All new + streams should be opened in the direct connection, while the relay connection + should be closed after a grace period. Existing undefinite duration streams + will have to be recreated in the new connection once the relay connection is + closed. + + On failure of all connection attempts go back to step (1). Inbound peers + (here `B`) SHOULD retry twice (thus a total of 3 attempts) before considering + the upgrade as failed. 
### RPC messages From cab60cced57e04a2b162b1b571187abc7d94bffa Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 13:55:07 +0200 Subject: [PATCH 27/28] relay/DCUtR: Remove concrete success rates --- relay/DCUtR.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index 34931cd01..ad69701e5 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -38,12 +38,11 @@ connect peers behind NAT albeit with a high-latency, low-bandwidth connection. Unfortunately, they are expensive to scale and maintain if they have to carry all the NATed node traffic in the network. -It is often possible for two peers behind NAT to communicate directly -by utilizing a technique called _hole punching_[1]. The technique -relies on the two peers synchronizing and simultaneously opening -connections to each other to their predicted external address. It -works well for UDP, with an estimated 80% success rate, and reasonably -well for TCP, with an estimated 60% success rate. +It is often possible for two peers behind NAT to communicate directly by +utilizing a technique called _hole punching_[1]. The technique relies on the two +peers synchronizing and simultaneously opening connections to each other to +their predicted external address. It works well for UDP, and reasonably well for +TCP. The problem in hole punching, apart from not working all the time, is the need for rendezvous and synchronization. This is usually From 8001cd9fe43719c559e192d4146f044ebcbb1aba Mon Sep 17 00:00:00 2001 From: Max Inden Date: Mon, 23 Aug 2021 15:44:01 +0200 Subject: [PATCH 28/28] relay/DCUtR: Reword Co-authored-by: Marten Seemann --- relay/DCUtR.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/relay/DCUtR.md b/relay/DCUtR.md index ad69701e5..dfa3ffc1b 100644 --- a/relay/DCUtR.md +++ b/relay/DCUtR.md @@ -101,11 +101,11 @@ connection upgrade protocol as follows: random intervals between 10 and 200 ms. 
- This will result in a QUIC connection where `A` is the client and `B` is the server. -6. On successful establishment of a single connection does `A` cancel all +6. Once a single connection has been established, `A` SHOULD cancel all outstanding connection attempts. The peers should migrate to the established connection by prioritizing over the existing relay connection. All new streams should be opened in the direct connection, while the relay connection - should be closed after a grace period. Existing undefinite duration streams + should be closed after a grace period. Existing long-lived streams will have to be recreated in the new connection once the relay connection is closed.