
[DISCUSSION] Multiselect & Packet orientation #226

Open
bigs opened this issue Oct 29, 2019 · 3 comments
@bigs
Contributor

bigs commented Oct 29, 2019

Discussion of work under #225. Today—October 28, 2019—@raulk, @yusefnapora, @aschmahmann, @Stebalien, and I (with a few others listening in!) had a lengthy discussion about requirements for multiselect and, more broadly, packet-oriented libp2p, piggy-backing off of our weekly sync. The recording of the meeting can be found here, though it's quite lengthy (begins around 35min). I wanted to open a forum for discussion so that @marten-seemann would have a chance to chime in, and wanted to host it on GitHub so that we can have a public log of the discussion.

In the discussion, I outlined the major differences between @marten-seemann's in-progress proposal and @Stebalien's original proposal, #95. The gist is that Marten's proposal, in its current instantiation, is quite pared down compared to @Stebalien's. This simplification is attractive for reasons pertaining to implementation and correctness, but it has drawbacks as well.

Conversation relatively quickly steered towards the role and presence of multiplexing in multiselect's duties. @raulk initially proposed bootstrapping into mplex and using mplex to manage, per the QUIC parlance, "flows" of information on given protocols. In this scheme, mplex "flows" would allow request-response protocols to have responses associated with their requests. @bigs generally pushed back against the idea of requiring the presence of mplex, asserting that it might make sense to bake mplex into the multiselect protocol wholesale, leveraging a reserved "flow", e.g. flow 0 a la QUIC, for control messages and messages on protocols that don't care about associating requests with responses.

@raulk wisely pointed out that this wouldn't reduce any of mplex's overhead in that case, and @bigs then realized it may just make sense to use multiselect to agree on a multiplexer, like mplex (which is purportedly packet friendly), in the standard fashion and then use that single multiselect channel to send multiplexed messages. What's interesting about this is that we could still somewhat "bake" this multiplexer into multistream, making it aware of negotiated protocol IDs. That way you could have flows in packet-oriented libp2p while only paying the frame overhead of a single multiselect frame and a single mplex frame.

At this point, @raulk pivoted the conversation to talk more generally about the notion of an "embryonic" stream and an upgrade path. @bigs mentioned that @Stebalien's serial-stream seemed like a good candidate for this sort of upgrading (thoughts @Stebalien?). @raulk highlighted the usefulness of a "capabilities" abstraction, in which peers could express more abstract desires (e.g. multiplexing, public key crypto w/ no perfect forward secrecy) and multiselect could facilitate bootstrapping an appropriate connection. This seems to align with the concept of being able to "upgrade" a channel within multiselect to allow for primitive multiplexing, while letting other channels stay "context free" so to speak.

A brief example of the type of flow I was trying to describe earlier with mplex:

A: multiselect/string /my/protocol/1.0 12 // A refers to my protocol as 12
B: multiselect/dynamic /my/protocol/1.0 33 // B refers to it as 33
// now A may send B messages on /my/protocol/1.0 using the dynamic ID 33, and vice versa using the dynamic ID 12
...
A: multiselect/string /mplex/2.0 13 [mplex frame for protocol 33, with some initial request payload]
B: multiselect/dynamic /mplex/2.0 34
// now B may reply within this mplex channel to associate a response with a particular request
B: multiselect/message 13 [mplex frame containing reply to the request on /my/protocol/1.0]

I kind of rushed through that, but there are some key ideas:

  • Both sender and receiver should probably publish a dynamic ID for a given protocol to prevent redundant round trips.
  • Multiselect's packet oriented multiplexing capabilities would be strictly limited to creating one channel per protocol. If you needed additional channels, you'd need to negotiate a packet friendly multiplexer.
  • By making this packet-friendly multiplexer aware of multiselect, it could reference negotiated dynamic IDs in its frame headers.
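The dynamic-ID idea in the bullets above can be sketched as a tiny frame codec. The wire layout below (varint dynamic ID, then a varint-length-prefixed payload) is entirely hypothetical and invented for illustration; it is not the actual multiselect encoding:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// encodeFrame sketches a hypothetical multiselect data frame: a varint
// dynamic protocol ID followed by a varint-length-prefixed payload.
func encodeFrame(dynamicID uint64, payload []byte) []byte {
	var buf bytes.Buffer
	tmp := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(tmp, dynamicID)
	buf.Write(tmp[:n])
	n = binary.PutUvarint(tmp, uint64(len(payload)))
	buf.Write(tmp[:n])
	buf.Write(payload)
	return buf.Bytes()
}

// decodeFrame reverses encodeFrame, recovering the dynamic ID and payload.
func decodeFrame(frame []byte) (uint64, []byte, error) {
	r := bytes.NewReader(frame)
	id, err := binary.ReadUvarint(r)
	if err != nil {
		return 0, nil, err
	}
	length, err := binary.ReadUvarint(r)
	if err != nil {
		return 0, nil, err
	}
	payload := make([]byte, length)
	if _, err := io.ReadFull(r, payload); err != nil {
		return 0, nil, err
	}
	return id, payload, nil
}

func main() {
	// A small dynamic ID costs a single byte of header per frame.
	frame := encodeFrame(33, []byte("hello"))
	id, payload, _ := decodeFrame(frame)
	fmt.Println(id, string(payload))
}
```

The point of the sketch is the cost model: once both sides have published a dynamic ID for a protocol, each subsequent frame pays only a byte or two of header rather than re-sending the full protocol string.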

The closing conversation focused on the fact that packet-oriented libp2p and stream-oriented libp2p likely have different requirements, and it may make sense to have unique messages within the multiselect family that only apply in one of those situations. I don't think it will be necessary, but it's useful to think about. Furthermore, @raulk highlighted that it may be important to have some varint bitset "flag" space in multiselect messages, which could be used to communicate the presence of additional information based on "upgrades" that may have happened earlier. This was the least fleshed out of what we talked about, but seems like a useful concept to leave room for!

@marten-seemann
Contributor

marten-seemann commented Oct 29, 2019

First of all, it would have been really useful to have been on this call. There were so many situations where I wanted to chime in! I'm not sure how to properly structure all the thoughts I had during these 1.5 hours, so I apologize in advance that my comment here is a bit lengthy.

UDP Packet Size

First of all, regarding the 64k UDP packet size that @raulk brought up: IP fragmentation is only defined for IPv4; people realized that it's not a good idea, and it was removed in IPv6. Even if it were possible to fragment a message, it wouldn't be a good idea: the loss of a single fragment would lead to the loss of the whole message, so the transmission probability sharply decreases with the number of packets.
Chrome actually ran some path MTU discovery experiments and determined that 1350 bytes is a packet size they deem safe to use under (almost) all conditions. This is the QUIC packet size that Chrome uses. I think we'd be well advised to limit our UDP datagram size to a value in the same order of magnitude.
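The 1350-byte figure suggests a simple guard on the sending path: refuse to fragment, and make oversized messages an application-level error. The constant and function names below are our own, not any libp2p API:

```go
package main

import "fmt"

// maxSafeDatagram reflects the 1350-byte value from Chrome's path MTU
// discovery experiments; the constant name itself is hypothetical.
const maxSafeDatagram = 1350

// fitsDatagram reports whether a message can be sent as a single UDP
// datagram without relying on IP fragmentation (which IPv6 dropped, and
// which turns one lost fragment into a whole lost message).
func fitsDatagram(msg []byte) bool {
	return len(msg) <= maxSafeDatagram
}

func main() {
	fmt.Println(fitsDatagram(make([]byte, 1200)))
	fmt.Println(fitsDatagram(make([]byte, 1500)))
}
```

Rejecting oversized messages up front, rather than splitting them, keeps the "one datagram, one message" property that makes loss accounting tractable.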

QUIC

I was happy to hear that you guys referred to QUIC quite a lot during the discussion :) It seems like you have a pretty good understanding of the protocol. There were a few points however that were not correct, two of which I'd like to point out here, since they might have an effect on our protocol design:

  1. QUIC doesn't have a control stream. It used to have a separate stream for the TLS handshake messages (to guarantee ordering), but we even changed that design about 1.5 years ago. Other control messages (like flow control updates, connection ID updates etc.) don't require ordering and are sent as QUIC frames.
  2. 0-RTT data is not sent unencrypted, so we don't have to come up with our own encryption scheme here. In fact, QUIC's 0-RTT data provides the same properties as TLS 1.3's 0-RTT data, meaning that it's encrypted with a key only known to the two peers. The main difference to normal 1-RTT data is that in the general case it is re-playable (although you could even prevent replays unless you're running a distributed server architecture).

Congestion Control

One note regarding @raulk's proposal to offer users an option to construct a transport without congestion control: this seems potentially dangerous, for two reasons. First, sending too much data on a connection that isn't congestion controlled will induce huge amounts of packet loss. How much is too much might vary between kb/s and Gb/s, depending on your network; the acceptable threshold is the (currently available) bandwidth of the connection. If applications care about their data making it through, they would have to keep track of the bandwidth somehow, i.e. we'd just force them to implement a congestion controller themselves. Second, from a more global perspective, a protocol that doesn't account for congestion when sending significant amounts of data might break the internet if widely adopted and deployed.

For these reasons, we need to be careful which protocols we run on top of a packet-based transport. Note that QUIC datagrams are congestion controlled (they share the congestion control context of the QUIC connection), which should be another argument for preferring QUIC over a simple UDP-based protocol.

Protocol Selection

Regarding application protocols running on top of our new packet-based transport, there seem to be two use cases that we're now considering using the new protocol capability for:

  1. A "fire and forget": An endpoint sends a message to a peer (or to N peers), but it doesn't really care if the message makes it through.
  2. A request-response scheme, or even a more complex "conversation".

The classical example for 1. would be live audio / video streaming: it doesn't make sense to retransmit lost segments, since by the time the retransmission arrives, the audio / video stream would have progressed so far that the segment would be discarded anyway. Another example would be multiplayer games, where updates about other players' positions on a map would be sent unreliably. Here it doesn't really matter that all updates are received, but it makes sense to always send the most current positions instead of retransmitting an outdated game state.

I'm not sure if 2. is a good match for a packet transport. If you care about the response to a request, an unreliable transport seems like a bad fit: every application running on top of such a transport would have to either implement some kind of retransmission logic or, at the bare minimum, cleanup logic to discard messages that were lost. Note that loss can strike us twice in this scenario, once for the request and another time for the response. Things get even worse for longer conversations.

If you actually care about the response to your request, you will have to keep state to be able to associate the response with the request you sent earlier. You might as well use a regular stream to achieve the same result, and have the transport deal with retransmissions and the stream multiplexers deal with the complexity of demultiplexing concurrent requests. If you're using QUIC, streams are delivered independently from each other (they are not HoL-blocking), and the responder could even reply to a stream in a "fire and forget" manner by resetting the stream right after sending the response (in this case, QUIC would not retransmit the stream data if it was lost).

Actually, we recently had a very similar conversation at the last IETF meeting in Montreal when discussing the DATAGRAM proposal. As a result of this discussion, flow IDs were removed in the current version of the draft and moved to the HTTP/3 layer. In general, flow IDs are not meant to be the equivalent of stream IDs; in libp2p-speak, a flow ID corresponds to a protocol (note that section 2 gives the analogy of flow IDs with UDP ports). The reason for this is that packet transports are most useful for "fire and forget".

@bigs
Contributor Author

bigs commented Oct 29, 2019

So much to digest, thanks!

I'm not sure if 2. is a good match for a packet transport.

I’m inclined to agree here. I think QUIC becomes a de-facto best option for these scenarios when available. Adding flow/conversation multiplexing capabilities would drastically increase the complexity for the packet use case and would essentially lead to a worse QUIC.

@raulk
Member

raulk commented Oct 30, 2019

@marten-seemann Thanks for watching the recording and contributing your thoughts promptly ;-) Lots of good points and exactly the kind of feedback that's valuable!

I'll reply by fragmenting over comments (cringeworthy pun intended).

I'm not sure if 2. [A request-response scheme, or even a more complex "conversation"] is a good match for a packet transport.

IMO, "Request-response" as an attribute is too unspecific. "Packet transport" is also too unspecific :) We are talking about unreliable, congestion uncontrolled, unordered packet transports here, like UDP. I think this is what you meant @marten-seemann, so I'll assume that.

There are various factors that define whether such transports are suitable for an application or not, where the interaction pattern (req/rep) is only one of them. Examples of L7 interactive protocols running atop UDP include:

  • Kademlia
  • mainline DHT (BitTorrent)
  • DNS, DNS over TLS
  • NTP

These protocols tend to be speculative and do not benefit from a persistent, reliable connection to the peer; the cost of setting up a TCP connection outweighs that of the application interaction itself. They are request-reply, and they need a mechanism to correlate requests and responses under concurrent scenarios, e.g. DNS uses a TxID field.
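The TxID-style correlation these protocols rely on can be sketched in a few lines; the type and method names below are invented for illustration, not any existing libp2p API:

```go
package main

import "fmt"

// correlator sketches DNS-style request/response correlation: each outgoing
// request gets a transaction ID, and a response is matched back to its
// request by that ID.
type correlator struct {
	next     uint16
	inflight map[uint16]string // txid -> description of the pending request
}

func newCorrelator() *correlator {
	return &correlator{inflight: make(map[uint16]string)}
}

// request registers an outgoing request and returns its transaction ID,
// which the peer is expected to echo back in its response.
func (c *correlator) request(desc string) uint16 {
	c.next++
	c.inflight[c.next] = desc
	return c.next
}

// respond matches an incoming response to its request, if still pending.
// Duplicate or late responses find nothing and are dropped.
func (c *correlator) respond(txid uint16) (string, bool) {
	desc, ok := c.inflight[txid]
	if ok {
		delete(c.inflight, txid)
	}
	return desc, ok
}

func main() {
	c := newCorrelator()
	id := c.request("FIND_NODE")
	desc, ok := c.respond(id)
	fmt.Println(desc, ok)
	_, ok = c.respond(id)
	fmt.Println(ok)
}
```

Whether libp2p offers this map itself or leaves it to each application is precisely the question posed below.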

The question is whether we want to provide such a correlation facility within libp2p, by mimicking streams insofar as their function of scoping messages to a particular conversation is concerned.

Upon reflection, I do agree that stream/flow management in an unreliable scenario can get complicated. For example, what happens if the fragment that closes a flow is lost? The flow could remain open forever on the other end. I think there are local mechanisms to deal with such scenarios (e.g. setting timeouts for stream closure), but I do acknowledge that it complicates the protocol, and pushing down the correlation to the application layer may be adequate.

Regardless, I really DO want a facility to convey feature flags with extensible fields on the framing protocol, to pave the way to easier evolution of the protocol, should we decide to incorporate such features in the future.
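A varint bitset flag field of the kind described here might be encoded as follows; the specific flag assignments are invented for illustration, the point being only that a varint leaves room to grow without a version bump:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Hypothetical feature flags for a framing-protocol header.
const (
	flagMultiplexed = 1 << 0 // frame carries a multiplexed sub-channel
	flagCompressed  = 1 << 1 // payload is compressed
)

// encodeFlags writes the bitset as a varint: the common case costs a
// single byte, but future upgrades can define higher bits freely.
func encodeFlags(flags uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, flags)
	return buf[:n]
}

// decodeFlags reads the bitset back; unknown bits can simply be ignored
// by older implementations.
func decodeFlags(b []byte) (uint64, error) {
	return binary.ReadUvarint(bytes.NewReader(b))
}

func main() {
	enc := encodeFlags(flagMultiplexed | flagCompressed)
	fmt.Println(len(enc))
	flags, _ := decodeFlags(enc)
	fmt.Println(flags&flagMultiplexed != 0, flags&flagCompressed != 0)
}
```

Ignoring unknown high bits on decode is what makes the field extensible: an old peer can skip capabilities it doesn't understand rather than failing the connection.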
