[WIP] [RFC] Multistream-2.0 #95
Here's a draft of multistream-2.0 (+ a retrospective that you can skip).
Note: |
Would be nice to get some example negotiation scenarios to make it a bit easier to reason about
I agree and @vyzo gave the same feedback. Working on it. |
1. Split retro into a separate file.
2. Add an example.
3. Rename hello to advertise. I'd like to encourage using many tiny protocols instead of adding more and more features to bloated protocols. That means separate protocol advertisement, hello, etc. protocols. Maybe this is going too far and we should just call this "identify".
4. Rename serial-multiplex to serial-stream and multistream/choose to speculative-stream (and move it out of the multistream protocol family).
5. Use more varints. Really, we can probably go a step further and make serial-multiplex use a varint.
Updated. |
So the first pass looks good. I need to dig into the libp2p code to get a bit more context and convince myself that there are no other ways to make this less complicated. (Also, can we exploit the fact that QUIC is still experimental and just drop multistream v1 compat on QUIC connections?)
Yes and I plan on doing exactly that :).
There probably are (although I'd like to keep it modular; as it stands, implementations can drop sub-protocols). We can probably either remove or simply not implement the speculative-stream protocol. It's a nice-to-have, especially for future packet-switched networks, but really not necessary for a first pass. We need multistream/dynamic and serial-stream for parity with the current multistream and we need multistream/contextual for cheaper streams. We can probably drop the explicit multistream/use protocol and just say "in multistream, all protocols begin with a multicodec specifying the protocol". That's effectively multistream/use, it just drops the explicit multistream/use multicodec. That is,
Gonna review this soon. Meanwhile, just jotting down a couple of things I'd like to see in our next iteration of multistream (aside from the usual suspects like less chattiness).
1. Upfront negotiation
Nodes should be able to share a list of protocols they support during session establishment. This is useful for applications that knowingly support a small list of protocol-version tuples (e.g. Ethereum). Such a mechanism allows for 1-RTT negotiation (even async, like the below), where the amortised cost per stream negotiation is paid only once, upfront (=> avoid death by a 1000 cuts).
Example flow
Upon connection, both nodes exchange a list of protocols they support in lexicographical order. This list can be plaintext (inefficient), gzipped (better), a bloom filter (non-deterministic), or something else. Alongside each entry, they signal a version agreement strategy (byte value), e.g.
The output is identical on both sides: an ordered list of agreed protocols, 0-based. Both parties now build a multiplexing table
Henceforth, when opening a stream, instead of sending the full name of the protocol, they send the
Dealing with confusion (fallback). If a node arrives at an undefined result (e.g. different libp2p versions with different logic for the same strategy -- although this is bad), they request the multiplexing table from the peer, and accept it as valid. If both parties request the listing, the connection is terminated.
2. Protocol indexing
In cases where upfront negotiation is infeasible (e.g. too many protos), peers must keep track of the order of proto selection during the lifetime of a session. When opening a stream for a previously selected protocol, they must send the index instead of the full protocol name.
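A minimal sketch of the upfront-negotiation idea above, assuming the agreed set is simply the intersection of both peers' protocol lists in lexicographical order. It ignores the version agreement strategy, whose encoding is not given in this thread. The function name and the protocol strings are hypothetical.

```python
def build_multiplexing_table(local_protocols, remote_protocols):
    """Return (index_to_proto, proto_to_index) for protocols both peers support."""
    # Both sides compute the same sorted intersection, so the 0-based
    # indexes agree without any further round trip.
    agreed = sorted(set(local_protocols) & set(remote_protocols))
    proto_to_index = {proto: index for index, proto in enumerate(agreed)}
    return agreed, proto_to_index


# Hypothetical protocol names, for illustration only.
local = ["/example/bar/1.0.0", "/example/foo/1.0.0", "/example/baz/1.0.0"]
remote = ["/example/foo/1.0.0", "/example/qux/1.0.0", "/example/bar/1.0.0"]

table_a, index_a = build_multiplexing_table(local, remote)
table_b, index_b = build_multiplexing_table(remote, local)
```

Because the computation is symmetric, `table_a` and `table_b` are the same list, and a stream can then be opened by sending the index from `index_a` instead of the full protocol name.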
I don't know whether that's actually a problem, but one consequence of remembering the list of protocols that a node supports is that nodes can no longer decide to drop support for protocols at runtime. |
@tomaka I believe that case is minor enough we can return an error interactively when the peer attempts to select the now-unsupported protocol. Also: the good thing about upfront negotiation is that we're already storing protos in the peerstore. If we store the int-indexed multiplexing table, when reestablishing a connection with that peer later, we can exchange merkle hashes of the table, and bypass negotiation altogether if they match, i.e.
|
@tomaka that can be fixed by sending negative protocol announcements (or just responding with "I don't speak that" later). Basically, you just can't reuse protocol IDs.
Completely agree (and we currently do this, luckily).
So, technically, we already do this (identify). However, we don't currently wait for this negotiation to complete. Unfortunately, due to the lack of a
Really, we can probably just send the entire table. It'll probably be a packet, maybe two. State between reconnects feels like a bug waiting to happen. I mean, we could do the whole hash-dance (mmm, IPLD); however, a few packets is generally cheaper than a round-trip.
The protocols may not be symmetric. Really, we should just think of them as "endpoints" rather than protocols. With that in mind, this proposal doesn't try to merge these lists. Instead, each side just sends their own mapping and expects the other side to use it when establishing inbound connections. Really, these mappings are equivalent to dynamic port mappings.
What case is this trying to cover?
So, I'm not convinced about protocol versioning. Semver was designed for a centralized world where every versioned thing is maintained by a single party (linearized changes). However, most web protocols don't work that way; they evolve over time as different parties add different extensions/features. This is why browsers now use and recommend feature detection. For unavoidable breaking changes, I'd just change the protocol itself (e.g., A flexible alternative is to allow an optional "user-data" object to be associated with a protocol advertisement. Now, the user-data could just be a version, but I'd recommend against that. Really, I'd like to go further and split protocol/service advertisements (not really in-scope for this discussion but I guess all these systems tend to overlap). That is:
For example, I may speak the relay protocol but may not be willing to relay connections for others. In this case, I'd advertise the relay protocol but not the service (or I could advertise the service but say that I only accept terminal connections in the service description). |
We need a mechanism for update as well; it's not just dropping protocols, but also adding -- for instance the daemon adds protocol handlers dynamically. Furthermore, these protocols can be added after the connection has been established, so we need a push/update protocol to update the contextual tables. |
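To make the add/drop discussion concrete, here is a rough sketch of a per-connection table in which each side owns its own mapping, pushes updates when handlers change, and never reuses an ID. The class name, method names, and tuple-shaped update messages are all hypothetical; nothing here is taken from the spec.

```python
class AdvertisedProtocols:
    """Local protocol-to-ID mapping that one side advertises to its peer."""

    def __init__(self):
        self._ids = {}     # protocol name -> dynamic ID currently advertised
        self._next_id = 0  # only ever increases, so IDs are never reused

    def add(self, protocol):
        """Register a handler and return the update to push to the peer."""
        if protocol in self._ids:
            raise ValueError(f"{protocol} is already advertised")
        dynamic_id = self._next_id
        self._next_id += 1
        self._ids[protocol] = dynamic_id
        return ("add", protocol, dynamic_id)

    def remove(self, protocol):
        """Drop a handler and return the negative announcement to push."""
        dynamic_id = self._ids.pop(protocol)
        return ("remove", protocol, dynamic_id)

    def lookup(self, dynamic_id):
        """Resolve an inbound stream's ID, or None if we no longer speak it."""
        for protocol, known_id in self._ids.items():
            if known_id == dynamic_id:
                return protocol
        return None


table = AdvertisedProtocols()
first = table.add("/example/foo/1.0.0")
second = table.add("/example/bar/1.0.0")
dropped = table.remove("/example/foo/1.0.0")
readded = table.add("/example/foo/1.0.0")
```

A stream that arrives with a retired ID resolves to `None`, which corresponds to answering "I don't speak that" interactively rather than tearing down the connection.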
multistream-2.0/spec.md
Outdated
anything.

1. `multistream/advertise`: Inform the remote end about which protocols we speak
and. This should partially replace the current identify protocol.
Incomplete sentence. and?
Is this the equivalent of `identify/push`?
Yes. Fixed.
Is this the equivalent of identify/push?
Yes except:
- It's designed to be significantly more efficient.
- It only covers protocol advertisement. I'd like to try to avoid monolithic protocols in the future as small protocols like this are easier to mix/match/upgrade.
multistream-2.0/spec.md
Outdated
multistream if that doesn't work.
2. `speculative-stream`: A speculative stream "multiplexer" where the initiator
can speculatively initiate multiple streams and the receiver must select at
most one and discard the others.
presumably there is a mechanism to inform the peer of the actual selection.
Yes (described in the protocol description)
multistream-2.0/spec.md
Outdated
Unspeced (for now). Really, we just need to send a mapping of protocol
names/codecs to contextual IDs (and may be some service discovery information).
Basically, identify.
identify does quite a bit more: provides addresses (and updates), observed addrs, keys, etc... so we are not replacing it entirely, this is just the protocol mapping.
Correct, it doesn't.
I am wondering a bit about the dynamic/contextual identifiers. It seems that these only come into play within user streams, which are themselves within the multiplexer. This means that the multiplexer will need to be informed of these protocol assignments itself.
It's not clear to me how the selection works with user protocols. Edit: For an additional example, the gossipsub router tries to open a stream to both gossipsub and floodsub, with priority to gossipsub (i.e., if the other peer supports it, this is what we want to use).
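One way to picture the priority case, as a sketch only: assume the initiator already knows the remote's protocol set (for example, from an advertisement) and picks the first entry of its own preference list that the remote speaks. The function name is hypothetical and the protocol strings are illustrative; this does not describe how the spec actually resolves the question.

```python
def select_protocol(preferred, remote_supported):
    """Return the highest-priority protocol the remote supports, or None."""
    supported = set(remote_supported)
    # `preferred` is ordered from most to least desirable.
    for protocol in preferred:
        if protocol in supported:
            return protocol
    return None


# Illustrative protocol strings: gossipsub first, floodsub as the fallback.
preference = ["/meshsub/1.0.0", "/floodsub/1.0.0"]

both = select_protocol(preference, ["/floodsub/1.0.0", "/meshsub/1.0.0"])
fallback = select_protocol(preference, ["/floodsub/1.0.0"])
neither = select_protocol(preference, ["/example/other/1.0.0"])
```

Without prior knowledge of the remote's protocols, the initiator would instead have to offer both options and let the receiver choose, which is the case the `speculative-stream` proposal targets.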
Questions from discussion concerning serial-stream:
Questions with respect to advertise:
|
Also, we need to work out a principled way to deal with (tcp) simultaneous open. |
multistream-2.0/spec.md
Outdated
local) should be discarded.
* -1 - Close: Send an EOF and return to multistream.
* 0 - Rest: Ends the reuse protocol, transitioning to a direct stream.
* >0 - Data: The header indicates the length of the data.
`>0` is interpreted as formatting, not the literal `>` character
Makes sense, just a few minor points
multistream-2.0/spec.md
Outdated
1. `multistream/advertise`: Inform the remote end about which protocols we
speak. This should partially replace the current identify protocol.
2. `multistream/use`: Selects the stream's protocol using a multicodec.
I'm not a fan of these names. How about `multistream/use-multicodec`, `multistream/use-dynamic`, `multistream/use-contextual`?
Renamed use -> multicodec, dynamic -> string, contextual -> dynamic.
multistream-2.0/spec.md
Outdated
Where the header is:

* -2 - Send a reset and return to multistream. All queued data (remote and
So the distinction is that -2 is an abnormal end and -1 is a normal end?
multistream-2.0/spec.md
Outdated
1. The "ls" feature of multistream has been removed. While useful, this really
should be a *protocol*. Given the `serial-stream` protocol, this shouldn't be
an issue as we can run as many sub-protocols over the same stream as we want.
2. To reduce RTTs, all protocols are unidirectional.
What does this mean in practice?
Updated.
* use -> multicodec * dynamic -> string * contextual -> dynamic
To hopefully remove confusion around names, I've renamed (again, sorry):
- `multistream/use` -> `multistream/multicodec` -- Multistream that uses multicodecs.
- `multistream/dynamic` -> `multistream/string` -- Multistream that uses strings.
- `multistream/contextual` -> `multistream/dynamic` -- This means we can talk about "dynamic" IDs which make a lot more sense.
multistream-2.0/spec.md
Outdated
Where the header is:

* -2 - Send a reset and return to multistream. All queued data (remote and
Yes. Updated.
LGTM. @whyrusleeping your thoughts? |
TODO: Move this elsewhere. It's not a part of multistream and is only relevant because it came up in the retro.
any sub connections) with the multistream version. This way, we never have to do
this again.

## TCP Simultaneous Open
This is my proposal for handling simultaneous open. TL;DR: The connection has two unidirectional streams until they're joined.
@vyzo IIRC, this is slightly different from the protocol we discussed as we always use it, even if we have no reason to believe we performed a simultaneous connect.
LGTM but we need more wire detail on the examples.
So let's have two examples with full protocol detail, one for the common case and one for simultaneous open.
SGTM.
### Usage

We treat each new TCP connection as a pair of unidirectional streams and use
this protocol to bind them together.
I think we need full detail on the wire for this, all the way up to secio negotiation (both cases).
The protocol is: | ||
|
||
``` | ||
<header (signed 16 bit int)> |
Endianness of this integer should be defined, or the endianness of all integers in the document should be defined.
At this time I'm assuming network byte order/big-endian.
Yes. Network order.
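As an illustration of what "signed 16 bit int, network order" means on the wire, here is a small decoding sketch. It assumes this header is the same one whose values are listed in the earlier hunk (-2 reset, -1 close, 0 rest, >0 data length); the thread does not state that explicitly. The function name and the returned labels are hypothetical.

```python
import struct


def decode_header(raw):
    """Decode a 2-byte header into a (kind, data_length) pair."""
    if len(raw) != 2:
        raise ValueError("header must be exactly 2 bytes")
    # ">h" is a big-endian (network order) signed 16-bit integer.
    (value,) = struct.unpack(">h", raw)
    if value > 0:
        return ("data", value)  # the header is the length of the data
    if value == 0:
        return ("rest", 0)      # leave the reuse protocol, direct stream follows
    if value == -1:
        return ("close", 0)     # EOF, return to multistream
    if value == -2:
        return ("reset", 0)     # abnormal end, queued data is discarded
    raise ValueError(f"unknown header value {value}")


data_frame = decode_header(b"\x00\x05")
close_frame = decode_header(b"\xff\xff")
reset_frame = decode_header(b"\xff\xfe")
```

With a signed 16-bit header, a single data frame carries at most 32767 bytes, which is one reason the thread floats switching to a varint.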
More specifically,

1. The initiator generates a 32 byte random ID (`ID`).
2. The initiator negotiates the `duplex-stream` protocol and then sends `0<ID>` (`0` is a single 0 byte).
0x00<id>
0x01<id>
Hex numbers make it obvious that it is a single byte.
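The initiator's side of the two quoted steps can be sketched as follows: generate a 32 byte random ID and send a single `0x00` byte followed by the ID. Negotiating `duplex-stream` itself is out of scope here, and the meaning of the `0x01` prefix is not covered because the visible hunk does not define it. The function and constant names are hypothetical.

```python
import secrets

ID_LENGTH = 32  # the quoted spec text uses a 32 byte random ID


def initiator_message():
    """Return (stream_id, message), where message is 0x00 followed by the ID."""
    stream_id = secrets.token_bytes(ID_LENGTH)
    return stream_id, b"\x00" + stream_id


stream_id, message = initiator_message()
```

Writing the prefix as the byte literal `b"\x00"` sidesteps the ambiguity the reviewer points out, since it cannot be mistaken for the ASCII character `0`.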
Multistream is foundational in libp2p and I'm keen to iterate on it. Here are further considerations after letting the proposal sink in for a few weeks:
P.S. 👏👏 |
@Stebalien @raulk Added a proposal in |
Here's a draft of multistream-2.0 (+ a retrospective that you can skip).
Note: NONE of this is set in stone (or even sand), this is for discussion.
PLEASE JUMP TO: #95 (comment)