[WIP] [RFC] Multistream-2.0 #95

Stebalien · 2018-10-09T01:23:40Z

Here's a draft of multistream-2.0 (+ a retrospective that you can skip).

Note: NONE of this is set in stone (or even sand); this is for discussion.

PLEASE JUMP TO: #95 (comment)

Stebalien · 2018-10-09T01:27:55Z

Note: multistream/choose is actually an xor operator and serial-multiplex is actually an and operator. We may want to just call them multistream/xor and multistream/and but I figured that may be even more confusing.

magik6k

Would be nice to get some example negotiation scenarios to make it a bit easier to reason about

Stebalien · 2018-10-09T12:37:58Z

> Would be nice to get some example negotiation scenarios to make it a bit easier to reason about

I agree and @vyzo gave the same feedback. Working on it.

1. Split retro into a separate file.
2. Add an example.
3. Rename hello to advertise. I'd like to encourage using many tiny protocols instead of adding more and more features to bloated protocols. That means separate protocol advertisement, hello, etc. protocols. Maybe this is going too far and we should just call this "identify".
4. Rename serial-multiplex to serial-stream and multistream/choose to speculative-stream (and move it out of the multistream protocol family).
5. Use more varints. Really, we can probably go a step further and make serial-multiplex use a varint.

Stebalien · 2018-11-05T23:25:00Z

Updated.

magik6k · 2018-11-06T02:01:57Z

So the first pass looks good. I need to dig into the libp2p code to get a bit more context to convince myself that there are no other ways to make this less complicated.

(Also, can we exploit the fact that QUIC is still experimental and just drop multistream v1 compat on QUIC connections?)

Stebalien · 2018-11-06T16:03:08Z

> (Also, can we exploit the fact that QUIC is still experimental and just drop multistream v1 compat on QUIC connections?)

Yes and I plan on doing exactly that :).

> So the first pass looks good. I need to dig into the libp2p code to get a bit more context to convince myself that there are no other ways to make this less complicated.

There probably are (although I'd like to keep it modular; as it stands, implementations can drop sub-protocols).

We can probably either remove or simply not implement the speculative-stream protocol. It's a nice-to-have, especially for future packet-switched networks, but really not necessary for a first pass.

We need multistream/dynamic and serial-stream for parity with the current multistream and we need multistream/contextual for cheaper streams.

We can probably drop the explicit multistream/use protocol and just say "in multistream, all protocols begin with a multicodec specifying the protocol". That's effectively multistream/use, it just drops the explicit multistream/use multicodec. That is, multistream/dynamic would begin with <multistream/dynamic (multicodec)>... instead of <multistream/use (multicodec)><multistream/dynamic (multicodec)>....
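To make the saving concrete, here is a minimal Python sketch of a stream that opens directly with its multicodec. It assumes the prefix is a multiformats unsigned varint (LEB128-style, 7 bits per byte); `MULTISTREAM_DYNAMIC` is a hypothetical placeholder rather than a registered multicodec, and the sketch skips the minimal-encoding and length limits of the real unsigned-varint spec.

```python
from typing import Tuple

# Hypothetical code for multistream/dynamic; NOT a registered multicodec.
MULTISTREAM_DYNAMIC = 0x300001


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative int: 7 bits per byte, MSB set on every byte but the last."""
    if n < 0:
        raise ValueError("uvarint must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)  # final byte
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a uvarint at `offset`; return (value, offset of the first byte after it)."""
    value, shift = 0, 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated uvarint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def begin_protocol(codec: int, body: bytes) -> bytes:
    """No explicit multistream/use wrapper: the stream simply starts with the protocol's multicodec."""
    return encode_uvarint(codec) + body


def read_protocol(data: bytes) -> Tuple[int, bytes]:
    """Split the leading multicodec from the rest of the stream."""
    codec, offset = decode_uvarint(data)
    return codec, data[offset:]
```

Dropping `multistream/use` saves exactly one varint per stream: the reader dispatches on the first codec it decodes.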

raulk · 2018-11-06T17:46:24Z

Gonna review this soon. Meanwhile, just jotting down a couple of things I'd like to see in our next iteration of multistream (aside from the usual suspects like less chattiness).

1. Upfront negotiation

Nodes should be able to share a list of protocols they support during session establishment. This is useful for applications that knowingly support a small list of protocol-version tuples (e.g. Ethereum). Such a mechanism allows for 1-RTT negotiation (even asynchronous, as in the flow below), where the amortised cost per stream negotiation is paid only once, upfront (i.e., avoiding death by a thousand cuts).

Example flow

Upon connection, both nodes exchange a list of protocols they support in lexicographical order. This list can be plaintext (inefficient), gzipped (better), a bloom filter (non-deterministic), or something else.

Alongside each entry, they signal a version agreement strategy (byte value), e.g. STRICT, SEMVER, ANY_VERSION, etc. Both peers intersect the lists by:

Deleting the protocols they do not support.
Resolving version ambiguity/conflict by applying the version agreement strategy for the protocol (resolution behaviour to be defined, e.g. for SEMVER we fallback to the oldest).

The output is identical on both sides: an ordered list of agreed protocols, 0-based. Both parties now build a multiplexing table int => protocol.

Henceforth, when opening a stream, instead of sending the full name of the protocol, they send the int.
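A minimal sketch of the intersection step, assuming only the STRICT strategy (exact name match), since resolution for SEMVER and the other strategies is still undefined above. Function names are hypothetical.

```python
from typing import Dict, List


def build_mux_table(local: List[str], remote: List[str]) -> Dict[int, str]:
    """Intersect two advertised protocol lists into a 0-based int => protocol table.

    STRICT matching only: a protocol survives iff both sides list the exact
    same name. Sorting the intersection lexicographically lets both peers
    derive the same table independently.
    """
    agreed = sorted(set(local) & set(remote))
    return dict(enumerate(agreed))


def invert(table: Dict[int, str]) -> Dict[str, int]:
    """protocol => int, used when opening a stream: send the int, not the name."""
    return {proto: idx for idx, proto in table.items()}


alice = ["/ipfs/kad/1.0.0", "/ipfs/ping/1.0.0", "/meshsub/1.0.0"]
bob = ["/floodsub/1.0.0", "/ipfs/ping/1.0.0", "/ipfs/kad/1.0.0"]
table = build_mux_table(alice, bob)  # same result as build_mux_table(bob, alice)
```

The sort is what makes the table identical on both sides without a further exchange; the flip side is that any change to either list shifts the indices of everything after it.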

Dealing with confusion (fallback). If a node arrives at an undefined result (e.g. different libp2p versions with different logic for the same strategy -- although this is bad), they request the multiplexing table from the peer, and accept it as valid. If both parties request the listing, the connection is terminated.

2. Protocol indexing

In cases where upfront negotiation is infeasible (e.g. too many protos), peers must keep track of the order of proto selection during the lifetime of a session. When opening a stream for a previously selected protocol, they must send the index instead of the full protocol name.

tomaka · 2018-11-06T17:49:43Z

I don't know whether that's actually a problem, but one consequence of remembering the list of protocols that a node supports is that nodes can no longer decide to drop support for protocols at runtime.

raulk · 2018-11-06T17:56:38Z

@tomaka I believe that case is minor enough that we can return an error interactively when the peer attempts to select the now-unsupported protocol.

Also: the good thing about upfront negotiation is that we're already storing protos in the peerstore. If we store the int-indexed multiplexing table, when reestablishing a connection with that peer later, we can exchange merkle hashes of the table, and bypass negotiation altogether if they match, i.e.

Yo! Nothing has changed since <hash>?

Yo! Nope, my mux table root hash is still <hash>. We're good.

Stebalien · 2018-11-06T19:30:23Z

> I don't know whether that's actually a problem, but one consequence of remembering the list of protocols that a node supports is that nodes can no longer decide to drop support for protocols at runtime.

@tomaka that can be fixed by sending negative protocol announcements (or just responding with "I don't speak that" later).

Basically, you just can't reuse protocol IDs.
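A sketch of the "never reuse IDs" rule with hypothetical names, assuming each side keeps an in-memory table and pushes announcements and withdrawals to the peer by some update message that isn't specified here.

```python
from typing import Dict, Optional


class AdvertisedProtocols:
    """Our side's protocol ID table, as announced to a connected peer.

    IDs are handed out monotonically and never reused, so a protocol that is
    withdrawn (negative announcement) and later re-added gets a fresh ID, and
    a stale ID can never silently resolve to a different protocol.
    """

    def __init__(self) -> None:
        self._next_id = 0
        self._by_id: Dict[int, str] = {}
        self._by_name: Dict[str, int] = {}

    def announce(self, proto: str) -> int:
        """Add a protocol (e.g. a handler registered at runtime); returns its ID."""
        if proto in self._by_name:
            return self._by_name[proto]
        pid = self._next_id
        self._next_id += 1
        self._by_id[pid] = proto
        self._by_name[proto] = pid
        return pid

    def withdraw(self, proto: str) -> int:
        """Drop support at runtime; returns the ID to send in a negative announcement."""
        pid = self._by_name.pop(proto)
        del self._by_id[pid]
        return pid

    def resolve(self, pid: int) -> Optional[str]:
        """Map an inbound ID to a protocol; None means reply "I don't speak that"."""
        return self._by_id.get(pid)
```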

> In cases where upfront negotiation is infeasible (e.g. too many protos), peers must keep track of the order of proto selection during the lifetime of a session. When opening a stream for a previously selected protocol, they must send the index instead of the full protocol name.

Completely agree (and we currently do this, luckily).

Upfront negotiation

So, technically, we already do this (identify). However, we don't currently wait for this negotiation to complete. Unfortunately, due to the lack of a serial-stream-like protocol, we have to wait to set up a stream multiplexer before we can run identify. With this proposal, we can piggy-back a protocol announcement on the end of the crypto handshake.

> If we store the int-indexed multiplexing table, when reestablishing a connection with that peer later, we can exchange merkle hashes of the table, and bypass negotiation altogether if they match, i.e.

Really, we can probably just send the entire table. It'll probably be a packet, maybe two. State between reconnects feels like a bug waiting to happen. I mean, we could do the whole hash-dance (mmm, IPLD); however, a few packets are generally cheaper than a round-trip.

> The output is identical on both sides: an ordered list of agreed protocols, 0-based. Both parties now build a multiplexing table int => protocol.

The protocols may not be symmetric. Really, we should just think of them as "endpoints" rather than protocols.

With that in mind, this proposal doesn't try to merge these lists. Instead, each side just sends their own mapping and expects the other side to use it when establishing inbound connections. Really, these mappings are equivalent to dynamic port mappings.
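A sketch of this asymmetric alternative, with hypothetical names: each side numbers only the endpoints it accepts, much as a host assigns its own ports, and the opener always uses the *receiver's* number.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Side:
    """One end of a connection. Each side numbers only its *own* inbound endpoints."""
    endpoints: List[str]                                     # what this side accepts
    inbound: Dict[int, str] = field(default_factory=dict)    # our mapping, sent to the peer
    outbound: Dict[str, int] = field(default_factory=dict)   # the peer's mapping, learned from it

    def advertise(self) -> Dict[int, str]:
        self.inbound = dict(enumerate(self.endpoints))
        return self.inbound

    def learn(self, peer_mapping: Dict[int, str]) -> None:
        self.outbound = {proto: pid for pid, proto in peer_mapping.items()}

    def open_stream(self, proto: str) -> int:
        """ID to put on the wire: always the peer's ID (KeyError if the peer doesn't accept it)."""
        return self.outbound[proto]

    def accept_stream(self, pid: int) -> str:
        return self.inbound[pid]


alice = Side(["/ipfs/ping/1.0.0", "/ipfs/kad/1.0.0"])
bob = Side(["/ipfs/kad/1.0.0"])
alice.learn(bob.advertise())
bob.learn(alice.advertise())
# The same protocol has a different ID in each direction:
# alice.open_stream("/ipfs/kad/1.0.0") is 0 (Bob's ID), bob.open_stream("/ipfs/kad/1.0.0") is 1 (Alice's ID).
```

Nothing has to be merged or agreed: each mapping only has to be consistent with itself.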

> Dealing with confusion (fallback). If a node arrives at an undefined result (e.g. different libp2p versions with different logic for the same strategy -- although this is bad), they request the multiplexing table from the peer, and accept it as valid. If both parties request the listing, the connection is terminated.

What case is this trying to cover?

> Alongside each entry, they signal a version agreement strategy (byte value), e.g. STRICT, SEMVER, ANY_VERSION

So, I'm not convinced about protocol versioning. Semver was designed for a centralized world where every versioned thing is maintained by a single party (linearized changes). However, most web protocols don't work that way; they evolve over time as different parties add different extensions/features. This is why browsers now use and recommend feature detection. For unavoidable breaking changes, I'd just change the protocol itself (e.g., dht1, dht2, etc.).

A flexible alternative is to allow an optional "user-data" object to be associated with a protocol advertisement. Now, the user-data could just be a version, but I'd recommend against that.

Really, I'd like to go further and split protocol/service advertisements (not really in-scope for this discussion but I guess all these systems tend to overlap). That is:

Each side advertises protocols to (a) generate the contextual ID mappings and (b) obviate the need for protocol negotiation.
Each side advertises services where each service may be accompanied by an optional service description "object" (opaque data). (I'd love to use IPLD for this but that may be pushing it).

For example, I may speak the relay protocol but may not be willing to relay connections for others. In this case, I'd advertise the relay protocol but not the service (or I could advertise the service but say that I only accept terminal connections in the service description).
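One hypothetical shape for the split, illustrating the relay example: protocol support and service offers are separate, and a service may carry an opaque description. The relay protocol ID is circuit relay v1's; naming the service after the protocol and the `b"terminal-only"` description are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

RELAY = "/libp2p/circuit/relay/0.1.0"


@dataclass
class Advertisement:
    """Hypothetical split advertisement: wire protocols spoken vs. services offered."""
    protocols: Set[str] = field(default_factory=set)
    # Service name -> optional opaque service description (format left open).
    services: Dict[str, Optional[bytes]] = field(default_factory=dict)

    def speaks(self, proto: str) -> bool:
        return proto in self.protocols

    def offers(self, service: str) -> bool:
        return service in self.services


# Speaks the relay protocol but won't relay connections for others.
client_only = Advertisement(protocols={RELAY})
# Offers relaying, with a description only relay-aware peers need to interpret.
relay_node = Advertisement(protocols={RELAY}, services={RELAY: b"terminal-only"})
```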

vyzo · 2018-11-07T08:11:09Z

> Nodes should be able to share a list of protocols they support during session establishment.

We need a mechanism for update as well; it's not just dropping protocols, but also adding -- for instance the daemon adds protocol handlers dynamically. Furthermore, these protocols can be added after the connection has been established, so we need a push/update protocol to update the contextual tables.

vyzo · 2018-11-08T06:41:05Z

multistream-2.0/spec.md

+anything.
+
+1. `multistream/advertise`: Inform the remote end about which protocols we speak
+   and. This should partially replace the current identify protocol.


incomplete sentence. and?

Is this the equivalent of identify/push?

Yes. Fixed.

> Is this the equivalent of identify/push?

Yes except:

It's designed to be significantly more efficient.

It only covers protocol advertisement. I'd like to try to avoid monolithic protocols in the future as small protocols like this are easier to mix/match/upgrade.

vyzo · 2018-11-08T06:43:29Z

multistream-2.0/spec.md

+   multistream if that doesn't work.
+2. `speculative-stream`: A speculative stream "multiplexer" where the initiator
+   can speculatively initiate multiple streams and the receiver must select at
+   most one and discard the others.


presumably there is a mechanism to inform the peer of the actual selection.

Yes (described in the protocol description)
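The selection rule itself lives in the protocol description, which isn't reproduced in this thread. As a hedged illustration, assuming the receiver honours the initiator's preference order and takes the first offer it supports (which would also cover a gossipsub-before-floodsub preference), the receiver side reduces to:

```python
from typing import List, Optional, Set, Tuple

Offer = Tuple[str, bytes]  # (protocol, data sent speculatively on that stream)


def select_speculative(offers: List[Offer], supported: Set[str]) -> Optional[Tuple[int, Offer]]:
    """Accept at most one speculatively opened stream.

    `offers` is in the initiator's preference order. Returns (position, offer)
    for the first supported protocol, or None if nothing is supported; the
    caller discards every other offer, including data already sent on it.
    """
    for position, offer in enumerate(offers):
        if offer[0] in supported:
            return position, offer
    return None
```

The returned position is what the receiver would report back so the initiator knows which stream survived; how that is encoded is not shown here.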

vyzo · 2018-11-08T06:46:12Z

multistream-2.0/spec.md

+
+Unspeced (for now). Really, we just need to send a mapping of protocol
+names/codecs to contextual IDs (and may be some service discovery information).
+Basically, identify.


identify does quite a bit more: provides addresses (and updates), observed addrs, keys, etc... so we are not replacing it entirely, this is just the protocol mapping.

Correct, it doesn't.

vyzo · 2018-11-08T06:57:00Z

I am wondering a bit about the dynamic/contextual identifiers. It seems that these only come in play within user streams, which are themselves within the multiplexer. This means that the multiplexer will need to be informed of these protocol assignments itself.

vyzo · 2018-11-08T07:14:45Z

It's not clear to me how the selection works with user protocols.
The user wants to open a stream using either the old dht protocol identifier or the new one, and this happens within the multiplexer.
Can we have an example on how this would work?
It's not clear how we save either bytes or RTTs in user protocol negotiation.

Edit: For an additional example, the gossipsub router tries to open a stream to both gossipsub and floodsub, with priority to gossipsub (i.e., if the other peer supports gossipsub, that is what we want to use).

Stebalien · 2019-01-15T12:20:19Z

Questions from discussion concerning serial-stream:

Should we just always use TLS? TLS allows sending additional information (e.g., an advertise packet).
Should we use out-of-band information to avoid uncertainty entirely? That is, should we just assume that we know what security/multiplex protocol the peer speaks.

Questions with respect to advertise:

Should we remember our peer's protocols instead of re-advertising each time? Can we just do this as a separate protocol?

vyzo · 2019-01-15T12:24:45Z

Also, we need to work out a principled way to deal with (TCP) simultaneous open.

jhiesey · 2019-01-15T22:25:52Z

multistream-2.0/spec.md

+  local) should be discarded.
+* -1 - Close: Send an EOF and return to multistream.
+*  0 - Rest: Ends the reuse protocol, transitioning to a direct stream.
+* >0 - Data: The header indicates the length of the data.


>0 is interpreted as formatting, not the literal > character

jhiesey

Makes sense, just a few minor points

jhiesey · 2019-01-15T22:28:01Z

multistream-2.0/spec.md

+
+1. `multistream/advertise`: Inform the remote end about which protocols we
+   speak. This should partially replace the current identify protocol.
+2. `multistream/use`: Selects the stream's protocol using a multicodec.


I'm not a fan of these names. How about multistream/use-multicodec, multistream/use-dynamic, multistream/use-contextual?

Renamed use -> multicodec, dynamic -> string, contextual -> dynamic.

jhiesey · 2019-01-15T22:29:19Z

multistream-2.0/spec.md

+
+Where the header is:
+
+* -2 - Send a reset and return to multistream. All queued data (remote and


So the distinction is that -2 is an abnormal end and -1 is a normal end?

jhiesey · 2019-01-15T22:31:42Z

multistream-2.0/spec.md

+1. The "ls" feature of multistream has been removed. While useful, this really
+   should be a *protocol*. Given the `serial-stream` protocol, this shouldn't be
+   an issue as we can run as many sub-protocols over the same stream as we want.
+2. To reduce RTTs, all protocols are unidirectional.


What does this mean in practice?

Stebalien

To hopefully remove confusion around names, I've renamed (again, sorry):

multistream/use -> multistream/multicodec -- Multistream that uses multicodecs.
multistream/dynamic -> multistream/string -- Multistream that uses strings.
multistream/contextual -> multistream/dynamic -- This means we can talk about "dynamic" IDs which make a lot more sense.

Stebalien · 2019-01-17T11:06:42Z

multistream-2.0/spec.md

+
+1. `multistream/advertise`: Inform the remote end about which protocols we
+   speak. This should partially replace the current identify protocol.
+2. `multistream/use`: Selects the stream's protocol using a multicodec.


Renamed use -> multicodec, dynamic -> string, contextual -> dynamic.

Stebalien · 2019-01-17T11:13:05Z

multistream-2.0/spec.md

+1. The "ls" feature of multistream has been removed. While useful, this really
+   should be a *protocol*. Given the `serial-stream` protocol, this shouldn't be
+   an issue as we can run as many sub-protocols over the same stream as we want.
+2. To reduce RTTs, all protocols are unidirectional.


Stebalien · 2019-01-17T11:15:27Z

multistream-2.0/spec.md

+
+Where the header is:
+
+* -2 - Send a reset and return to multistream. All queued data (remote and


Yes. Updated.

jhiesey · 2019-01-17T11:20:42Z

LGTM. @whyrusleeping your thoughts?

TODO: Move this elsewhere. It's not a part of multistream and is only relevant because it came up in the retro.

Stebalien · 2019-01-23T00:38:55Z

multistream-2.0/spec.md

+any sub connections) with the multistream version. This way, we never have to do
+this again.
+
+## TCP Simultaneous Open


@vyzo, @jhiesey

This is my proposal for handling simultaneous open. TL;DR: The connection has two unidirectional streams until they're joined.

@vyzo IIRC, this is slightly different from the protocol we discussed as we always use it, even if we have no reason to believe we performed a simultaneous connect.

LGTM but we need more wire detail on the examples.

So let's have two examples with full protocol detail, one for the common case and one for simultaneous open.

vyzo · 2019-01-23T09:31:34Z

multistream-2.0/spec.md

+### Usage
+
+We treat each new TCP connection as a pair of unidirectional streams and use
+this protocol to bind them together.


I think we need full detail on the wire for this, all the way up to secio negotiation (both cases).

Kubuxu · 2019-02-01T21:36:14Z

multistream-2.0/spec.md

+The protocol is:
+
+```
+<header (signed 16 bit int)>


Endianness of this integer should be defined, or the endianness of all integers in the document should be defined.

At this time I'm assuming network byte order/big-endian.

Yes. Network order.
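A sketch of the serial-stream framing with the header read as a signed 16-bit, network-order integer. It assumes the whole input is already buffered, treats headers below -2 as errors (the spec excerpt doesn't say), and on reset drops every chunk read so far as "queued data".

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct(">h")  # signed 16-bit int, network byte order (big-endian)
RESET, CLOSE, REST = -2, -1, 0
TERMINATORS = {RESET: "reset", CLOSE: "close", REST: "rest"}


def encode_data(chunk: bytes) -> bytes:
    """Data frame: a positive header gives the payload length (at most 32767 bytes)."""
    if not 0 < len(chunk) <= 0x7FFF:
        raise ValueError("a data frame carries 1..32767 bytes")
    return HEADER.pack(len(chunk)) + chunk


def encode_control(header: int) -> bytes:
    """Control frame: -2 reset, -1 close, 0 rest; no payload follows."""
    if header not in TERMINATORS:
        raise ValueError(f"unknown control header {header}")
    return HEADER.pack(header)


def read_substream(buf: bytes) -> Tuple[List[bytes], str, bytes]:
    """Parse one serial-stream sub-stream from a fully buffered input.

    Returns (data chunks, terminator, remaining bytes). After "close" or
    "reset" the remaining bytes belong to multistream again; after "rest"
    they are the raw, no longer framed, stream.
    """
    chunks: List[bytes] = []
    offset = 0
    while True:
        if offset + HEADER.size > len(buf):
            raise ValueError("input ended before a reset/close/rest header")
        (header,) = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if header > 0:
            chunk = buf[offset:offset + header]
            if len(chunk) != header:
                raise ValueError("truncated data frame")
            chunks.append(chunk)
            offset += header
        elif header in TERMINATORS:
            kind = TERMINATORS[header]
            if kind == "reset":
                chunks = []  # reset: all queued data is discarded
            return chunks, kind, buf[offset:]
        else:
            raise ValueError(f"unknown header {header}")
```

On the wire, a reset is `0xff 0xfe` and a close is `0xff 0xff`, which is also why data frames top out at 32767 bytes.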

Kubuxu · 2019-02-01T21:39:34Z

multistream-2.0/spec.md

+More specifically,
+
+1. The initiator generates a 32 byte random ID (`ID`).
+2. The initiator negotiates the `duplex-stream` protocol and then sends `0<ID>` (`0` is a single 0 byte).


Suggest writing these as `0x00<id>` and `0x01<id>`: hex numbers make it obvious that the tag is a single byte.
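A hedged sketch of the ID framing in that hex notation. Only the initiator's `0x00<ID>` message comes from the spec excerpt; what a `0x01`-tagged message means is defined in the full spec, not here, so the parser just splits tag from ID. `PendingJoins` is a hypothetical helper for the "two unidirectional streams until joined" bookkeeping.

```python
import secrets
from typing import Dict, Optional, Tuple

ID_LEN = 32


def initiator_hello() -> Tuple[bytes, bytes]:
    """Steps 1-2: a fresh 32-byte random ID, framed as 0x00<ID>."""
    conn_id = secrets.token_bytes(ID_LEN)
    return conn_id, b"\x00" + conn_id


def parse_tagged_id(msg: bytes) -> Tuple[int, bytes]:
    """Split a one-byte tag (0x00, 0x01, ...) from the 32-byte ID that follows it."""
    if len(msg) != 1 + ID_LEN:
        raise ValueError("expected a 1-byte tag followed by a 32-byte ID")
    return msg[0], msg[1:]


class PendingJoins:
    """Hold one unidirectional half until the other half carrying the same ID arrives."""

    def __init__(self) -> None:
        self._waiting: Dict[bytes, object] = {}

    def offer(self, conn_id: bytes, half: object) -> Optional[Tuple[object, object]]:
        if conn_id not in self._waiting:
            self._waiting[conn_id] = half
            return None
        return self._waiting.pop(conn_id), half
```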

raulk · 2019-03-02T17:26:10Z

Multistream is foundational in libp2p and I'm keen to iterate on it. Here are further considerations after letting the proposal sink in for a few weeks:

We commonly refer to multistream as a multiplexer. This leads to confusion, as it's really a protocol selector: nothing in multistream 1.0/2.0 enables parallel streams or conversations over the same conduit, per se (without the involvement of a real multiplexer like yamux).
- I wouldn't call serial-stream a multiplexer, rather an interactive selection protocol.
- Not wanting to start a naming war, but to me multiselect 2.0 is a more precise term to capture what's happening here.
It's unclear to me how implementations are expected to decide which protocols to use, when, what the API will look like, and if these mechanisms will be exposed to the user at all. I think we need a normative choreography in the spec to make this actionable.
Often it's necessary to open a stream for a protocol that has already been selected before, during the lifetime of a session (e.g. imagine a stream pool for kad-dht).

We should make this very efficient, without incurring the cost of an upfront advertisement (multistream-{dynamic,advertise} pair).

Proposed solution: implementations MUST track unique protocol agreements in a session-bound table, assigning each a sequential 2-byte index starting from 0. Via a multistream-index protocol, any party can then open a new stream for a pre-used protocol by sending its 2-byte index.
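A sketch of that session-bound table in Python. Writing the 2-byte index in network byte order matches the rest of the spec but is an assumption, since the proposal doesn't pin it down; class and method names are hypothetical.

```python
import struct
from typing import Dict, List


class SessionIndex:
    """Per-session table of agreed protocols, each with a sequential 2-byte index from 0."""

    MAX_INDEX = 0xFFFF

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}
        self._protocols: List[str] = []

    def record(self, proto: str) -> int:
        """Call after a full negotiation succeeds; returns the protocol's index."""
        if proto not in self._index:
            if len(self._protocols) > self.MAX_INDEX:
                raise OverflowError("2-byte index space exhausted")
            self._index[proto] = len(self._protocols)
            self._protocols.append(proto)
        return self._index[proto]

    def encode_reopen(self, proto: str) -> bytes:
        """Payload for multistream-index: the 2-byte index of a previously agreed protocol."""
        return struct.pack(">H", self._index[proto])

    def decode_reopen(self, payload: bytes) -> str:
        (idx,) = struct.unpack(">H", payload)
        return self._protocols[idx]
```

One caveat the sketch glosses over: both peers must record agreements in the same order, so streams negotiated concurrently in both directions would need a tie-break (or separate per-direction tables).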
Suggestion: Peers MAY store tables transmitted through multistream-advertise. Upon a future reconnection, they can bypass the cost of sending the table again by serialising the last known state for the counterparty (we must define a canonical format, protobuf map?), digesting it and sending a multihash. The party can ACK, or NACK by sending the updated protocol table if the older one no longer stands.

This allows us to bypass redundant advertisements. We could get more sophisticated with CRDTs cc @hsanjuan.
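A sketch of the reconnect check, assuming the table is the `int => protocol` mapping discussed earlier. The sha2-256 multihash prefix (function code `0x12`, then length `0x20`) is standard; the canonical serialization is only a stand-in (sorted-key JSON), since the thread leaves the format open.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

SHA2_256 = 0x12  # multihash function code for sha2-256
DIGEST_LEN = 32


def table_multihash(table: Dict[int, str]) -> bytes:
    """sha2-256 multihash of a protocol table: <0x12><0x20><32-byte digest>."""
    # Stand-in canonical form (sorted-key, compact JSON); the real format is still open.
    canonical = json.dumps(
        {str(pid): proto for pid, proto in table.items()},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return bytes([SHA2_256, DIGEST_LEN]) + hashlib.sha256(canonical).digest()


def answer_reconnect(claimed: bytes, current: Dict[int, str]) -> Tuple[str, Optional[Dict[int, str]]]:
    """ACK if the peer's remembered table still stands, else NACK with the full current table."""
    if claimed == table_multihash(current):
        return "ACK", None
    return "NACK", current
```

Stebalien's objection above still applies: a NACK costs the same round trip the digest was meant to save, so this only pays off when tables rarely change.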
Centralising protocol definitions under multicodec looks unscalable. Consequently I expect multistream-codec to be underused in the wild (except for maybe the protocols the libp2p community maintains).

Even if we went this path, it's unclear how protocol versioning would work (we certainly wouldn't assign a multicodec for each version).
We should cover how multi~~stream~~select 2.0 deals with out-of-band service advertisements. Right now we assume pure uncertainty (not knowing which protocols the other party knows). We should define an interface such that discovery drivers can feed known services to multistream.

This is just around the corner, with Ethereum 2.0 developing discovery v5 on libp2p (which includes support for Ethereum Node Records that carry service advertisements), Bluetooth experimentation by @tomaka (i.e. Bluetooth SDP), and mDNS interest.

P.S. 👏👏 multiselect-2.0 👏👏 multiselect-2.0 👏👏 multiselect-2.0 👏👏

bigs · 2019-10-17T23:18:26Z

@Stebalien @raulk Added a proposal in packet-oriented.md for some extensions, modifications for the packet oriented use case. If this isn't the appropriate venue, I'm happy to move!

[WIP] [RFC] Multistream-2.0

1043853

Here's a draft of multistream-2.0 (+ a retrospective that you can skip).

ghost assigned Stebalien Oct 9, 2018

ghost added the in progress label Oct 9, 2018

Stebalien requested review from vyzo, daviddias, bigs, whyrusleeping, a user, raulk and Kubuxu October 9, 2018 01:24

magik6k self-requested a review October 9, 2018 12:03

magik6k reviewed Oct 9, 2018

marten-seemann self-requested a review November 6, 2018 02:34

raulk requested a review from tomaka November 6, 2018 17:52

vyzo reviewed Nov 8, 2018

Stebalien added 2 commits November 8, 2018 09:18

fix incomplete sentence

dbb3ec2

clarify some things

6a16dc0

multistream: nit

8ea2f40

jhiesey reviewed Jan 15, 2019

Stebalien added 4 commits January 17, 2019 10:59

fix formatting

e273037

multistream: improve naming

fcc5ac3

* use -> multicodec * dynamic -> string * contextual -> dynamic

multistream: update unidirectional comment

1ccfa7b

multistream: clarify serial stream reset

3983557

Stebalien commented Jan 17, 2019

multistream: add a protocol for handling simultanious open

eeaea23

TODO: Move this elsewhere. It's not a part of multistream and is only relevant because it came up in the retro.

Stebalien commented Jan 23, 2019

vyzo reviewed Jan 23, 2019

multistream: add some examples to the TCP simultanious open stuff

5e3aa0d

tomaka mentioned this pull request Jan 29, 2019

Add the possibility to assume that multistream-select will succeed libp2p/rust-libp2p#659

Closed

Kubuxu reviewed Feb 1, 2019

zah mentioned this pull request Mar 19, 2019

Phase 0 Networking Specifications ethereum/consensus-specs#763

Merged

This was referenced Jun 5, 2019

LibP2P network backends status-im/nimbus-eth2#278

Merged

Simple enhancements to the networking spec discovered after implementing it ethereum/consensus-specs#1158

Closed

raulk mentioned this pull request Jul 17, 2019

Libp2p Standardization Update ethereum/consensus-specs#1281

Closed

This was referenced Jul 26, 2019

[multistream-select] Reduce roundtrips in protocol negotiation. romanb/rust-libp2p#3

Closed

[multistream-select] Reduce roundtrips in protocol negotiation. libp2p/rust-libp2p#1212

Merged

Add proposal for packet oriented extensions

3c61282

This was referenced Oct 25, 2019

Add alternative multiselect proposal #223

Closed

[DISCUSSION] Multiselect & Packet orientation #226

Open

Stebalien closed this Feb 13, 2024

