rendezvous protocol #44

vyzo · 2018-03-12T09:27:08Z

Scope:

Generalized Peer Discovery
Bootstrap
Real-time applications that require rendezvous
Content routing

vyzo · 2018-03-12T09:28:02Z

summoning @whyrusleeping @lgierth @diasdavid @mkg20001
cc @dryajov

Stebalien · 2018-03-12T18:19:54Z

Thanks for writing this up!

> The rendezvous protocol provides facilities for real-time peer discovery within application specific namespaces.

This is more of a global discovery service, or something like that. Rendezvous means to meet at an agreed-upon time at an agreed-upon place. websocket-star-rendezvous is a rendezvous protocol because peers connect to each other through the server. As far as I can tell, we'll be using p2p-circuit for that here. While this protocol could be used to facilitate rendezvousing, it's more general than that.

> Peers connect to the rendezvous service and register their presence in one or more namespaces.

That sounds awfully centralized. Can we not just use the DHT (plus pubsub)? Really, this service seems like a generalization of content discovery. If possible we should try to merge the two services.

Before going any further into protocol design, we should briefly sketch out some concrete motivating use-cases. Suggestions:

A chat room.
Content discovery.

whyrusleeping · 2018-03-12T19:15:16Z

> Really, this service seems like a generalization of content discovery.

Right, this is pretty much a generalization of just discovery.

> That sounds awfully centralized. Can we not just use the DHT (plus pubsub)?

Ideally, this can be either centralized or decentralized depending on what the specific application wants (thus, decentralized). You could phrase storing something on the DHT as "connect to the DHT and PUT X", and I could say that the DHT is awfully centralized because it's the only way of doing content routing right now ;)

In any case, I like where this is going. Once this proposal is complete, we could reframe the DHT's content routing in terms of this interface: provide -> register(cid).

whyrusleeping · 2018-03-12T19:16:07Z

rendezvous/README.md

+
+Client peer `A` connects to the rendezvous service `R` and registers for namespace
+`my-app` with a `REGISTER` message. It subsequently enters rendezvous with
+a `RENDEZVOUS` message and waits for `REGISTER`/`UNREGISTER` announcements from


I'm not sold on having an explicit Rendezvous thingy. Isn't it technically just the same as polling the Discovery protocol?

it's a more efficient version of that, as it listens for new peers on top of the initial lookup:

DISCOVER returns a list of current peers and stops

RENDEZVOUS returns the current peers and then sends deltas (new REGISTER/UNREGISTER).

So it's pull -vs- push.
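The pull/push distinction can be made concrete with a small in-memory model. This is a hypothetical sketch, not the spec's API: the class and method names are invented for illustration, peers are plain strings, and there is no networking or TTL handling. `discover` returns a snapshot and ends the interaction, while `rendezvous` returns the same snapshot but also keeps a per-subscriber callback, so later `REGISTER`/`UNREGISTER` events are pushed as deltas. That callback list is the per-client state a rendezvous point has to hold in the push model.

```python
from typing import Callable, Dict, List, Set

# Callback receiving ("REGISTER" | "UNREGISTER", peer_id); hypothetical shape.
Listener = Callable[[str, str], None]


class ToyRendezvousPoint:
    """In-memory model of a rendezvous point (illustrative only)."""

    def __init__(self) -> None:
        self.peers: Dict[str, Set[str]] = {}            # namespace -> peer ids
        self.listeners: Dict[str, List[Listener]] = {}  # namespace -> subscribers

    def register(self, ns: str, peer: str) -> None:
        self.peers.setdefault(ns, set()).add(peer)
        for notify in self.listeners.get(ns, []):
            notify("REGISTER", peer)  # push the delta to every subscriber

    def unregister(self, ns: str, peer: str) -> None:
        self.peers.get(ns, set()).discard(peer)
        for notify in self.listeners.get(ns, []):
            notify("UNREGISTER", peer)

    def discover(self, ns: str) -> List[str]:
        # Pull: return the current registrations and stop.
        return sorted(self.peers.get(ns, set()))

    def rendezvous(self, ns: str, notify: Listener) -> List[str]:
        # Push: return the same snapshot, then keep sending deltas via `notify`.
        self.listeners.setdefault(ns, []).append(notify)
        return self.discover(ns)


if __name__ == "__main__":
    point = ToyRendezvousPoint()
    point.register("my-app", "A")
    events: List[tuple] = []
    snapshot = point.rendezvous("my-app", lambda kind, p: events.append((kind, p)))
    point.register("my-app", "B")
    point.unregister("my-app", "A")
    print(snapshot)  # ['A']
    print(events)    # [('REGISTER', 'B'), ('UNREGISTER', 'A')]
```

A pull-only client gets the same information by calling `discover` again, at the cost of re-fetching the full snapshot each time.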

whyrusleeping · 2018-03-12T23:15:58Z

cc @lgierth for review as well

whyrusleeping · 2018-03-12T23:16:08Z

also @mkg20001 and @diasdavid

vyzo · 2018-03-13T12:16:47Z

I think I want to add a limit field to DISCOVER so that we can get a bounded view for bootstrap.

vyzo · 2018-03-14T06:54:42Z

We may want to consider dropping the RENDEZVOUS broadcast functionality, as it complicates the implementation and has daemon scalability implications -- cf @whyrusleeping's concerns.

mkg20001

Not completely sold on the server requirement. I think something like this could be done using a special pubsub for discovery without the need for a server.
But I think it's enough for a ws-star replacement.

mkg20001 · 2018-03-28T17:50:53Z

rendezvous/README.md

+
+```protobuf
+message Message {
+  enum MessageType {


Wouldn't it make more sense to merge register and unregister requests? Or drop the message type completely?
The client could simply check the length of the array for all types.

I think we want unregister so that we can depart from a namespace without having to disconnect.
The message type is needed to disambiguate the protocol.

mkg20001 · 2018-03-28T17:51:30Z

rendezvous/README.md

+  }
+
+  message Peer {
+    optional string id = 1;


I think it's better to use the smaller non-encoded version of the peer-id instead of the b58 string.

sure, that's just the data type -- we use string for unencoded addresses in go.

vyzo · 2018-04-11T17:12:03Z

Revision 2, after discussion with @whyrusleeping and consensus with @dryajov and @mkg20001 for ws-star replacement.

vyzo · 2018-04-11T17:12:40Z

Summoning @whyrusleeping @Stebalien @mkg20001 @dryajov for a second round of review.

vyzo · 2018-04-11T17:15:40Z

To summarize the changes in rev2:

push discovery has been dropped, as it is complicated to implement and requires potentially expensive state tracking.
discovery is a proper request-response interaction, with a DiscoverResponse.
namespace is optional in the query, implying discovery of all registered peers.
added motivating paragraph and use cases section.

whyrusleeping · 2018-04-12T06:03:46Z

rendezvous/README.md

+The rendezvous protocol provides facilities for real-time peer
+discovery within application specific namespaces. Peers connect to the
+rendezvous point and register their presence in one or more
+namespaces. Registrations persist until the peer disconnects or


requiring that nodes be connected to the rendezvous in order for their data to be served by it is strange. Maybe don't specify that bit? Maybe values could have a timeout or something similar?

Maybe value lifetime deserves its own section?

Well, it has upsides for real time discovery -- just keep the node visible while it's still connected.
But you are right, there are use cases where we just want to register with some TTL and not keep a connection open.

I think we should just add an optional TTL in the REGISTER message -- if it's omitted, keep the registration until the node disconnects.

I'll add an extra paragraph about the lifetime of the registration.

A section is better actually.

Done; we have an optional TTL in the REGISTER message now.

dryajov · 2018-04-12T17:04:09Z

rendezvous/README.md

+Another client `D` can discover peers in `my-app` by sending a `DISCOVER` message; the
+rendezvous point responds with the list of current peer registrations.
+```
+D -> R: DISCOVER{my-app}


I would also add a way of providing a limit of peers and a timestamp of when they connected.

DISCOVER:{[my-app], [LIMIT, SINCE]}, where:

LIMIT (integer) - optional, the maximum number of peers we're willing to receive (should prevent various attacks as well as allow peers/apps to adjust this depending on environment/conditions). If none is provided, return all peers.

SINCE (timestamp) - optional, return peers connected from this time onward, allows returning fresh peers. If none provided, return all peers.

I like the idea of SINCE, as it is useful indeed. But we will have to correlate this with a rendezvous point timestamp, so we should probably include one in the DiscoverResponse.

Indeed we should! Great idea.

dryajov · 2018-04-12T17:05:20Z

rendezvous/README.md

+
+  message Discover {
+    optional string ns = 1;
+    optional int limit = 2;


This seems to be the LIMIT I'm proposing in the comment above...

yeah, that's already there.

`Error: Could not resolve int` -- JS still complains (`int` is not a protobuf scalar type; `int32` or `int64` is needed).

dryajov · 2018-04-12T17:35:58Z

I thought it wouldn't hurt to create a companion issue - #47.

mkg20001 · 2018-04-14T11:56:51Z

rendezvous/README.md

+  }
+
+  message PeerInfo {
+    optional string id = 1;


If there is no crypto-challenge for the id, the only way to prove ownership is the SECIO challenge, which is only done for the current peer-id. Therefore this field seems useless to me...
Why not just drop the PeerInfo msg and use repeated addrs on register?

The registration is forwarded in the DiscoverResponse (modulo ttl) and it would thus be ambiguous without the peer id.

You also are missing the potential use case of rendezvous points sharing registrations (say in federation of daemons, using pubsub, etc).
More generally, having a peer id allows registrations to be passed around and reconstruct full PeerInfo objects from them.

> You also are missing the potential use case of rendezvous points sharing registrations (say in federation of daemons, using pubsub, etc).

Then the data must be somehow authenticated, right?

You can authenticate the sharing peer instead of the data.
Authenticating the registration itself would require a signature in them.

> You can authenticate the sharing peer instead of the data.

If the data isn't authenticated, wouldn't that allow for replay attacks?

Sure, but you'd get that too if you were authenticating with a nonce.

I don't think that replay attacks are a big concern in this context, as the sharing peers can establish a trust model with each other.

Note that signatures alone are not sufficient to prevent replay attacks in a shared setting either, and trying to add a signature timestamp gets complicated quickly.

mkg20001 · 2018-04-15T09:02:21Z

Shouldn't namespace naming be restricted somehow? Like only `[a-zA-Z0-9_.-]+`?
Edit: And maybe also restricted in length?
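A namespace check of this kind might look like the sketch below. It is hypothetical: the thread does not settle on a character set or a length, so both the pattern (taken from the suggestion above) and the 255-character cap are placeholders. Note the range must be `A-Z`; `A-z` would also admit the six ASCII characters between `Z` and `a`, such as `[` and `^`.

```python
import re

# Allowed characters as proposed in the thread (upper-case range is A-Z, not A-z).
NAMESPACE_RE = re.compile(r"[a-zA-Z0-9_.-]+")

# Hypothetical length cap; the thread asks for one but does not pick a value.
MAX_NAMESPACE_LEN = 255


def valid_namespace(ns: str) -> bool:
    """Return True if `ns` is non-empty, short enough, and uses only allowed characters."""
    return len(ns) <= MAX_NAMESPACE_LEN and NAMESPACE_RE.fullmatch(ns) is not None
```

`fullmatch` requires the whole string to match, so a single disallowed character (a space, `^`, `/`) rejects the namespace, and `+` rejects the empty string.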

vyzo · 2018-04-15T14:30:36Z

Updated to replace timestamps with cookies.

mkg20001 · 2018-04-15T14:45:51Z

> Updated to replace timestamps with cookies.

Done: libp2p/js-libp2p-rendezvous@42dd587

Now I'll start looking into how to build a cluster with that.

vyzo · 2018-04-19T17:53:43Z

I think we want to remove the "keep registration until disconnect" behaviour when the TTL is missing.
It complicates the daemon and will also make a mess with federation, as we'd have to handle notifications for peer disconnections and have them trigger db/pubsub updates.

How about we use a default TTL when it's omitted?
Also, the daemon should probably have an upper bound for the TTL.

So I propose a 2hr default TTL with an upper bound of 1day.

mkg20001 · 2018-04-19T18:23:49Z

> So I propose a 2hr default TTL with an upper bound of 1day.

I'd go for 1h-24h with default 2h. (I think smaller TTLs do not really make sense)

whyrusleeping · 2018-04-20T03:53:25Z

👍 on default TTL of a couple hours, and not doing the 'remove on disconnect' thing.

Upper bound is interesting, but I'd be tempted to increase it to 48 or 72 hours. We can always drop it back down if that turns out to be too much, but it may end up enabling some use cases we hadn't already thought about.

vyzo · 2018-04-20T07:26:41Z

Agreed on having a longer upper bound, let's do 72hrs.
I also think we should have an error response for overly long TTLs, instead of silently truncating and blacking out a node that refreshes close to TTL expiration.


vyzo · 2018-04-20T07:32:45Z

Updated for default of 2hrs and minimum upper bound of 72hrs.
I also added an E_INVALID_TTL status code for rejecting overly long registrations.
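The agreed TTL policy (2h default, 72h minimum upper bound, reject rather than truncate) can be sketched as a small check. This is an illustration under assumptions: `None` stands for an omitted `ttl` field, non-positive TTLs are treated as invalid, and the `"OK"` status name and tuple return shape are placeholders rather than the protobuf's actual enum. Only `E_INVALID_TTL` and `statusText` come from the thread.

```python
from typing import Optional, Tuple

DEFAULT_TTL = 2 * 60 * 60  # 2 hours, applied when REGISTER omits the ttl
MAX_TTL = 72 * 60 * 60     # 72 hours: the minimum upper bound a daemon must accept


def check_ttl(ttl: Optional[int], max_ttl: int = MAX_TTL) -> Tuple[str, int, str]:
    """Return (status, effective_ttl, statusText) for a REGISTER request.

    Over-long TTLs are rejected with E_INVALID_TTL rather than silently truncated.
    Assumption: `None` means the field was omitted; non-positive values are invalid.
    """
    if max_ttl < MAX_TTL:
        raise ValueError("daemon upper bound must be at least 72 hours")
    if ttl is None:
        return ("OK", DEFAULT_TTL, "")
    if ttl <= 0 or ttl > max_ttl:
        return ("E_INVALID_TTL", 0, f"ttl must be between 1 and {max_ttl} seconds")
    return ("OK", ttl, "")
```

Rejecting instead of truncating means a node that asked for too long a TTL finds out immediately, instead of silently disappearing before its planned refresh.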

Kubuxu · 2018-04-20T08:17:37Z

We could allow the upper limit of TTL to drop if the server has too many registered clients already but I don't know how I feel about it.

The RegisterResponse might also need to return the upper TTL bound. Otherwise, it is a guessing game.

I would also be for defining a programming interface so in future we can create a drop-in fully decentralized replacement.

whyrusleeping · 2018-04-20T08:20:33Z

> I would also be for defining a programming interface so in future we can create a drop-in fully decentralized replacement

Definitely, I really see this doc as a generic interface to any service that provides this functionality, whether centralized or decentralized. Some use cases call for a high-performance centralized or federated solution, while others call for a resilient decentralized one. The calling conventions should be the same either way.
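One way to read this is as an abstract interface that centralized, federated, or DHT-backed implementations all satisfy. The sketch is hypothetical: the `Discovery` class, its method signatures, and the toy `SharedTable` backend (a plain dict standing in for a rendezvous point, with TTLs not enforced) are invented for illustration. A DHT-backed implementation would map `register` onto a provide, as suggested earlier in the thread.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Discovery(ABC):
    """Calling convention shared by any backend (hypothetical)."""

    @abstractmethod
    def register(self, ns: str, ttl: Optional[int] = None) -> int:
        """Advertise this peer under `ns`; return the TTL actually granted."""

    @abstractmethod
    def unregister(self, ns: str) -> None:
        """Withdraw this peer from `ns`."""

    @abstractmethod
    def discover(self, ns: str, limit: int = 0) -> List[str]:
        """Return up to `limit` peer ids registered under `ns` (0 = no limit)."""


class SharedTable(Discovery):
    """Toy centralized backend: a dict shared by all clients. TTLs are not enforced."""

    def __init__(self, self_id: str, table: Dict[str, List[str]]) -> None:
        self.self_id = self_id
        self.table = table

    def register(self, ns: str, ttl: Optional[int] = None) -> int:
        peers = self.table.setdefault(ns, [])
        if self.self_id not in peers:
            peers.append(self.self_id)
        return ttl if ttl is not None else 2 * 60 * 60  # 2h default agreed in the thread

    def unregister(self, ns: str) -> None:
        peers = self.table.get(ns, [])
        if self.self_id in peers:
            peers.remove(self.self_id)

    def discover(self, ns: str, limit: int = 0) -> List[str]:
        peers = list(self.table.get(ns, []))
        return peers[:limit] if limit > 0 else peers
```

Application code written against `Discovery` would not change if `SharedTable` were swapped for a federated or DHT-backed class.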

Kubuxu · 2018-04-20T10:53:19Z

Note on the API topic: I'm thinking of something similar to what we did for the libp2p consensus protocol.

phritz

Hi, I work with @whyrusleeping on filecoin and since we'll likely be using this protocol he invited me to add comments. Apologies for swooping in, feel free to take my comments or leave them as I am lacking a lot of context for this protocol and libp2p conventions.

phritz · 2018-04-20T20:50:16Z

rendezvous/README.md

+
+```protobuf
+message Message {
+  enum MessageType {


For messages with enums, it is easy to create messages that accidentally say something they don't intend to. The way I've seen this happen is that if an enum field has a default value that carries some semantics (name of a message, or a status), then the reader can't distinguish between a programmer explicitly setting that value versus not setting it at all. Or at least the API I was using a few years ago didn't distinguish between the two by default. Anyway, in order to avoid accidentally having semantics for enum fields, the pattern was to have the default enum value always be something like UNSET or NOT_SET so that it was clear when someone explicitly set the value versus not.

I can certainly see some value in defensive programming, but I am not sure I like polluting the protocol with something that protects against PEBKAC-style programming errors.

It would also be rather inconsistent as I don't think we do this anywhere else.

Also note that the casual programmer, who is more likely to suffer from such PEBKAC, should never have to touch these protobufs directly.
These should be well abstracted by implementation libraries, and bugs of this type should be quickly weeded out there.

phritz · 2018-04-20T20:50:50Z

rendezvous/README.md

+  message Register {
+    optional string ns = 1;
+    optional PeerInfo peer = 2;
+    optional int64 ttl = 3; // in seconds


if this were named ttlSec it would be unambiguous and the comment would be unnecessary

I don't like it (it's ugly -- what is ttlSec? It's ttl in seconds. Is there a ttl not in seconds?), but I also don't feel strongly about it -- so if people would prefer ttlSec as the field, we can change it.

phritz · 2018-04-20T20:52:11Z

rendezvous/README.md

+    optional bytes cookie = 3;
+  }
+
+  message DiscoverResponse {


Might you want a way to signal errors? For example namespace too long or internal server error?

Ok.
We briefly discussed this with @mkg20001 and it seemed of limited utility for namespace errors, but internal server errors is probably something we want reported.

Added a ResponseStatus for reporting errors.

phritz · 2018-04-20T20:52:59Z

rendezvous/README.md

+  }
+
+  message RegisterResponse {
+    optional RegisterStatus status = 1;


I've always found an optional error string to be polite. For example if E_INVALID_NAMESPACE you could say what the max length is or what the problem is otherwise. Would save the caller from having to guess.

Ok, that's a nicety we can add.

Added as statusText.

phritz · 2018-04-20T20:56:41Z

rendezvous/README.md

+         c2}
+```
+
+If `D` wants to progressively poll for real time discovery, it can use


Unclear to me the intended use so perhaps this comment is moot, but: might the progressive poller also want information about unregisters? If they don't get that information then it seems like they'll incrementally build a registration set of non-decreasing size and decreasing usefulness because the list the caller keeps will start to include many stale peers who have unregistered over time. Depending on churn rate the number of stale registrations the caller has could potentially dwarf the number of actually registered peers. There are obvious solutions on the caller side but unclear to a reader like me what's intended here. Might be worth a comment at least.

Reporting the unregistrations in a discovery response is problematic for a couple of reasons:

requires the rendezvous point to keep track of unregisters modulo the poller's state, which is expensive and couples too tightly.

complicates the processing of the response in the client, as it assumes a state machine that may simply not exist.

Also note that explicit unregisters are expected to be rare events.

Another point is that the progressive discovery mechanism is not intended to build a consistent replica of the rendezvous point's internal state, but rather to discover new peers since the last discovery.
A rendezvous federation protocol (TBD) would certainly forward unregistrations though.

> Also note that explicit unregisters are expected to be rare events

I think that explicit unregisters might be useful when a node does a clean shutdown.

Yes, of course!

The question is whether it is worth the trouble propagating this in discover queries, or simply let the TTLs expire stale entries in clients.
I really don't like having to track unregistrations in the database, though; unregister should simply signal deletion, with no further propagation.

We could leave it up to the clients to decide when they want to disconnect. If they want to decrease the amount of registrations, they can purge it locally (checking if ttl expired and remove that registration would be one way, bad ping could be another). Seems to simplify by not having the unregister.
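Local purging by TTL could look roughly like this. It is a hypothetical client-side helper, not part of any library, and it assumes each registration returned by `DISCOVER` carries its remaining TTL in seconds. The optional `now` parameter exists only to make the behaviour easy to exercise deterministically.

```python
import time
from typing import Dict, List, Optional


class PeerCache:
    """Client-side view accumulated from DISCOVER responses (hypothetical helper).

    Each entry expires locally when its TTL runs out, so a client can drop peers
    that unregistered or vanished without the rendezvous point reporting it.
    """

    def __init__(self) -> None:
        self.expiry: Dict[str, float] = {}  # peer id -> absolute expiry time

    def add(self, peer: str, ttl: int, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.expiry[peer] = now + ttl  # a re-registration simply refreshes the entry

    def purge(self, now: Optional[float] = None) -> int:
        """Remove expired entries and return how many were dropped."""
        now = time.monotonic() if now is None else now
        stale = [p for p, exp in self.expiry.items() if exp <= now]
        for p in stale:
            del self.expiry[p]
        return len(stale)

    def peers(self) -> List[str]:
        return sorted(self.expiry)
```

A failed ping could be handled the same way, by deleting the entry early instead of waiting for its expiry.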

phritz · 2018-04-20T20:59:53Z

rendezvous/README.md

+new registrations.
+
+So here we consider a new client `E` registering after the first query,
+and a subsequent query that discovers just that peer by including the cookie:


Seems like might not be a good idea to require registries to keep cookies forever (and probably not even across restarts) so the language here should probably indicate that what is returned when you pass the cookie is only likely an increment set; it could be the full set if the cookie is too old, unknown, the peer reset, whatever.

The registries shouldn't keep state for cookies at all, that's why they are client-side cookies.

Note that a stale cookie could elicit an arbitrary set in response.
I think we should simply return an error if the cookie has become stale, which is something the registry can easily detect.

Added an E_INVALID_COOKIE status code that can fail a progressive discovery and signal a restart.
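A stateless cookie scheme along these lines can be sketched as follows. Everything here is an assumption for illustration: the registry numbers registrations with a per-namespace counter, and the opaque cookie is just `(epoch, last sequence number)` packed into 16 bytes, so the registry keeps no per-client state. A cookie from another epoch (for example, after a restart) or beyond the current counter is detected and rejected, standing in for `E_INVALID_COOKIE`. `limit` gives pagination, since the returned cookie resumes right after the last peer actually sent.

```python
import struct
from typing import Dict, List, Tuple


class CookieError(Exception):
    """Stands in for an E_INVALID_COOKIE response status."""


class ToyNamespace:
    """Hypothetical per-namespace registry with stateless, client-side cookies."""

    def __init__(self, epoch: int) -> None:
        self.epoch = epoch  # e.g. bumped on restart, which invalidates old cookies
        self.seq = 0        # counter assigned to each (re-)registration
        self.regs: Dict[str, int] = {}  # peer id -> sequence number of latest registration

    def register(self, peer: str) -> None:
        self.seq += 1
        self.regs[peer] = self.seq

    def discover(self, cookie: bytes = b"", limit: int = 0) -> Tuple[List[str], bytes]:
        """Return peers registered after the cookie's position, plus a new cookie."""
        since = 0
        if cookie:
            if len(cookie) != 16:
                raise CookieError("malformed cookie")
            epoch, since = struct.unpack(">QQ", cookie)
            # Staleness is detectable without remembering anything about the client.
            if epoch != self.epoch or since > self.seq:
                raise CookieError("stale cookie")
        fresh = sorted((s, p) for p, s in self.regs.items() if s > since)
        if limit > 0:
            fresh = fresh[:limit]  # pagination: the next cookie resumes after this page
        last = fresh[-1][0] if fresh else since
        return [p for _, p in fresh], struct.pack(">QQ", self.epoch, last)
```

The cookie is not authenticated; a client that forges one only distorts its own view.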

phritz · 2018-04-20T21:01:48Z

rendezvous/README.md

+of query responses, so that large numbers of peer registrations can be
+managed.
+
+### Registration Lifetime


Note: there are several comments referencing "namespace ownership" but that concept does not appear to be present in this document. If it's a part of the protocol or caller contract it's probably worth explicitly mentioning.

Namespace ownership is beyond the scope of the protocol I think -- it quickly gets into ACLs and that's a rathole I don't want to go down :)

phritz · 2018-04-20T21:28:20Z

I guess one more question for my edification: there seems to be concern about load (limiting peers to 200 or 1,000; 20k causing a huge load). Can you help me understand that? Coming into this whole thing super naively I assume the "load" referred to is bandwidth? Why the concern (20k * 100 bytes say is only 16Mbits)?

Also if bandwidth is really a concern I wonder if you might consider having the payload be bytes with an encoding tag, that way you could ship peerids compressed. (The server could pre-compress batches for example). Or compression at a lower layer could be very effective as well.

It's also possible I'm wildly missing the point of concern :)

vyzo · 2018-04-21T08:21:14Z

> Why the concern (20k * 100 bytes say is only 16Mbits)?

You'd have to multiply this by 20k to get to the full load -- note that it's not bandwidth so much, but rather memory buffers to fill these responses.

> that way you could ship peerids compressed.

They are cryptographic hashes already, can't compress those.

> It's also possible I'm wildly missing the point of concern :)

The main concern is a node just connecting and doing a query that requires a 16MB response -- and the memory buffer to fill and then read this response. Note that it's ok to have those 16MB transmitted as the result of several threaded queries, as this does not commit any significant resources for the interaction.
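The back-of-envelope figures in this exchange can be written out explicitly. The inputs are the rough numbers quoted in the thread (20k peers, about 100 bytes per registration, 20k querying clients, a per-query limit of 1,000). The sketch only shows how the buffering cost scales with and without a limit; it is not a measurement.

```python
PEERS = 20_000         # registrations in one unbounded DISCOVER response
BYTES_PER_PEER = 100   # rough size of one registration, as assumed in the thread
CLIENTS = 20_000       # clients each issuing such a query
LIMIT = 1_000          # one of the per-query limits mentioned in the thread

per_response = PEERS * BYTES_PER_PEER  # bytes buffered for a single response
all_clients = per_response * CLIENTS   # bytes if every client asks at once
limited = LIMIT * BYTES_PER_PEER       # bytes per response with a limit applied

print(per_response, per_response * 8)  # 2000000 16000000  (2 MB, i.e. 16 Mbit)
print(all_clients)                     # 40000000000  (40 GB)
print(limited)                         # 100000  (100 KB)
```

With a limit, a client that needs more peers issues several smaller queries, so no single interaction forces the rendezvous point to fill a multi-megabyte buffer.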


vyzo · 2018-04-21T08:45:33Z

Updates:

Generalized response status enum (not just register responses)
Added response status to DiscoverResponse so that we can signal errors in discover queries
Added statusText to convey error messages in responses.
Added E_INVALID_COOKIE and E_INTERNAL_ERROR error status codes.

vyzo · 2018-04-24T15:49:58Z

Go implementation in libp2p/go-libp2p-rendezvous#1

victorb · 2018-05-21T18:21:03Z

Closing this PR and will open a new one from the same branch. GitHub is having a hard time loading this PR (probably due to the number of comments?)

victorb · 2018-05-21T18:22:40Z

New PR: #56

rendezvous protocol

696ca34

ghost assigned vyzo Mar 12, 2018

ghost added the in progress label Mar 12, 2018

vyzo added 3 commits March 12, 2018 11:33

cosmetics

dcbd0b4

peer discovery for bootstrap

2ba1ad5

consistent naming for namespace field

3330b69

whyrusleeping reviewed Mar 12, 2018

add limit to Discover message, better wording around bootstrap

00fa97b

mkg20001 reviewed Mar 28, 2018

rev2: drop push discovery, only polling.

b19d493

protobuf fix: specific messages are optional, not repeated

6dbbd4b

whyrusleeping reviewed Apr 12, 2018

add section on registration lifetime

219dc40

dryajov reviewed Apr 12, 2018

dryajov mentioned this pull request Apr 12, 2018

create interface-service-discovery based on rendezvous proposal #47

Open

add timestamp to discover queries and responses

ed7c068

mkg20001 reviewed Apr 14, 2018

use a cookie for progressive discovery and pagination, not a timestamp.

7036720

vyzo added 2 commits April 18, 2018 18:22

add closing backticks

ec7a579

RegisterStatus field should be called status

9d061e1

default TTL of 2hrs instead of persist until disconnection semantics

81e5078

Also specify a minimum upper bound of 72hrs and an E_INVALID_TTL status code.

phritz reviewed Apr 20, 2018

add error reporting to DiscoverResponse; status text for errors.

9bdee4f

and some more error codes.

protobuf cosmetics -- move response status enum up

4059338

vyzo mentioned this pull request Apr 24, 2018

Implement rendezvous protocol spec libp2p/go-libp2p-rendezvous#1

Open

vyzo mentioned this pull request May 2, 2018

Relay Infrastructure Integration ipfs/kubo#4990

Closed

dryajov mentioned this pull request May 3, 2018

peers don't connect automatically through circuit, but they do through websocket-star rendezvous server ipfs/js-ipfs#1309

Closed

victorb closed this May 21, 2018

ghost removed the in progress label May 21, 2018

victorb mentioned this pull request May 21, 2018

RFC: rendezvous #56

Merged

raulk mentioned this pull request Sep 24, 2018

Other peer discovery mechanism ethresearch/sharding-p2p-poc#47

Open
