-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rendezvous protocol #44
Conversation
summoning @whyrusleeping @lgierth @diasdavid @mkg20001 |
Thanks for writing this up!
This is more a global discovery or something like that. Rendezvous means to meet at an agreed upon time at an agreed upon place. websocket-star-randezvous is a rendezvous protocol because peers connect to each other through the server. As far as I can tell, we'll be using p2p-circuit for that here. While this protocol could be used to facilitate rendezvousing, it's more general than that.
That sounds awfully centralized. Can we not just use the DHT (plus pubsub)? Really, this service seems like a generalization of content discovery. If possible we should try to merge the two services. Before going any further into protocol design, we should briefly sketch out some concrete motivating use-cases. Suggestions:
|
Right, this is pretty much a generalization of just discovery.
Ideally, this can be either centralized or decentralized depending on what the specific application wants (thus, decentralized). You could phrase storing something on the DHT as "connect to the DHT and PUT X", I could say that the DHT is awfully centralized because its the only way of doing content routing right now ;) In any case, I like where this is going. Once this proposal is complete, we could reframe the DHTs content routing in terms of this interface: |
rendezvous/README.md
Outdated
|
||
Client peer `A` connects to the rendezvous service `R` and registers for namespace | ||
`my-app` with a `REGISTER` message. It subsequently enters rendezvous with | ||
a `RENDEZVOUS` message and waits for `REGISTER`/`UNREGISTER` announcements from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sold on having an explicit Rendezvous thingy. Isnt it technically just the same as polling the Discovery protocol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a more efficient version of that, as it listens for new peers on top of the initial lookup:
DISCOVER
returns a list of current peers and stopsRENDEZVOUS
returns the current peers and then sends deltas (newREGISTER/UNREGISTER
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's pull -vs- push.
Right, this is pretty much a generalization of just discovery.
Ideally, this can be either centralized or decentralized depending on what the specific application wants (thus, decentralized). You could phrase storing something on the DHT as "connect to the DHT and PUT X", I could say that the DHT is awfully centralized because its the only way of doing content routing right now ;) In any case, I like where this is going. Once this proposal is complete, we could reframe the DHTs content routing in terms of this interface: |
cc @lgierth for review as well |
also @mkg20001 and @diasdavid |
I think I want to add a |
We may want to consider droppping the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not completly sold on the server requirement. I think something like this could be done using a special pubsub for discovery without the need of a server.
But I think it's enough for a ws-star replacement.
rendezvous/README.md
Outdated
|
||
```protobuf | ||
message Message { | ||
enum MessageType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense to merge register and unregister requests? Or dropping the message type completly?
The client could simply check the length of the array for all types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want unregister so that we can depart from a namespace without having to disconnect.
The message type is needed to disambiguate the protocol.
rendezvous/README.md
Outdated
} | ||
|
||
message Peer { | ||
optional string id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use the smaller non-encoded version of the peer-id instead of the b58 string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that's just the data type -- we use string for unencoded addresses in go.
Revision 2, after discussion with @whyrusleeping and consensus with @dryajov and @mkg20001 for ws-star replacement. |
Summoning @whyrusleeping @Stebalien @mkg20001 @dryajov for a second round of review. |
To summarize the changes in rev2:
|
rendezvous/README.md
Outdated
The rendezvous protocol provides facilities for real-time peer | ||
discovery within application specific namespaces. Peers connect to the | ||
rendezvous point and register their presence in one or more | ||
namespaces. Registrations persist until the peer disconnects or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requiring that nodes be connected to the rendezvous in order for their data to be served by it is strange. Maybe don't specify that bit? Maybe values could have a timeout or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe value lifetime deserves its own section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it has upsides for real time discovery -- just keep the node visible while it's still connected.
But you are right, there are use cases where we just want to register with some TTL and not keep a connection open.
I think we should just add an optional TTL in the REGISTER message -- if it's omitted, keep the registration until the node disconnects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add an extra paragraph about the lifetime of the registration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A section is better actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; we have an optional TTL in the REGISTER
message now.
rendezvous/README.md
Outdated
Another client `D` can discover peers in `my-app` by sending a `DISCOVER` message; the | ||
rendezvous point responds with the list of current peer reigstrations. | ||
``` | ||
D -> R: DISCOVER{my-app} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also add a way of providing a limit of peers and a timestamp of when they connected.
DISCOVER:{[my-app], [LIMIT, SINCE]}
, where:
LIMIT
(integer) - optional, the max peer we're willing to receive (should prevent various attacks as well as allow peers/apps adjusting this depending on env/conditions). If none provided return all peers.SINCE
(timestamp) - optional, return peers connected from this time onward, allows returning fresh peers. If none provided, return all peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of SINCE
, as it is useful indeed. But we will have to correlate this with a rendezvous point timestamp, so we should probably include one in the DiscoverResponse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we should! Great idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
rendezvous/README.md
Outdated
|
||
message Discover { | ||
optional string ns = 1; | ||
optional int limit = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be theLIMIT
I'm proposing in the comment above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's already there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error: Could not resolve int
: JS still complains
I thought it won't hurt to create a companion issue - #47. |
rendezvous/README.md
Outdated
} | ||
|
||
message PeerInfo { | ||
optional string id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no crypto-challenge for the id the only way to prove ownership is the SECIO challenge, which is only done for the current peer-id. Therefore this field seems useless to me...
Why not just drop the PeerInfo msg and use repeated addrs on register?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The registration is forwarded in the DiscoverResponse
(modulo ttl) and it would thus be ambiguous without the peer id.
You also are missing the potential use case of rendezvous points sharing registrations (say in federation of daemons, using pubsub, etc).
More generally, having a peer id allows registrations to be passed around and reconstruct full PeerInfo objects from them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also are missing the potential use case of rendezvous points sharing registrations (say in federation of daemons, using pubsub, etc).
Then the data must be somehow authenticated, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can authenticate the sharing peer instead of the data.
Authenticating the registration itself would require a signature in them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can authenticate the sharing peer instead of the data.
If the data isn't authenticated, wouldn't that allow for replay attacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but you'd get that too if you were authenticating with a nonce.
I don't think that replay attacks are a big concern in this context, as the sharing peers can establish a trust model with each other.
Note that signatures along are not sufficient to prevent replay attacks in a shared setting either, and trying to add a signature timestamp gets complicated quick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
Shouldn't namespace naming be restricted somehow? Like only |
Updated to replace timestamps with cookies. |
Done: libp2p/js-libp2p-rendezvous@42dd587 Now I'll start looking into how to build a cluster with that. |
I think we want to remove the "keep registration until disconnect" behaviour when the TTL is missing. How about we use a default TTL when it's omitted? So I propose a 2hr default TTL with an upper bound of 1day. |
I'd go for 1h-24h with default 2h. (I think smaller TTLs do not really make sense) |
👍 on default TTL of a couple hours, and not doing the 'remove on disconnect' thing. Upper bound is interesting, but i'd be tempted to increase it to 48 or 72 hours. We can always drop it back down if that turns out to be too much, but it may end up enabling some usecases we didnt hadnt already thought about |
Agreed on having a longer upper bound, let's do 72hrs. |
Also specify a minimum upper bound of 72hrs and an E_INVALID_TTL status code.
Updated for default of 2hrs and minimum upper bound of 72hrs. |
We could allow the upper limit of TTL to drop if the server has too many registered clients already but I don't know how I feel about it. The I would also be for defining a programming interface so in future we can create a drop-in fully decentralized replacement. |
Definitely, I really see this doc as a generic interface to any service that provides this functionality. Whether centralized or decentralized. Some usecases call for a high performance centralized or federated solution, where other usecases call for a resilient decentralized one. The calling conventions should be the same either way. |
Note on the API topic, I think of something similar we as we did for libp2p consensus protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I work with @whyrusleeping on filecoin and since we'll likely be using this protocol he invited me to add comments. Apologies for swooping in, feel free to take my comments or leave them as I am lacking a lot of context for this protocol and libp2p conventions.
|
||
```protobuf | ||
message Message { | ||
enum MessageType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For messages with enums it is easy to to create messages that accidentally say something they don't intend to. The way I've seen this happen is that if an enum field has a default value that carries some semantics (name of a message, or a status) then the reader can't distinguish between a programmer explicitly setting that value versus not setting it at all. Or at least the API I was using a few years ago didn't distinguish between the two by default. Anyway, in order to avoid accidentally having semantics for enum fields the pattern was to have the default enum value always be something like UNSET or NOT_SET so that it was clear when someone explicitly set the value versus not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can certainly see some value in defensive programming, but I am not sure I like polluting the protocol with something that protects against PEBKAC-style programming errors.
It would also be rather inconsistent as I don't think we do this anywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that the casual programmer, who is more likely to suffer from such PEBKAC, should never have to touch these protobufs directly.
These should be well abstracted by implementation libraries, and bugs of this type should be quickly weeded out there.
message Register { | ||
optional string ns = 1; | ||
optional PeerInfo peer = 2; | ||
optional int64 ttl = 3; // in seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this were named ttlSec it would be unambiguous and the comment would be unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like it (it's ugly -- what is ttlSec? It's ttl in seconds. Is there a ttl not in seconds?), but I also don't feel strongly about it -- so if people would prefer ttlSec as the field, we can change it.
optional bytes cookie = 3; | ||
} | ||
|
||
message DiscoverResponse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might you want a way to signal errors? For example namespace too long or internal server error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
We briefly discussed this with @mkg20001 and it seemed of limited utility for namespace errors, but internal server errors is probably something we want reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a ResponseStatus for reporting errors.
rendezvous/README.md
Outdated
} | ||
|
||
message RegisterResponse { | ||
optional RegisterStatus status = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always found an optional error string to be polite. For example if E_INVALID_NAMESPACE you could say what the max length is or what the problem is otherwise. Would save the caller from having to guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's a nicety we can add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as statusText
.
c2} | ||
``` | ||
|
||
If `D` wants to progressively poll for real time discovery, it can use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear to me the intended use so perhaps this comment is moot, but: might the progressive poller also want information about unregisters? If they don't get that information then it seems like they'll incrementally build a registration set of non-decreasing size and decreasing usefulness because the list the caller keeps will start to include many stale peers who have unregistered over time. Depending on churn rate the number of stale registrations the caller has could potentially dwarf the number of actually registered peers. There are obvious solutions on the caller side but unclear to a reader like me what's intended here. Might be worth a comment at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reporting the unregistrations in a discovery response is problematic for a couple of reasons:
- requires the rendezvous point to keep track of unregisters modulo the poller's state, which is expensive and couples too tightly.
- complicates the processing of the response in the client, as it assumes a state machine that may simply not exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that explicit unregisters are expected to be rare events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another points is that the progressive discovery mechanism is not intended to build a consistent replica of the rendezvou's point internal state, but rather discover new peers since the last discovery.
A rendezvous federation protocol (TBD) would certainly forward unregistrations though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that explicit unregisters are expected to be rare events
I think that explicit unregisters might be useful when a node does a clean shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, of course!
The question is whether it is worth the trouble propagating this in discover queries, or simply let the TTLs expire stale entries in clients.
I really don't like having to track unregistrations in the database though; Unregister should simply signal deletion and no further propagation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could leave it up to the clients to decide when they want to disconnect. If they want to decrease the amount of registrations, they can purge it locally (checking if ttl expired and remove that registration would be one way, bad ping could be another). Seems to simplify by not having the unregister.
new registrations. | ||
|
||
So here we consider a new client `E` registering after the first query, | ||
and a subsequent query that discovers just that peer by including the cookie: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like might not be a good idea to require registries to keep cookies forever (and probably not even across restarts) so the language here should probably indicate that what is returned when you pass the cookie is only likely an increment set; it could be the full set if the cookie is too old, unknown, the peer reset, whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The registries shouldn't keep state for cookies at all, that's why they are client-side cookies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that a stale cookie could elicit an arbitrary set in response.
I think we should simply return an error if the cookie has become stale, which is something the registry can easily detect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an E_INVALID_COOKIE
status code that can fail a progressive discovery and signal a restart.
of query responses, so that large numbers of peer registrations can be | ||
managed. | ||
|
||
### Registration Lifetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: there are several comments referencing "namespace ownership" but that concept does not appear to be present in this document. If it's a part of the protocol or caller contract it's probably worth explicitly mentioning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespace ownership is beyond the scope of the protocol I think -- it quickly gets into ACLs and that's a rathole I don't want to go down :)
I guess one more question for my edification: there seems to be concern about load (limiting peers to 200 or 1,000; 20k causing a huge load). Can you help me understand that? Coming into this whole thing super naively I assume the "load" referred to is bandwidth? Why the concern (20k * 100 bytes say is only 16Mbits)? Also if bandwidth is really a concern I wonder if you might consider having the payload be bytes with an encoding tag, that way you could ship peerids compressed. (The server could pre-compress batches for example). Or compression at a lower layer could be very effective as well. It's also possible I'm widely missing the point of concern :) |
You'd have to multiply this by 20k to get to the full load -- note that it's not bandwidth so much, but rather memory buffers to fill these responses.
They are cryptographic hashes already, can't compress those.
The main concern is a node just connecting and doing a query that requires a 16MB response -- and the memory buffer to fill and then read this response. Note that it's ok to have those 16MB transmitted as the result of several threaded queries, as this does not commit any significant resources for the interaction. |
and some more error codes.
Updates:
|
Go implementation in libp2p/go-libp2p-rendezvous#1 |
Closing this PR and will open a new one from the same branch. GitHub is having a hard time loading this PR (probably due to the amount of comments?) |
New PR: #56 |
Scope: