WIP: Implement persistence service #75
Conversation
@pgte and I had a conversation about strategies for implementing naming across multiple peers that perform writes (multi-write) using IPNS, which is single-writer.
1. All peers write to IPNS
In this scheme there are two kinds of IPNS keys:
When a peer's state changes it
When collaboration membership changes, the system
When a new peer comes online and is not able to connect to any other peer, it
Pros:
Cons:
2. A single leader peer writes to IPNS
In this scheme a single leader peer writes to IPNS with the shared collaboration IPNS key. Each time membership changes
When a new peer comes online and is not able to connect to any other peer, it
Pros:
Cons:
Behaviour under network partition:
In both schemes the partition behaviour is essentially the same. The difference is that in multi-writer there are multiple peers in each partition writing to IPNS, whereas in single-writer there is one leader peer in each partition writing to IPNS. But conceptually, partition behaviour can be described in terms of a single IPNS writer per partition. Here we describe the single-writer case under two scenarios:
adding @aschmahmann to the conversation
As we discussed in the bi-weekly meeting, it would be awesome if we could provide the persistence strategy for
The strategies themselves could even live in different packages, like
The persistence problem is a very complex one, and I think there might be different strategies for different kinds of applications, or even collaborations. Having this interface would allow us to easily test different approaches. Thoughts?
@satazor I like this idea 👍 I think it would be good to also refactor the persistence service into a separate repo.
Considerations for a leadership election protocol:
I propose that when membership changes, the new leader is chosen by each node like so:
Note: For optimal censorship resistance, only the leader would know it is the leader, and other nodes would know only that they are not the leader. This is a problem they are also trying to solve in Filecoin. If you can figure it out you get $200k :)
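To make the idea of non-interactive election concrete, here is a minimal sketch of one common deterministic approach. The actual proposal is not spelled out above, so the hashing scheme and function names here are illustrative assumptions: each peer independently hashes every member ID together with the collaboration name and treats the peer with the lowest digest as leader.

```js
// Illustrative sketch only: deterministic, non-interactive leader choice.
// Assumes every peer has the same membership list; the hashing scheme is an assumption.
const crypto = require('crypto')

function chooseLeader (collaborationName, memberPeerIds) {
  // Hash each member ID together with the collaboration name so that
  // different collaborations elect different leaders.
  const scored = memberPeerIds.map((peerId) => ({
    peerId,
    score: crypto.createHash('sha256')
      .update(collaborationName)
      .update(peerId)
      .digest('hex')
  }))
  // Every peer sorts the same way, so every peer picks the same leader
  // without exchanging any messages.
  scored.sort((a, b) => a.score.localeCompare(b.score))
  return scored[0].peerId
}

// Example: each peer runs this locally after a membership change.
// chooseLeader('my-collab', ['QmPeerA', 'QmPeerB', 'QmPeerC'])
```

Note that this simple scheme lets every peer compute who the leader is, so it does not have the censorship-resistance property mentioned in the note above.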
@dirkmc It would be nice to not require interaction to elect a leader indeed, but if you don't require quorum over membership changes, you can get split brains.
@pgte when Alice finds that Bob is unreachable, Alice will evict Bob from her membership list and gossip that out to other peers in urgent mode, right? I was imagining that the new leader would not start persisting for some minimum time based on the gossip frequency heuristic.
In the scenario where there's a collaboration partition but not a DHT partition, the peers on each side of the partition will form two groups, where the members of each group will elect a leader. The two leaders will overwrite each other's IPNS entries. I don't think there's any way of getting around that unless we have multi-writer IPNS.
Because IPNS publish is slow (up to 60s for a publish to propagate), I feel that it may not be feasible or scalable to have all peers writing to IPNS, which is why I think leadership election may produce a more consistent (but not perfect) outcome. What do you think?
I agree, but my point was that if the leader got (really) elected interactively, no split brains would occur, and thus no overwriting of IPNS by a minority. OTOH, I don't want to overcomplicate the protocol.
As @pgte pointed out, when there is a collaboration partition but not an IPNS partition, one leader will be elected on each side of the partition and they will overwrite each other's IPNS entries (the "split brain" scenario).
Protocols like Raft have a single leader that takes care of adding and removing members, and leaders must be elected by a majority of members, so there is no possibility of a split brain. However, Raft is designed for scenarios in which members come and go in an orderly manner, whereas peer-star will need to support ad-hoc collaborations where groups of members may come and go in a less predictable fashion, and without warning (eg when a user closes their browser).
One solution is to have two collaboration IPNS keys:
IPNS currently only supports RSA keys, which are very big, so having to pass around two keys per collaboration is undesirable. Another solution is for the leader to simply poll the HEAD IPNS key to check if it gets overwritten by someone else, indicating a partition. In this scheme the leader should indicate liveness by periodically updating the HEAD IPNS key, even if there are no collaboration state changes (eg by incrementing a sequence number):
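A rough sketch of that polling-plus-heartbeat loop, assuming a js-ipfs instance with the promise-based API (name.resolve as an async iterable, name.publish with a key option). The function names, record shape, and intervals below are illustrative assumptions, not the actual proposal:

```js
// Illustrative sketch: leader heartbeat plus overwrite detection on the HEAD IPNS key.
// Assumes a js-ipfs instance `ipfs` and a key imported under the (hypothetical) name
// 'collab-head'; getHeadCid is an application-supplied function returning the latest CID.
async function leaderLoop (ipfs, headKeyName, getHeadCid) {
  let seq = 0
  let lastPublished = null

  // NOTE: IPNS publish can take tens of seconds, so a real implementation would
  // need to avoid overlapping iterations; the interval here is only illustrative.
  setInterval(async () => {
    // 1. Check whether someone else has overwritten the HEAD entry,
    //    which would indicate a partition (another leader is active).
    if (lastPublished) {
      for await (const resolved of ipfs.name.resolve(`/ipns/${lastPublished.name}`)) {
        if (resolved !== lastPublished.value) {
          console.warn('HEAD was overwritten by another leader - possible partition')
        }
      }
    }

    // 2. Publish a heartbeat even if there were no state changes, by bumping
    //    a sequence number stored alongside the head CID.
    seq++
    const head = await getHeadCid()
    const record = await ipfs.dag.put({ head, seq })
    const published = await ipfs.name.publish(`/ipfs/${record}`, { key: headKeyName })
    lastPublished = { name: published.name, value: published.value }
  }, 5 * 60 * 1000)
}
```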
@pgte does that sound like the right approach? Do you think we need Raft-style election by majority for this solution or would non-interactive election suffice?
I think that one IPNS record will do the trick, as long as you are able to keep track of the leader inside the IPNS value.
Indeed, Raft membership changes have to be orderly, which may not happen in a p2p scenario, where a bunch of replicas leaving at the same time may leave the cluster unable to progress in persisting. So yes, I think that split brains may be inevitable here, and that a best-effort IPNS entry could be the solution. If there are nodes alive on each side of the split, they will have a different world view for a while, and they will be overwriting each other's persistence.
Since the persistence can diverge now, if IPNS has a live update feed we can track the "Saved" state on all peers for feedback regarding this.
That makes sense. Hopefully it will be rare that there is a network partition that splits the collaboration but does not split IPNS. In an ad-hoc network the non-interactive election I suggested doesn't really make sense, because the list of members will change frequently, so instead I think a Raft-style election where peers can be in follower / candidate / leader states would actually work better.
I wanted to think through how a Raft-style leader election would work alongside peer-star membership gossip in a few different scenarios. The first four scenarios are similar to how Raft already works, and the fifth scenario demonstrates how to deal with a membership change.
In scenario 5, membership changes while voting is already underway. In the example above, the new member (Eve) sees that there is an incomplete vote going on, so she increments the epoch number and votes for another peer. When the other peers see that the epoch number has changed, they vote on the leader for the next epoch. Similarly, if a peer detects that a member has gone offline, that peer should increment the epoch number and vote for itself as leader. In the general case, when a quorum cannot be formed, or membership changes, a new epoch of voting should be initiated.
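To illustrate the epoch mechanics described above, here is a minimal sketch in plain JavaScript. The class structure, method names, and the `broadcast` callback are assumptions for illustration (message transport is abstracted away), not the actual peer-star implementation. A new epoch is started whenever membership changes or a later epoch is observed.

```js
// Illustrative sketch of epoch-based voting; broadcast() is assumed to be provided
// by the application's gossip layer and is not a real peer-star API.
class Election {
  constructor (selfId, members, broadcast) {
    this.selfId = selfId
    this.members = new Set(members)
    this.broadcast = broadcast
    this.epoch = 0
    this.votes = new Map() // epoch -> Map(voterId -> candidateId)
  }

  // Called when a peer joins or leaves: start a fresh epoch and vote again.
  onMembershipChange (members) {
    this.members = new Set(members)
    this.epoch++
    this.castVote(this.epoch, this.selfId)
  }

  castVote (epoch, candidateId) {
    this.recordVote(epoch, this.selfId, candidateId)
    this.broadcast({ type: 'vote', epoch, voter: this.selfId, candidate: candidateId })
  }

  // Called when a vote message arrives from the gossip layer.
  onVoteMessage ({ epoch, voter, candidate }) {
    if (epoch < this.epoch) return // stale epoch, ignore
    if (epoch > this.epoch) {
      // Someone (eg a new member) started a later epoch: abandon ours and join it.
      this.epoch = epoch
      this.castVote(epoch, candidate)
    }
    this.recordVote(epoch, voter, candidate)
  }

  recordVote (epoch, voter, candidate) {
    if (!this.votes.has(epoch)) this.votes.set(epoch, new Map())
    this.votes.get(epoch).set(voter, candidate)
    this.checkQuorum(epoch)
  }

  checkQuorum (epoch) {
    const tally = new Map()
    for (const candidate of this.votes.get(epoch).values()) {
      tally.set(candidate, (tally.get(candidate) || 0) + 1)
    }
    const quorum = Math.floor(this.members.size / 2) + 1
    for (const [candidate, count] of tally) {
      // A majority of the current membership agreed on this candidate.
      if (count >= quorum) this.leader = candidate
    }
  }
}
```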
I've created a separate WIP PR against this branch with an implementation of leadership, so that this PR doesn't get too big: dirkmc#1
@dirkmc would you like me to merge this into a new peer-star-app branch?
@pgte are you asking about creating a feat/persister branch on peer-star-app and merging dirkmc:peer-star-app/feat/persister into it? I'm not sure I see the advantage?
FYI I have written an implementation of leadership. Today I wrote several test scenarios for leadership. Next I need to
I have completed the tests for leadership and have been testing out leadership persistence with a real-world example app. Some issues I've come across:
@dirkmc Let's see if I understand the issue:
@pgte that's a good point. Do you have a sense of how reliable real-world connections are? If they are very reliable that should work well. If they are very unreliable it could cause a lot of churn, with leaders constantly changing.
@dirkmc I don't think that should be a big concern. I'm more concerned about peers voluntarily closing the connection (like when the user closes the app tab), but without an orderly close.
@pgte that makes sense. It occurs to me that if a leader is on an unreliable internet connection it's probably not such a bad thing if it ends up being voted out. We may be able to use the pubsub messages, and possibly heartbeats over peer-to-peer connections, to develop a heuristic to determine how stable a peer's connection is. I'm going to do some investigation to understand how reliably we can detect scenarios such as
Hopefully the transport layer can take care of some of this for us. By the way, I noticed that it can sometimes take a few minutes for a peer to start receiving messages from pubsub, so I'm going to look into that too.
Some things I've noticed with pubsub that I need to investigate further:
I found that
@pgte there's a couple of things I'm not sure about here:
@dirkmc b) Yes, our current pubsub implementation (floodsub) only knows about the topics a peer is interested in after connecting to it.
@pgte when the local peer is informed by the rendezvous server that a remote peer has connected to rendezvous, the local peer connects to the remote peer.
@dirkmc when discovering a new node from the underlying transport, peer-star discovery eventually connects to that peer so that we understand the pubsub topics they're interested in. If the remote peer is not interested in the app topic, it disconnects. If it is interested in the app topic, it adds that peer to the ring of app peers, and if that peer is part of the Dias Set of peers for the app, it stays connected to it. Otherwise, it disconnects. If I understand correctly, you propose replacing this connect-and-poll-pubsub-topics strategy with trying a direct connection to a new protocol to inquire about interest in participating in the app?
@pgte yes that's what I was thinking - seeing as we're connecting to the peer anyway, maybe we can avoid polling by simply asking it if it's interested in our topic.
@dirkmc makes sense.
I should also explain the context: when leadership starts up, it waits for a certain amount of time, and if it doesn't hear from another peer, it assumes it is the only peer and elects itself as leader. So I want to minimize the amount of time it has to wait by minimizing gossip startup time. I will write a PR and try it out with a real-world example app.
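As a rough sketch of that startup behaviour (plain JavaScript; the timeout value, event name, and function names are illustrative assumptions, not the actual peer-star API): the node arms a timer at startup and self-elects only if no other peer is heard before it fires.

```js
// Illustrative only: self-elect as leader if no other peer is heard within a startup window.
// `events` is assumed to be an EventEmitter that emits 'peer heard' when gossip arrives.
function startLeadershipTimer (events, onSelfElect, startupWaitMs = 5000) {
  const timer = setTimeout(() => {
    // No other peer heard within the window: assume we're alone and become leader.
    onSelfElect()
  }, startupWaitMs)

  events.once('peer heard', () => {
    // Another peer exists, so cancel self-election and wait for a normal vote instead.
    clearTimeout(timer)
  })
}
```

This is why the time to discover the first peer matters: the shorter discovery takes, the shorter the startup window can be.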
I should also mention that I noticed that sometimes when two peers start at the same time, they seem to miss each other, and instead of taking several seconds to discover each other they take several minutes, so I want to make sure we avoid that problem.
@pgte it turns out that floodsub already does exactly what we need: when floodsub discovers a remote peer (by listening for libp2p
When the local peer receives a list of topics from a remote peer, floodsub updates its local cache of topics that the remote peer is interested in. If floodsub were to emit an event as well, we could listen for that event in Discovery instead of polling.
Note also that if the peer is discovered through another mechanism (eg when floodsub dials the list of peers in the peerbook), Discovery only finds out the topics that the new peer is interested in when Discovery (eventually) tries to poll it. If there were an event, Discovery would find out immediately.
Do you think we should submit a PR to js-libp2p-floodsub to emit an event when it receives new subscription information for a peer? Note that there are a couple of complicating factors:
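To make the polling-vs-event difference concrete, here is a hypothetical sketch. js-libp2p-floodsub of this era keeps a `peers` map with a `topics` set per peer, but it does not emit a subscription event; the event name 'floodsub:subscription-change' and its payload shape below are assumptions about what such a PR might add:

```js
// Current approach (simplified): Discovery periodically polls floodsub's peer cache.
function pollSubscriptions (floodsub, appTopic, onInterested, intervalMs = 1000) {
  return setInterval(() => {
    for (const [peerId, peer] of floodsub.peers) {
      if (peer.topics.has(appTopic)) onInterested(peerId)
    }
  }, intervalMs)
}

// Proposed approach: react as soon as floodsub learns a peer's subscriptions.
// 'floodsub:subscription-change' is a hypothetical event name for the proposed PR.
function watchSubscriptions (floodsub, appTopic, onInterested) {
  floodsub.on('floodsub:subscription-change', (peerId, topics) => {
    if (topics.has(appTopic)) onInterested(peerId)
  })
}
```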
@jacobheun I have a question about websocket-star and rendezvous. It is my understanding that when a peer joins the rendezvous server it is added to a peer list. The rendezvous server broadcasts the list out to all listeners every 10s, is that correct? Is there any way that a new peer can receive the list of other peers as soon as it joins the rendezvous server, rather than waiting (up to 10s) for the next broadcast?
@dirkmc correct, it's through floodsub subscriptions that a peer knows if another peer is interested in an app. Replacing polling with an event would make things more efficient. 👍
With the changes I've made in the above PRs in floodsub, websocket-star and websocket-star-rendezvous, the time between starting up and discovering a peer has gone from around 15 seconds to around 400ms:
Note that the latency is near zero because rendezvous is running on my localhost, but even with rendezvous on a remote server I don't expect it to take more than a second.
@dirkmc that's awesome!
The websocket-star updates have been released and I put in a ticket with infra to get the rendezvous server deployed, ipfs/infra#458.
Great! Thanks @jacobheun
ws-star.discovery.libp2p.io @0.3.0 is up and running 🚀
An implementation of a persistence service as discussed in #57
Stores a linked list in IPFS and a pointer to the head of the list in IPNS
New deltas are added to the store as they come in, and snapshots are taken after a certain interval or after a certain number of deltas.
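A minimal sketch of that storage layout, assuming a js-ipfs instance with the promise-based dag and name APIs. The function names, node shape, and snapshot policy below are illustrative assumptions, not the actual implementation:

```js
// Illustrative sketch: each stored node links to the previous one, forming a linked
// list in IPFS; the head CID is published to IPNS so peers can find the latest state.
async function appendDelta (ipfs, headCid, delta) {
  const node = await ipfs.dag.put({
    type: 'delta',
    delta,
    prev: headCid // link to previous head (null for the first node)
  })
  // Point the IPNS name at the new head of the list.
  await ipfs.name.publish(`/ipfs/${node}`)
  return node
}

async function takeSnapshot (ipfs, headCid, fullState) {
  // A snapshot node captures the whole state so readers don't have to walk
  // the entire delta chain back to the beginning.
  const node = await ipfs.dag.put({
    type: 'snapshot',
    state: fullState,
    prev: headCid
  })
  await ipfs.name.publish(`/ipfs/${node}`)
  return node
}
```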
TODO:
- implement shared persistence
  - IPNS does not yet use the DHT, so the above only works if the IPFS Repo is shared.
  - Therefore we may need to
- more tests
- implement master election
  - IPNS is single-writer, so we need to implement a protocol to