RFC for collaborative pinsets #467

docs/collaborative-pinsets-rfc.md (179 additions, 0 deletions)
# Collaborative pinsets

This document outlines a possible implementation plan for "collaborative pinsets" as described below, within the scope of IPFS Cluster.

## Definition

A **collaborative pinset** is a collection of CIDs pinned by a number of IPFS Cluster peers which:

* Trust one or several peers to publish and update the pinset, but do not trust the rest
* May freely join or leave the cluster without affecting other peers or the pinset

## State of the art in IPFS Cluster

IPFS Cluster currently supports pinsets in a trusted environment where every node, once participating in the cluster, has full control of the other peers (via unauthenticated RPC). The cluster secret (pnet key) ensures that only peers with the pre-shared key can request joining the cluster.

Maintenance of the peer-set is performed by the **consensus component**, which provides an interface for 1) the maintenance of the peer-set and 2) the maintenance of the pinset (the *shared state*).

Other components of cluster are independent from these two tasks and provide functionality that remains useful in scenarios where peer-set and pinset maintenance works differently:

* A state component provides pinset representation and serialization
* An ipfs connector component provides facilities for controlling the ipfs daemon (and the proxy)
> **Contributor (review comment):** proxy is a separate component now

* A pintracker component provides functionality to ensure the shared state is followed by ipfs
* A monitoring component provides metric collection.

Thus, just replacing the consensus layer (with some caveats elaborated below) is a relatively simple approach to supporting collaborative pinsets.

## A new consensus layer for collaborative pinsets

### State of the art of the consensus component

The consensus component is defined by the following interface:

```go
// Consensus is the interface that the consensus component must implement.
type Consensus interface {
	Component
	// Ready is closed when the component is ready to use.
	Ready() <-chan struct{}
	// LogPin and LogUnpin record pin/unpin operations in the shared state.
	LogPin(c api.Pin) error
	LogUnpin(c api.Pin) error
	// AddPeer and RmPeer modify the cluster peer-set.
	AddPeer(p peer.ID) error
	RmPeer(p peer.ID) error
	// State returns the shared state (the pinset).
	State() (state.State, error)
	// Leader returns the currently elected leader, if any.
	Leader() (peer.ID, error)
	WaitForSync() error
	Clean() error
	// Peers returns the current cluster peer-set.
	Peers() ([]peer.ID, error)
}
```

This is essentially a wrapper around `go-libp2p-consensus` which adds functionality and utility methods that, until now, applied to Raft (as provided by `go-libp2p-raft`).

The purpose of Raft is to maintain a shared state by providing a distributed append-only log. In a Raft cluster, the log is maintained by an elected leader and is compacted and snapshotted conveniently. IPFS Cluster has spent significant effort in detaching the state representation from Raft and allowing transformations (upgrades) to run independently, based solely on the "state component".

### A new consensus component using IPFS-backed CRDTs

In order to be consistent with how components are meant to interact with each other, it makes sense to implement the shared state maintenance in a new `go-libp2p-consensus` implementation. The main pain points in a collaborative cluster will be:

* Security: not every peer should be able to control or influence the behaviour of every other peer (including updating the state). For this, we will introduce the notion of a *trusted peerset* along with RPC endpoint authorization.
* Scalability: this consensus layer should support a very large number of peers and efficient broadcast of state updates. For this we will introduce a CRDT-based "consensus" layer (not consensus per se) and rely on the improved pubsub mechanisms provided by libp2p.

### Security: Authentication and authorization for collaborative pinning

In a collaborative pinset scenario, we probably want a limited set of peers which are able to modify the shared state and freely connect to the API endpoints of any other peers. We call this the **trusted peerset**. This implies that we need to find ways to:
> **@kishansagathiya (Contributor, Feb 4, 2019):** @hsanjuan @lanzafame I just want to confirm my understanding: a trusted peerset is defined for a given pinset, right? So a cluster can have multiple trusted peersets based on pinsets, with a one-to-one mapping between trusted peerset and pinset. Also, a cluster can have multiple pinsets as well.
>
> **@kishansagathiya (Contributor):** Or do you define a trusted peerset first and the pinset is defined by the pins held by that peerset, i.e. the pinset is a function of the peerset? The second one seems more appropriate to me.
>
> **Collaborator (author):** There is only one pinset per cluster (the shared state).


* Limit modifications of the state to the **trusted peerset**, with the rest of the cluster peers acting as mere "followers".
* Limit the internal RPC API to the **trusted peerset**.

We can address the first point by signing state updates, allowing any peer to authenticate them (as needed). Libp2p pubsub allows sending signed messages, so receiving peers can obtain the public key and verify the signatures.
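As an illustration, here is a minimal sketch of the check a follower might run on incoming state updates, assuming the pubsub layer has already verified the message signature; the `update` type and the `apply` helper are hypothetical stand-ins, not actual cluster or libp2p APIs:

```go
package collab

import "fmt"

// update is a hypothetical stand-in for a signed pubsub message whose
// signature has already been verified by the pubsub layer.
type update struct {
	From    string // peer ID of the signer
	Payload []byte // serialized state delta
}

// applyIfTrusted applies a state update only when it comes from a member of
// the trusted peerset; updates from any other peer are ignored.
func applyIfTrusted(u update, trusted map[string]bool, apply func([]byte) error) error {
	if !trusted[u.From] {
		return fmt.Errorf("ignoring update from untrusted peer %s", u.From)
	}
	return apply(u.Payload)
}
```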

For the second point, we have to consider the internal RPC API surface. Until now, it has been assumed that all cluster peers can contact and use the RPC endpoints of all others. This is very problematic in a collaborative setting, as it would allow any peer, for example, to trigger ipfs pins/unpins anywhere. For this reason, we propose **authenticated RPC endpoints**, which will only allow a set of trusted peers to execute any of the RPC calls. This can be added as a feature of libp2p-gorpc, taking advantage of the authentication information provided by libp2p. Note that we will have to make UI adjustments so that non-trusted peers receive appropriate errors when they do not have the rights to perform certain operations.
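A rough sketch of what per-peer, per-method authorization could look like; the `authorizer` hook and the method allowlist are hypothetical and do not reflect the actual libp2p-gorpc API, which would need to grow a similar feature:

```go
package collab

// authorizer decides whether a remote peer may invoke a given RPC method.
type authorizer func(from string, service, method string) bool

// trustedPeersetAuthorizer lets members of the trusted peerset call anything,
// while other peers may only call an explicit allowlist of harmless methods
// (for example an identity/ID endpoint).
func trustedPeersetAuthorizer(trusted map[string]bool, public map[string]bool) authorizer {
	return func(from string, service, method string) bool {
		if trusted[from] {
			return true
		}
		return public[service+"."+method]
	}
}
```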
> **Contributor (review comment):** I see why we might need fine-grained permissions for a pair of peer and method:
>
> * Peers which are part of the trusted peerset can call any RPC API on other members of the trusted peerset.
> * But if a peer is not part of the trusted peerset you still want to let it call some RPC APIs, for example `ID()`.
>
> **@kishansagathiya (Contributor, Feb 4, 2019):** How I see it, we are trying to separate permissions for operations that just retrieve information from those that change the state. It seems that untrusted peers can't do much, since the only thing they can do is follow. What are the incentives of having a peer that just follows?
>
> **Collaborator (author):** The same as for running IPFS in general: helping back up content and so on.



### Scalability

First, we need to address the **scalability of the shared state** and updates to it. For this we will store the state in IPFS so that every peer can retrieve and copy it. Participating peers will regularly transmit CIDs pointing to the state so that every new peer can fetch it. Updates to the state will be broadcast via pubsub. Because the state will be CRDT-based, all peers will eventually converge to the same version of it.

Secondly, we need to address the **scalability problem for inter-peer communications**: for example, sending metrics so that pins can be allocated, or retrieving the status of an item across a very large number of peers, will be a problem. In a `pin everywhere` scenario, though, where allocations (and thus metrics) are not needed, this problem becomes much smaller. All in all, we should avoid having all peers connect to a single trusted peer (or to all other peers). Ideally, peers would be connected only to a subset, or would be able to join the cluster by just `bootstrapping` to any other peer, without necessarily keeping permanent connections to the trusted peers.
> **Contributor (review comment):** Re: "Ideally, peers would be connected only to a subset..." — Composite clusters would be a good solution for creating some form of hierarchy, whether based on grouping a few smaller cluster peers together to make them appear as a single bigger peer, or on geographical proximity to each other, I am not sure yet; but however they are grouped, it should reduce the communication overhead of the top-level cluster peers.
>
> **Collaborator (author):** Yes, in fact a composite cluster topology where the subclusters are collaborative clusters with replication factor -1 would actually be the easiest way to proceed and solves a bunch of the problems (allocation, peerset management etc.). It adds some management overhead and single points of failure (the trusted nodes of each subcluster). The latter can probably be addressed with a load balancer.
>
> **Collaborator (author):** In that case, if new random peers want to join, they would need to join one of the subclusters (preferably the one with less storage available), or a new subcluster would need to be created, depending on how space/wanted replication factor is managed. So, more management overhead.

There are a few places in cluster where all the peers in the peerset are contacted in order to fetch information (PeerAdd, ConnectGraph etc.), but the critical one will be the broadcasts corresponding to the Status, Sync and Recover operations. All of these can be optimized by:

a) Limiting requests to the peers that have the content allocated to them when dealing with a single CID (see the sketch below)
b) Supporting iterative (rather than parallel) checking for global state calls
c) Using the libp2p connection manager to drop connections when they grow too numerous (same as ipfs)

All of these, however, will need to be battle-tested. On the plus side, the three operations are not critical to the central problem, which is having many peers follow a pinset.
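For illustration, a sketch of point (a): narrowing a single-CID status query to the pin's allocations instead of broadcasting to the whole peerset. The types are hypothetical simplifications of cluster's internals:

```go
package collab

// pin is a simplified pin entry: a CID and the peers it is allocated to.
type pin struct {
	Cid         string
	Allocations []string // peer IDs; empty means "pin everywhere"
}

// statusTargets returns the peers that should be queried for the status of a
// single pin. When the pin has explicit allocations only those peers are
// contacted; otherwise (replication factor -1) the full peerset is still needed.
func statusTargets(p pin, allPeers []string) []string {
	if len(p.Allocations) == 0 {
		return allPeers
	}
	return p.Allocations
}
```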

Third, we need to address the **scalability problem of maintaining the peerset**. For this we can take advantage of pubsub-based monitoring: listing all peers for which we have received a metric over pubsub will provide an effective peerset. Thus, the consensus layer can rely on the monitoring component for `Peers()`. Using pubsub ensures that metrics are distributed among the swarm in a scalable fashion, without the need for all peers to keep connections open to the *trusted peerset*.
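A sketch of how the consensus layer's `Peers()` could be derived from the monitoring component, assuming it exposes the latest metric received from each peer over pubsub (types and names are illustrative):

```go
package collab

import "time"

// metric is a simplified stand-in for a monitoring metric received over pubsub.
type metric struct {
	Peer    string    // peer ID of the sender
	Expires time.Time // metrics carry a TTL after which they are discarded
}

// effectivePeers derives the peerset from the latest non-expired metric seen
// for each peer, so that Peers() never needs to contact anyone directly.
func effectivePeers(latest map[string]metric, now time.Time) []string {
	peers := make([]string, 0, len(latest))
	for id, m := range latest {
		if m.Expires.After(now) {
			peers = append(peers, id)
		}
	}
	return peers
}
```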


### Dealing with malicious actors

There are several ways that a malicious peer might try to interfere with the activity of a collaborative cluster. In general, we should aim to have a working cluster when a majority of the peers in it are not malicious.

* A **trusted peerset** is an easy target for DDoS attacks on the swarm.
> **Collaborator (review comment):** If your trusted peerset is big enough that DoSing the whole thing is expensive, then DoS resistance is built in to an extent, with secret leader elections that fit in well with some possible blockchain consensus mechanisms. Blanket DoS resistance for the whole peerset seems like one of those things that no level of protocol design can fully protect against.

* A bad peer can pretend to be pinning something but not do it
* A bad peer can pretend to have infinite space and be always allocated things

Many of these problems (and others) have been (or will be) addressed by Filecoin. It is not IPFS Cluster's goal to overlap with Filecoin in this feature, but rather to offer a simple way to "follow" a pinset with less overhead and fewer requirements than the current Raft consensus (in the future it will be worth studying how Filecoin could be used as the consensus layer for cluster). For the moment:

* We will recommend high replication factors, which make it less likely that all allocations for a CID belong to malicious actors.
* We will have to change the allocator strategy to one that randomly selects a subset of peers first and then applies the strategy (see the sketch after this list). This should also make it less likely for malicious peers to always appear at the front of the allocation candidates.
* We will have to adapt the actual replication factor dynamically: as new peers join the cluster, we want to allocate content to them. It helps to think in percentages: i.e. a CID should be allocated to 10% of the peers. As the number of peers grows, new peers should start tracking that content. Open questions around allocation strategies are discussed in the next section.
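A minimal sketch of such a random pre-selection step, with hypothetical candidate and metric types standing in for the real allocator inputs:

```go
package collab

import (
	"math/rand"
	"sort"
)

// candidate pairs a peer with the metric the allocator orders by
// (for example, free disk space).
type candidate struct {
	Peer   string
	Metric uint64
}

// allocate randomly pre-selects a subset of the candidates and only then
// applies the usual metric-based ordering, so that a malicious peer
// advertising infinite free space cannot always end up at the front.
func allocate(cands []candidate, subsetSize, needed int) []string {
	rand.Shuffle(len(cands), func(i, j int) { cands[i], cands[j] = cands[j], cands[i] })
	if subsetSize < len(cands) {
		cands = cands[:subsetSize]
	}
	sort.Slice(cands, func(i, j int) bool { return cands[i].Metric > cands[j].Metric })
	if needed > len(cands) {
		needed = len(cands)
	}
	allocs := make([]string, 0, needed)
	for _, c := range cands[:needed] {
		allocs = append(allocs, c.Peer)
	}
	return allocs
}
```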

### Allocations: open problems

Currently, the allocations for a CID are part of the shared state and consist of a list of peer IDs, decided when the item was added.

In a scenario where peers come and go and come back, this strategy feels suboptimal (although it would work in principle). We should probably work on an allocator component which can efficiently track and handle allocations as peers come and go. For example, if the minimum allocation factor cannot be reached, cluster should still pin the items and track them as underpinned and, as new peers join, it should allocate underpinned items to them. As peers go away, cluster should efficiently identify which pins need re-allocation.

Perhaps the whole allocation format should be rethought, allowing each of the peers to detect items which have not reached a high-enough replication factor and to track them. These peers would then inform the allocator that they are going to take care of those items. This is, again, a difficult problem which Filecoin has solved properly by providing a market where peers can offer to store content.
> **Collaborator (review comment):** I think a useful differentiator between this project and Filecoin is the heightened trust among participants. It's probably best to focus on uses that make use of higher trust assumptions to avoid FC overlap.



## Prototype for the consensus component

> **Collaborator (review comment):** All of this talk of trusted peersets and using blockchains for scalability is interesting to me. Could you clarify precisely why you want to use blockchain consensus among the trusted peersets? Essentially, blockchains are a scalable consensus mechanism to maintain a shared state which grows stronger with the number of peers participating in it. I think this should be articulated more precisely before jumping into this, so that we really understand the advantage of using a blockchain over traditional BA among the trusted peers.
>
> I think you might be saying that you want the trusted peerset to have the ability to go offline sporadically without affecting availability and consistency. If that's the case, you should consider looking at sleepy consensus, which is an application of Nakamoto consensus in the classical setting with a group of trusted peers, designed exactly for this purpose. In this paper Shi and Pass (of course) show that all traditional consensus mechanisms fail to achieve consistency and liveness under a reasonable notion of sporadic participation. However, Sleepy Consensus is provably secure in this model! Implementing it would likely be a big effort, but there should actually be some modules coming out of FC development that could be really useful for this.
>
> **Contributor:** @ZenGround0 interesting paper, thanks

If we are to implement the state using CRDTs we should first define it as a CRDT type. The state is a map from CID to PinInfo.

In CRDTs, maps are simply implemented as sets of keys, but we should provide a consistent way to choose the right value when there are several entries for the same key. For this we propose to use the optimized Add-Wins OR-Set from the delta-CRDTs paper, backing the deltas on IPFS in the same fashion as `ipfs-log`. This allows us, for the moment, to skip the implementation of the per-object causal consistency algorithm. We will distribute the deltas using signed pubsub messages. The current log head will be regularly broadcast when the state does not receive any other updates.

The implementation will require storage of two separate sets of elements:

* The tagged set s with elements (peerID, counter, CID[+value])
* The causal context set c with elements (peerID, counter)

This will need key-value stores indexed by peerID+counter. We will additionally need a regular CID-indexed `State` mapped to the tagged set s. A priori, any modification to s should be reflected in the `State`. The `State` can be used to store the pin info that prevails among the entries we hold (we can base it on the counter).

Following this, let's see how the new consensus methods would look:

* CRDT type:
  * Will carry a counter which must be persisted to disk and survive across sessions.
  * Two maps (s, c) persisted to disk and indexed by "peerID+n".
  * A reference to the state.
  * A reference to the latest ipfs HEADs.
  * It must subscribe to a pubsub topic (configurable).
  * Upon receiving a message: check that it is signed by a trusted peer and that the signature is valid.
  * IPFS block-get the IPFS CID and its parents until we have all the blocks.
    * What if we have the blocks in IPFS but we cleaned the state and have not processed them?
    * Keep track of processed block-CIDs in a cache?
  * Deserialize deltas from the blocks we did not have.
  * Perform unions in the best way (probably enough to just apply them in order).
  * Store the ipfs heads, or the single ipfs CID if the branches did not diverge.

* LogPin (a rough code sketch is given after this list):
  * Generate a [(peerID, n+1, api.Pin), (peerID, n+1)] delta. Store n+1. Each n should only be used once (lock).
  * Update the s and c sets. Update the State.
  * Generate a new ipfs block with the previous heads as parents. Block-put to ipfs. Update the head to the block CID.
  * Broadcast delta+blockCID using signed pubsub.

* LogUnpin:
  * Generate a [{}, c] delta by collecting all (peerID, n) associated to the CID (worth having an extra set indexed by CID? in memory?).
  * Update the s and c sets. Update the State.
  * Generate a new ipfs block with the previous heads as parents. Block-put to ipfs. Update the head to the block CID.
  * Broadcast using signed pubsub.

* AddPeer/RmPeer: no-ops!
* WaitForSync: no-op. We will sync as we get information. We can have peers re-broadcast heads at intervals.
* Clean:
  * Trash everything that is on disk,
  * including the State (?). Not sure if this is consensus-layer business.

* Peers: either return just [self] or pick the list of peers with valid pings from the metrics.

Worth saying that CRDT operations (subscription, LogPin, LogUnpin) should be mostly atomic.
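Putting the LogPin steps above together, here is a rough sketch of the flow, reusing the `tag`/`element`/`orset` types from the earlier sketch; error handling and locking are omitted and the `ipfsBlockPut`/`publish` helpers are hypothetical:

```go
package collab

// pinDelta is what gets broadcast for a pin: the new tagged element plus the
// tag added to the causal context.
type pinDelta struct {
	Add element
	Ctx tag
}

// logPin sketches the LogPin flow: take the next counter value, update the
// local sets (from which the State would be derived), put a new block to ipfs
// linking the previous heads, and broadcast the delta.
func (o *orset) logPin(self string, counter *uint64, cid string, value []byte,
	heads []string,
	ipfsBlockPut func(d pinDelta, parents []string) (string, error),
	publish func(d pinDelta, head string) error,
) ([]string, error) {
	*counter++ // each counter value must be used exactly once (lock in real code)
	t := tag{Peer: self, Counter: *counter}
	el := element{Tag: t, Cid: cid, Value: value}
	d := pinDelta{Add: el, Ctx: t}

	// Update the tagged set s and the causal context c.
	o.S[t] = el
	o.C[t] = struct{}{}

	// A new ipfs block with the previous heads as parents becomes the new head.
	head, err := ipfsBlockPut(d, heads)
	if err != nil {
		return heads, err
	}
	if err := publish(d, head); err != nil {
		return heads, err
	}
	return []string{head}, nil
}
```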



## UX in collaborative pinsets

Ideally it will have a super-easy setup:

1. Run the go-ipfs daemon
2. `ipfs-cluster-service init --template /ipns/Qmxxx`
3. `ipfs-cluster-service daemon`

> **Collaborator (review comment):** 🎉

This will fetch a configuration template with pre-filled "trusted" peers, providing the optimal configuration for the collaborative pinset cluster.