core: support copysets or replication "teams" of servers #25194
This is VERY old now, but copysets were discussed in the rebalancing RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20160503_rebalancing_v2.md#copysets
Actually, after re-reading the RFC, I think a lot of that still holds true.
@a-robinson following up on what I mentioned in the reading group - can this be supported in the static case, where no nodes are added or removed, by using zone configs? If zone configs supported disjunction expressions then I think we could emulate the effect of copysets with the following two steps:

1. Assign each node an attribute naming the copyset it belongs to.
2. Create a zone config whose replica constraints are a disjunction over those copyset attributes, so that all of a range's replicas must land within a single copyset.
If this all worked then we could create a simple tool that generates one of these copyset zone configs given a certain number of nodes and a desired scatter width (a sketch of such a generator follows below). cc @nstewart
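A minimal sketch of what such a generator could look like, assuming the permutation scheme from the Copysets paper -- this is not actual CockroachDB tooling, and every name here is illustrative:

```go
package main

import (
	"fmt"
	"math/rand"
)

// copysets chops ceil(S/(R-1)) random permutations of the node IDs into
// groups of R nodes each, giving every node a scatter width of roughly S.
func copysets(numNodes, replication, scatterWidth int) [][]int {
	numPerms := (scatterWidth + replication - 2) / (replication - 1) // ceil(S/(R-1))
	rng := rand.New(rand.NewSource(1))                               // fixed seed for reproducibility
	var sets [][]int
	for p := 0; p < numPerms; p++ {
		perm := rng.Perm(numNodes)
		// Leftover nodes (when numNodes % replication != 0) are skipped in
		// this permutation; a later permutation will cover them.
		for i := 0; i+replication <= numNodes; i += replication {
			sets = append(sets, perm[i:i+replication])
		}
	}
	return sets
}

func main() {
	// E.g. 9 nodes, 3x replication, desired scatter width 4 -> 2 permutations.
	for i, cs := range copysets(9, 3, 4) {
		fmt.Printf("copyset %d: nodes %v\n", i, cs)
	}
}
```

Each additional permutation increases every node's scatter width by R-1, so the permutation count is the knob that trades recovery bandwidth against the number of failure combinations that can lose data.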
You could take those two steps to allow manual control over which copysets exist, but you'd still have to implement the rebalancing logic to actually respect the copysets and to try balancing ranges across them. And you'd be really tightly coupling the constraints/attributes on nodes with the constraints in the zone config by statically listing all the nodes in the zone config.
I'm not sure I follow, which is probably due to a lack of expertise in the area of zone config constraints and their effect on rebalancing decisions. Why wouldn't current rebalancing logic take the manual "copysets" into account? My understanding is that it would try to evenly distribute replicas across nodes randomly while still obeying these constraints. If that's the case then wouldn't a scatter probabilistically fill the copysets in a roughly even manner?
Sure, this wouldn't work for all deployments, but this isn't really a blocker for a POC.
I didn't get that you wanted to try to use the existing …
We don't need to support full copysets either: even a low number of node groups (e.g. via locality tags) significantly lowers the probability of range loss in large clusters. The analytical solution got hairy, but I wrote a small script to simulate it (rangeloss.go). When losing 2 nodes in a 50-node cluster with 3 replicas and 1000 ranges, the probabilities of losing at least 1 range are:
Many users already use 3 partitions due to cloud AZ locality tags, but there are significant gains to be had by adding just a few more partitions. I think it'd be very worthwhile to add this as one of the allocation heuristics -- of course weighed against other concerns like load balancing.

UPDATE: The initial numbers here were too low due to a bug which biased range allocations. This also showed that simply adding 12 localities and allocating randomly doesn't work; there needs to be stronger coupling between ranges. The simulation above uses contiguous partition groups with the same size as the number of replicas -- i.e., with 3 replicas, ranges are allocated to partitions 1-3, 4-6, 7-9, and so on. A sketch of this kind of simulation follows below.
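rangeloss.go itself isn't included in the thread, so here is a minimal Monte Carlo sketch in the same spirit (the structure, names, and seed are my own assumptions). It compares uniformly random placement against the contiguous 3-node groups described above, counting a trial as a loss when the 2 failed nodes hold 2 of some range's 3 replicas, i.e. quorum is lost:

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	numNodes  = 50
	numRanges = 1000
	replicas  = 3
	trials    = 10000
)

// allocateRandom places each range on a uniformly random set of distinct nodes.
func allocateRandom(rng *rand.Rand) [][]int {
	out := make([][]int, numRanges)
	for i := range out {
		out[i] = rng.Perm(numNodes)[:replicas]
	}
	return out
}

// allocateGroups places each range wholly inside one contiguous group of
// `replicas` nodes (0-2, 3-5, ...), as in the simulation described above.
// With 50 nodes and groups of 3, the last 2 nodes go unused.
func allocateGroups(rng *rand.Rand) [][]int {
	groups := numNodes / replicas
	out := make([][]int, numRanges)
	for i := range out {
		base := rng.Intn(groups) * replicas
		set := make([]int, replicas)
		for r := range set {
			set[r] = base + r
		}
		out[i] = set
	}
	return out
}

// lossProbability estimates the chance that failing 2 random nodes removes
// a quorum (2 of 3 replicas) from at least one range.
func lossProbability(alloc func(*rand.Rand) [][]int) float64 {
	rng := rand.New(rand.NewSource(1)) // fixed seed for reproducibility
	lost := 0
	for t := 0; t < trials; t++ {
		placement := alloc(rng)
		failed := rng.Perm(numNodes)[:2]
		isFailed := map[int]bool{failed[0]: true, failed[1]: true}
		for _, replicaSet := range placement {
			down := 0
			for _, n := range replicaSet {
				if isFailed[n] {
					down++
				}
			}
			if down >= 2 { // quorum lost for this range
				lost++
				break
			}
		}
	}
	return float64(lost) / float64(trials)
}

func main() {
	fmt.Printf("random placement:  P(lose >= 1 range) = %.3f\n", lossProbability(allocateRandom))
	fmt.Printf("grouped placement: P(lose >= 1 range) = %.3f\n", lossProbability(allocateGroups))
}
```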
As described in this forum post, better fault tolerance for extremely large clusters can be achieved by creating replication "teams" of servers.
This technique has been described in the literature as "copysets", and is summarized here: http://hackingdistributed.com/2014/02/14/chainsets/. Although copysets can be manually approximated today with our replication zones feature, we should consider explicitly supporting copysets in the future in order to better support large deployments.
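To make the fault-tolerance argument concrete, here is a back-of-the-envelope model (my own numbers, not from the forum post), assuming M ranges placed independently and uniformly over all possible replica sets:

```latex
% Probability that the failure of a specific set of R nodes loses at
% least one range under uniform random placement of M ranges:
P_{\mathrm{random}} = 1 - \Bigl(1 - \tbinom{N}{R}^{-1}\Bigr)^{M}
                    \approx 1 - e^{-M / \binom{N}{R}}

% With K copysets, only a failure that hits a whole copyset can lose
% data, independent of the number of ranges:
P_{\mathrm{copysets}} = K \Big/ \tbinom{N}{R}

% Example: N = 50, R = 3, M = 1000, minimal K = \lceil 50/3 \rceil = 17:
% \binom{50}{3} = 19600, so P_random is about 0.05 while P_copysets is
% about 17/19600 = 0.0009 -- roughly a 60x reduction.
```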
Jira issue: CRDB-5725