core: support copysets or replication "teams" of servers #25194

Open
rytaft opened this issue Apr 30, 2018 · 9 comments
Labels
A-kv-distribution: Relating to rebalancing and leasing.
A-kv-replication: Relating to Raft, consensus, and coordination.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-community: Originated from the community
T-kv: KV Team

Comments

@rytaft
Collaborator

rytaft commented Apr 30, 2018

As described in this forum post, better fault tolerance for extremely large clusters can be achieved by creating replication "teams" of servers.

This phenomenon has been described in the literature as "copysets", and is summarized here: http://hackingdistributed.com/2014/02/14/chainsets/. Although copysets can be manually approximated today with our replication zones feature, we should consider explicitly supporting copysets in the future in order to better support large deployments.
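To make the intuition concrete, here is a back-of-the-envelope illustration of my own (not from the forum post or the linked article): with fully random placement and enough ranges, essentially every 3-node combination ends up hosting some range, so any simultaneous 3-node failure loses data; with disjoint copysets, only a failure that lands entirely inside a single copyset can.

```go
// Illustrative only: compares how many simultaneous 3-node failure
// combinations can lose data under random placement vs. disjoint copysets.
// The numbers (9 nodes, replication factor 3) are hypothetical.
package main

import "fmt"

// choose computes the binomial coefficient C(n, k).
func choose(n, k int) int {
	r := 1
	for i := 0; i < k; i++ {
		r = r * (n - i) / (i + 1)
	}
	return r
}

func main() {
	const nodes, rf = 9, 3
	combos := choose(nodes, rf) // all possible 3-node failures: 84
	copysets := nodes / rf      // disjoint copysets of size rf: 3

	// Random placement: with many ranges, nearly every 3-node subset
	// holds some range, so nearly all failure combos can lose data.
	fmt.Printf("random: ~%d/%d fatal failure combos\n", combos, combos)

	// Copysets: only a failure matching a copyset exactly loses data.
	fmt.Printf("copysets: %d/%d fatal failure combos (%.1f%%)\n",
		copysets, combos, 100*float64(copysets)/float64(combos))
}
```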

Jira issue: CRDB-5725

@rytaft rytaft added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community A-kv-replication Relating to Raft, consensus, and coordination. labels Apr 30, 2018
@rytaft rytaft added this to the Later milestone Apr 30, 2018
@BramGruneir
Member

This is VERY old now, but copysets were discussed in the rebalancing RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20160503_rebalancing_v2.md#copysets

@BramGruneir
Member

Actually, after re-reading the RFC, I think a lot of that still holds true.

@nvanbenschoten
Member

@a-robinson following up on what I mentioned in the reading group: can this be supported in the static case, where no nodes are added or removed, by using zone configs? If zone configs supported disjunction expressions, then I think we could emulate the effect of copysets with the following two steps:

  • add an n<N> attribute to each node in a cluster
  • define replica constraints in zone configs as a disjunction of all acceptable copysets. For instance, we could define valid replica placements as: [[n1, n2, n3], [n4, n5, n6], [n7, n8, n9], [n1, n4, n7], [n2, n5, n8], [n3, n6, n9]] (not valid zone constraint syntax, of course).

If this all worked then we could create a simple tool that generates one of these copyset zone configs given a certain number of nodes and a desired scatter width.
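If zone configs grew that capability, the generator itself would be simple. Here is a hypothetical sketch using the random-permutation scheme from the copysets paper; the function name, the scatter-width arithmetic, and the plain integer output are my own inventions, and a real tool would emit actual zone config constraints:

```go
// Hypothetical sketch of the generator described above: given a node
// count, replication factor, and target scatter width, emit a list of
// copysets via repeated random permutations. The output format is
// illustrative, not valid zone config syntax.
package main

import (
	"fmt"
	"math/rand"
)

func generateCopysets(nodes, rf, scatterWidth int) [][]int {
	// Each permutation pass adds rf-1 distinct neighbors to every node's
	// scatter width, so we need ceil(scatterWidth / (rf-1)) passes.
	passes := (scatterWidth + rf - 2) / (rf - 1)
	var copysets [][]int
	for p := 0; p < passes; p++ {
		perm := rand.Perm(nodes) // random ordering of node IDs 0..nodes-1
		for i := 0; i+rf <= len(perm); i += rf {
			set := make([]int, rf)
			copy(set, perm[i:i+rf])
			copysets = append(copysets, set)
		}
	}
	return copysets
}

func main() {
	for _, cs := range generateCopysets(9, 3, 4) {
		fmt.Println(cs) // e.g. [3 7 1] -> constraint [n4, n8, n2]
	}
}
```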

cc. @nstewart

@a-robinson
Contributor

You could take those two steps to allow manual control over which copysets exist, but you'd still have to implement the rebalancing logic to actually respect the copysets and to try balancing ranges across them. And you'd be really tightly coupling the constraints/attributes on nodes with the constraints in the zone config by statically listing all the nodes in the zone config.

@nvanbenschoten
Member

you'd still have to implement the rebalancing logic to actually respect the copysets and to try balancing ranges across them

I'm not sure I follow, which is probably due to a lack of expertise in the area of zone config constraints and their effect on rebalancing decisions. Why wouldn't current rebalancing logic take the manual "copysets" into account? My understanding is that it would try to evenly distribute replicas across nodes randomly while still obeying these constraints. If that's the case then wouldn't a scatter probabilistically fill the copysets in a roughly even manner?

And you'd be really tightly coupling the constraints/attributes on nodes with the constraints in the zone config by statically listing all the nodes in the zone config.

Sure, this wouldn't work for all deployments, but this isn't really a blocker for a POC.

@a-robinson
Contributor

I didn't get that you wanted to try to use the existing Constraints logic for it. That works in theory, but a disjunction of per-replica constraints is way beyond what the allocator implementation for Constraints currently supports (and even further beyond what we've seen any need to support).

@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@github-actions

github-actions bot commented Jun 6, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@erikgrinaker
Contributor

erikgrinaker commented Jun 11, 2021

We don't need to support full copysets either; even a low number of node groups (e.g. via locality tags) significantly lowers the probability of range loss in large clusters. The analytical solution got hairy, but I wrote a small script to simulate it (rangeloss.go). When losing 2 nodes in a 50-node cluster with 3 replicas and 1000 ranges, the probabilities of losing at least 1 range are:

  • 1 partition: 92%
  • 3 partitions: 67%
  • 6 partitions: 34%
  • 9 partitions: 23%
  • 12 partitions: 17%

Many users already use 3 partitions due to cloud AZ locality tags, but there are significant gains to be had by adding just a few more partitions.

I think it'd be very worthwhile to add this as one of the allocation heuristics -- of course weighed up against other concerns like load balancing.

UPDATE: The initial numbers here were too low due to a bug that biased range allocations. This also showed that simply adding 12 localities and allocating randomly doesn't work; there needs to be stronger coupling between ranges. The simulation above uses contiguous partition groups the same size as the number of replicas, i.e. with 3 replicas, ranges are allocated to partitions 1-3, 4-6, 7-9, etc.
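The real script is rangeloss.go, linked above; for a sense of the shape of the simulation, here is a rough Monte Carlo sketch of the grouped-partition scheme described in the update (my own reconstruction, not rangeloss.go itself; it covers the grouped cases of 3 or more partitions, since a single partition degenerates to random placement):

```go
// Rough re-creation of the experiment above: nodes are split round-robin
// into partitions, partitions are grouped into contiguous runs of
// `replicas`, and each range places one replica on a random node in each
// partition of a randomly chosen group. With 3 replicas, losing 2 of
// them loses quorum for the range.
package main

import (
	"fmt"
	"math/rand"
)

func lossProbability(nodes, replicas, ranges, partitions, trials int) float64 {
	// byPartition[p] lists the node IDs assigned to partition p.
	byPartition := make([][]int, partitions)
	for n := 0; n < nodes; n++ {
		byPartition[n%partitions] = append(byPartition[n%partitions], n)
	}
	groups := partitions / replicas // contiguous groups of `replicas` partitions
	losses := 0
	for t := 0; t < trials; t++ {
		// Place every range: pick a group, then one node per partition in it.
		placement := make([][]int, ranges)
		for r := range placement {
			g := rand.Intn(groups)
			for i := 0; i < replicas; i++ {
				p := byPartition[g*replicas+i]
				placement[r] = append(placement[r], p[rand.Intn(len(p))])
			}
		}
		// Fail 2 random distinct nodes; a range is lost if quorum is lost.
		failed := rand.Perm(nodes)[:2]
		down := map[int]bool{failed[0]: true, failed[1]: true}
		for _, replicaNodes := range placement {
			dead := 0
			for _, n := range replicaNodes {
				if down[n] {
					dead++
				}
			}
			if dead > replicas/2 {
				losses++
				break // at least one range lost in this trial
			}
		}
	}
	return float64(losses) / float64(trials)
}

func main() {
	for _, p := range []int{3, 6, 9, 12} {
		fmt.Printf("%2d partitions: ~%.0f%% chance of losing a range\n",
			p, 100*lossProbability(50, 3, 1000, p, 2000))
	}
}
```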

@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@lunevalex lunevalex added the A-kv-distribution Relating to rebalancing and leasing. label Oct 6, 2021
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!
