K8SSAND-1697 ⁃ Make Bootstrap Operations Deterministic #381

Closed
bradfordcp opened this issue Jul 26, 2022 · 8 comments · Fixed by #403
Labels: enhancement (New feature or request), zh:In-Progress

Comments

@bradfordcp
Member

bradfordcp commented Jul 26, 2022

What is missing?
When adding nodes to a cluster or creating a new cluster, we can end up bootstrapping nodes in an unbalanced manner. This is possible when the number of nodes to bootstrap is greater than the number of racks.

Ideally we would sort the set of nodes to bootstrap by rack. The rack with the most nodes left to bootstrap would then bootstrap a node first. If several racks tie for the most nodes left to bootstrap, we should sort those racks by name.
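
A minimal Go sketch of the selection rule proposed above, assuming the pending work is tracked as a map from rack name to the number of nodes still waiting to bootstrap. The names `nextRackToBootstrap` and `bootstrapOrder` are hypothetical and are not part of cass-operator.

```go
package main

import (
	"fmt"
	"sort"
)

// nextRackToBootstrap returns the rack with the most nodes still waiting to
// bootstrap. Ties are broken by rack name. It returns "" when nothing is pending.
// Hypothetical helper for illustration only.
func nextRackToBootstrap(pending map[string]int) string {
	racks := make([]string, 0, len(pending))
	for rack, n := range pending {
		if n > 0 {
			racks = append(racks, rack)
		}
	}
	if len(racks) == 0 {
		return ""
	}
	sort.Slice(racks, func(i, j int) bool {
		if pending[racks[i]] != pending[racks[j]] {
			return pending[racks[i]] > pending[racks[j]] // most pending first
		}
		return racks[i] < racks[j] // tie: sort by name
	})
	return racks[0]
}

// bootstrapOrder applies the rule repeatedly on a copy of the input until no
// nodes are left, returning the rack chosen at each step.
func bootstrapOrder(pending map[string]int) []string {
	remaining := make(map[string]int, len(pending))
	for rack, n := range pending {
		remaining[rack] = n
	}
	var order []string
	for {
		rack := nextRackToBootstrap(remaining)
		if rack == "" {
			return order
		}
		order = append(order, rack)
		remaining[rack]--
	}
}

func main() {
	order := bootstrapOrder(map[string]int{"rack-a": 2, "rack-b": 1, "rack-c": 2})
	fmt.Println(order) // [rack-a rack-c rack-a rack-b rack-c]
}
```

Because the choice depends only on the pending counts and the rack names, the same input always yields the same order, regardless of map iteration order.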

Why do we need it?

Data can be skewed if we don't maintain balance during bootstrap operations. This is less of a concern with vnode clusters. When utilizing single-token, we may end up with multiple replicas on a single node.

Environment

  • Cass Operator version: v1.12.0

Anything else we need to know?:
Y'all are awesome, thanks for the great work 🌟

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1697
┆priority: Medium

@bradfordcp bradfordcp added the enhancement New feature or request label Jul 26, 2022
@sync-by-unito sync-by-unito bot changed the title Make Bootstrap Operations Deterministic K8SSAND-1697 ⁃ Make Bootstrap Operations Deterministic Jul 26, 2022
@adutra
Contributor

adutra commented Aug 17, 2022

Hey team! Please add your planning poker estimate with ZenHub @burmanm @adejanovski @Miles-Garnsey

@adutra
Contributor

adutra commented Sep 7, 2022

A few data points to document what I'm seeing and what I'm going to modify:

The distribution of pods in racks is already deterministic and balanced. It's deterministic because the distribution is done in CalculateRackInformation and respects the order in which racks were declared in the spec. It's balanced because the algorithm never lets a rack hold more than one pod more than any other rack.
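
To illustrate the two properties described above, here is a hedged Go sketch of one way to get such a distribution. It is not the actual body of CalculateRackInformation; the function name `distributeNodes` and the "extra pods go to the first declared racks" rule are assumptions made for the example.

```go
package main

import "fmt"

// distributeNodes spreads size pods over the racks in declaration order.
// Every rack gets size/len(racks) pods, and the first size%len(racks) racks
// get one extra, so no rack is ever more than one pod ahead of another.
// Hypothetical helper; not the cass-operator implementation.
func distributeNodes(racks []string, size int) []int {
	counts := make([]int, len(racks))
	if len(racks) == 0 {
		return counts
	}
	base, extra := size/len(racks), size%len(racks)
	for i := range counts {
		counts[i] = base
		if i < extra {
			counts[i]++
		}
	}
	return counts
}

func main() {
	racks := []string{"r1", "r2", "r3"}
	fmt.Println(distributeNodes(racks, 7)) // [3 2 2]
}
```

The result depends only on the declared rack order and the datacenter size, which is what makes it deterministic.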

However the order in which Cassandra processes are started is not deterministic:

  • First we call startOneNodePerRack; here we do respect the balance by starting only one node in each rack, but this node can be any of the pods;
  • Then we call startAllNodes; here, Cassandra processes are started in no particular order.

I am thus going to modify the above methods, and make them behave in a strictly deterministic order.

Note that this might slow down the overall startup time of the dc.

we should sort the racks by name

I'm rather inclined to respect the order in which racks were declared in the spec.

@adejanovski
Contributor

I'm rather inclined to respect the order in which racks were declared in the spec.

My 2c here: since rack names cannot be changed on a live datacenter, but the ordering in the manifest could be changed, using names would probably be more "deterministic".

@adutra
Contributor

adutra commented Sep 7, 2022

since rack names cannot be changed on a live datacenter, but the ordering in the manifest could be changed, using names would probably be more "deterministic".

I'm not sure here. Using the rack names means that one given rack will always be the first one to bootstrap, and that cannot be changed. What if the user wants another rack to be the first one? Using rack names would force users to give their racks alphabetically sorted names that reflect the bootstrap order, which may be tricky depending on how they name them. E.g. rack10 would bootstrap before rack2. Using the declaration order, however, allows users to choose which rack bootstraps first.
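
The rack10 versus rack2 point can be checked with a small Go sketch. The rack names are only examples, and `sortedByName` is a hypothetical helper that uses plain lexicographic string ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedByName returns a copy of racks in lexicographic order, leaving the
// declared order untouched. Hypothetical helper for illustration.
func sortedByName(racks []string) []string {
	out := append([]string(nil), racks...)
	sort.Strings(out)
	return out
}

func main() {
	declared := []string{"rack1", "rack2", "rack10"}
	fmt.Println(sortedByName(declared)) // [rack1 rack10 rack2]
	fmt.Println(declared)               // [rack1 rack2 rack10]
}
```

Lexicographic comparison looks at "1" before "2" in the fifth character, so rack10 sorts ahead of rack2 even though the user declared it last.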

@adejanovski
Contributor

As a user, I can't think of a use case where you'll need a specific rack to start first, just the order to be predictable. Do you have a case in mind?
Racks are usually named with -a, -b, -c or -1, -2, etc., so I think users should expect us to use that order.

@adejanovski
Contributor

Following up on our offline chat, we agree that as long as nodes get started by cycling through racks (vs starting several nodes in a single rack at once), how racks are ordered doesn't matter much.
Hence, using the spec order is simpler than reordering the racks.
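
A rough Go sketch of what "cycling through racks" in spec order could look like, assuming each rack carries its not-yet-started pods in a fixed order. The `rack` type, the `startOrder` function, and the pod names are all hypothetical and are not taken from the operator's code.

```go
package main

import "fmt"

// rack is a hypothetical view of one rack: its name and the pods that still
// need their Cassandra process started, in a fixed (e.g. ordinal) order.
type rack struct {
	name string
	pods []string
}

// startOrder walks the racks in declaration order, taking one pod per rack in
// each round, until every pod has been scheduled. It never starts two pods of
// the same rack back to back while another rack still has pods waiting in
// that round.
func startOrder(racks []rack) []string {
	var order []string
	for round := 0; ; round++ {
		started := false
		for _, r := range racks {
			if round < len(r.pods) {
				order = append(order, r.pods[round])
				started = true
			}
		}
		if !started {
			return order
		}
	}
}

func main() {
	racks := []rack{
		{name: "r1", pods: []string{"r1-0", "r1-1"}},
		{name: "r2", pods: []string{"r2-0"}},
		{name: "r3", pods: []string{"r3-0", "r3-1"}},
	}
	fmt.Println(startOrder(racks)) // [r1-0 r2-0 r3-0 r1-1 r3-1]
}
```

With this shape, changing the rack order in the spec only changes which rack goes first within a round; the balance across racks is preserved either way.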

@jsanda
Contributor

jsanda commented Sep 7, 2022

+1 to using declaration order to determine which rack bootstraps first.

@bradfordcp
Member Author

I’m fine with either approach as long as it’s deterministic and documented.
