Clarify the use and behavior of the --join flag #3395
Comments
@bdarnell do you (1) agree that we should update the docs in this spirit, and if so (2) have a recommendation for what we should tell users about why they should avoid doing more than 3-5 nodes in the
I understand the --join flag has been mandatory since 19.2 (per the docs), and that's a good thing. A node joining a cluster needs to find just one node already in the cluster (from a general Raft standpoint, not from knowledge of our implementation), and we need redundancy, so in a multi-region setup it seems to make sense to include 2 nodes from each region (presuming there are enough nodes and the cluster is configured to survive a region failure). Speaking more generally, the redundancy level is determined by the user, based on an examination of all the failure permutations/scenarios they want the cluster to tolerate. Shouldn't the --join list be constructed based on exactly the same exercise? Pardon a trivial point: every operation in the cluster has to be at the same level of fault tolerance to eliminate the weakest link. It's somewhat counter-intuitive that limiting the number of nodes in --join would be a good practice; perhaps it's a caveat of our implementation.
The latter, more general question is what a good order or pattern of entries in the --join list is (doesn't matter, interleaving in some fashion, my region first, other). Answering it requires understanding how we walk the --join list. E.g., do we always start with the first entry on the list, or perhaps look at all entries and pick the ones on the same subnet to connect to first? And if a node fails to connect to an existing node in the --join list, what's the timeout before the next one is tried?
Including two nodes per region is my recommendation as well (for multi-region clusters). For single-region clusters, adding a few more nodes (3-5 total) might make sense. Currently it looks like we walk the join list in a random order, trying one address per second. This is not good for the first-time startup of a large cluster: it takes a while for all the nodes to converge onto the same gossip network (we saw some real-world issues with this a long time ago). This is why we recommend limiting the number of addresses used here. There are certainly things we could do to make it work better with large join lists, but using fewer join addresses is a much simpler workaround.
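To make the join-list walk concrete, here is a toy simulation. It assumes only what the comment above describes (addresses tried in random order, one attempt per second) and models nothing else: it ignores gossip convergence, retries, and connection timeouts, and it is not CockroachDB's implementation. The function names and the node1..node30 host names are hypothetical.

```python
import random


def attempts_until_contact(join_list, reachable, rng):
    """Try addresses in random order; return the 1-based attempt number of
    the first reachable address, or None if none is reachable."""
    order = list(join_list)
    rng.shuffle(order)
    for attempt, addr in enumerate(order, start=1):
        if addr in reachable:
            return attempt
    return None


def mean_attempts(join_list, reachable, trials=2000, seed=0):
    """Average attempts (roughly seconds, at one attempt per second)."""
    rng = random.Random(seed)
    hits = []
    for _ in range(trials):
        n = attempts_until_contact(join_list, reachable, rng)
        if n is not None:
            hits.append(n)
    return sum(hits) / len(hits) if hits else None


# Hypothetical first-time startup: only one listed node is up so far.
small = [f"node{i}:26257" for i in range(1, 4)]    # 3 join addresses
large = [f"node{i}:26257" for i in range(1, 31)]   # 30 join addresses
up = {"node1:26257"}

mean_small = mean_attempts(small, up)
mean_large = mean_attempts(large, up)
print(mean_small, mean_large)
```

With a single reachable address in a list of n, the expected position in a uniformly random order is (n + 1) / 2, so under these assumptions the time to first contact grows linearly with the length of the join list.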
@bdarnell I think our multi-region Kubernetes docs currently set 3
There's not a precise numeric requirement here; 2 vs 3 nodes per region won't make a difference. The general guidelines are:
For a 9-node multi-region cluster, you're on the edge of "small". It would be fine to just put all 9 in the join flag as long as you stay that size, but as you scale you'll probably want to stop doing that, so it's probably a good idea to limit yourself to 2 nodes per region from the start.
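As an illustration of the "2 nodes per region" guideline, the sketch below picks the first two addresses from each region of a 9-node, 3-region cluster and formats them as a --join value. The region names, host names, and helper functions are hypothetical. Only the --join flag itself, with its comma-separated address list, comes from the cockroach CLI.

```python
from collections import defaultdict


def build_join_list(nodes, per_region=2):
    """nodes is a list of (region, address) pairs. Keep the first
    `per_region` addresses seen for each region, grouped by region."""
    picked = defaultdict(list)
    for region, addr in nodes:
        if len(picked[region]) < per_region:
            picked[region].append(addr)
    return [addr for addrs in picked.values() for addr in addrs]


def join_flag(addresses):
    """Format addresses as a single --join flag value."""
    return "--join=" + ",".join(addresses)


# Hypothetical 9-node cluster spread over three regions.
nodes = [
    ("us-east", "east-1:26257"), ("us-east", "east-2:26257"), ("us-east", "east-3:26257"),
    ("us-central", "central-1:26257"), ("us-central", "central-2:26257"), ("us-central", "central-3:26257"),
    ("us-west", "west-1:26257"), ("us-west", "west-2:26257"), ("us-west", "west-3:26257"),
]

join = build_join_list(nodes)
flag = join_flag(join)
print(flag)
```

The same flag value can then be passed to cockroach start on every node. Keeping the list at 6 entries means it does not need to change as more nodes are added to each region.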
Closed with #7893.
We need to clarify:
Why do we recommend specifying 3-5 join addresses?
What is the best practice for --join lists in a multi-region cluster? Include a few nodes from each region?
Why is it bad to specify too many nodes in --join? (see below)
What to do if you don't know fixed IP addresses on startup?
On restart, is --join required (no) and why not? As of 19.2, --join is always required when using cockroach start.
This should explain how, once a node has connected to an existing cluster, it gets the list of nodes already in the cluster via gossip, stores the list locally, and uses that list on subsequent startups.
When is a new cluster initialized on node startup, and what exactly happens during initialization?
cockroach init docs to include more detail from the init RFC.
Prior to 20.1, if you run cockroach start without a --join flag, and it does not find the bootstrapping info it needs on disk, it will create a brand new cluster.
cluster on node startup even if you restart it without a --join flag.
In 20.1+, cockroach start without --join will no longer create new clusters, once "cli: remove auto-init with cockroach start without --join" (cockroach#44112) is merged.
Pointing a --join flag at a load balancer is not recommended, for the following reasons:
Using a LB will defeat CockroachDB's internal algorithm to minimize the number of peer-to-peer connections for gossip.
It will create a dependency cycle between the load balancer configuration and the database.
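The item above about a node learning the cluster's members via gossip, storing the list locally, and reusing it on later startups can be pictured with a toy model. Everything here is an assumption made for illustration: the peers.json file name, the JSON format, and both helper functions are hypothetical and do not reflect how CockroachDB actually persists this information.

```python
import json
import tempfile
from pathlib import Path


def startup_addresses(store_dir, join_addrs):
    """Toy model: use peers saved by an earlier run if present,
    otherwise fall back to the addresses given via --join."""
    saved = Path(store_dir) / "peers.json"  # hypothetical file name
    if saved.exists():
        return json.loads(saved.read_text())
    return list(join_addrs)


def record_gossiped_peers(store_dir, peers):
    """Toy model: persist the peer list learned from the cluster."""
    path = Path(store_dir) / "peers.json"
    path.write_text(json.dumps(sorted(peers)))


with tempfile.TemporaryDirectory() as store:
    # First start: nothing on disk yet, so only the --join list is known.
    first = startup_addresses(store, ["node1:26257"])
    # After connecting, the node learns about the rest of the cluster.
    record_gossiped_peers(store, {"node1:26257", "node2:26257", "node3:26257"})
    # Restart: the stored list is available even if node1 is now down.
    second = startup_addresses(store, ["node1:26257"])

print(first, second)
```

In this model the --join list only matters until the first successful connection, which is one way to see why a short list can be sufficient.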
Previous description:
In Start a Node we do mention that
However, apparently some weird/bad behavior can occur related to the resulting gossip network graph. A doc update that fixes this issue should probably:
Make it clear that you shouldn't/don't need to do this (add every node on the --join flag)
Clarify why not/what that weird/bad behavior is that can result (at a high-ish level ideally)
Related:
--join should use node's advertised addresses #2703