Clarify the use and behavior of the --join flag #3395

rmloveland · 2018-07-17T19:38:17Z

We need to clarify:

Why do we recommend specifying 3-5 join addresses?
- This should answer what happens when one of the join target nodes goes down or can't be reached (What happens if --join host goes down? #320), i.e., no problem; we just try the next node in the list. From @bdarnell: "it's useful in environments where new nodes can be started automatically but it's inconvenient to change the command line. (this may sound like an unusual scenario, but google's borg works this way). You can set the command line up with multiple hosts so that new nodes will be able to find someone to talk to even if the original node has gone away."
What is the best practice for --join lists in a multi-region cluster? Include a few nodes from each region?
Why is it bad to specify too many nodes in --join? (see below)
What to do if you don't know fixed IP addresses on startup?
- Use DNS aliases instead - see Manual Deployment (Insecure) Doc Update #2144.
~~On restart, is --join required (no) and why not?~~ As of 19.2, --join is always required when using cockroach start.
- This should explain how, once a node has connected to an existing cluster, it gets the list of nodes already in the cluster via gossip, stores the list locally, and uses that list on subsequent startups.
When is a new cluster initialized on node startup. And what exactly happens during initialization.
- I think this is probably already clear in the docs, but we should verify.
- Ideally, this would cover updates to the cockroach init docs to include more detail from the init RFC.
Prior to 20.1, if you run cockroach start without a --join
flag, and it does not find the bootstrapping info it needs on
disk, it will create a brand new cluster.
- If it does find the info on disk, the node will rejoin a
  cluster on node startup even if you restart it without a
  --join flag.
In 20.1+ cockroach start without --join will no longer
create new clusters, once
cli: remove auto-init with cockroach start without --join cockroach#44112 is merged.
If you point a --join flag at a load balancer, it's not good,
for the following reasons:
- Using a LB will defeat CockroachDB's internal algorithm to
  minimize the number of peer-to-peer connections for gossip.
- It will create a dependency cycle between the load balancer
  configuration and the database, which is not good.

Previous description:

In
Start a Node
we do mention that

When starting a multi-node cluster for the first time, set this flag
to the addresses of 3-5 of the initial nodes. Then run the
cockroach init command …

However, apparently some weird/bad behavior can occur related to the
resulting gossip network graph. A doc update that fixes this issue
should probably:

Make it clear that you shouldn't/don't need to do this (add every
node on the --join flag)
Clarify why not/what that weird/bad behavior is that can result
(at a high-ish level ideally)

rmloveland · 2018-07-17T19:47:53Z

@bdarnell do you (1) agree that we should update the docs in this spirit, and if so (2) have a recommendation for what we should tell users about why they should avoid doing more than 3-5 nodes in the --join args?

bdarnell · 2018-07-17T19:55:03Z

Yeah, we should probably update the docs.
We recommend using a limited number of nodes in the join flag because the more join hosts are given, the longer it can take the cluster to reach a stable state on startup. By limiting the join flag to a few hosts, those hosts can act as brokers to distribute the rest of the addresses and help the cluster get into a connected and balanced state faster. Note that it is important that the same join hosts be given to every node; if each node is given a different set of join hosts then the cluster may not be able to converge on a stable configuration.

jseldess · 2020-07-23T17:57:14Z

@taroface, @johnrk, this is an older issue that was mis-assigned. I think we should address this one soonish.

a-entin · 2020-07-23T18:22:00Z

I understand the --join flag is mandatory since 19.2 (per doc) and it's a good thing. Since a node joining a cluster needs to find just 1 node already in the cluster (from general raft standpoint, not our implementation knowledge) and since we need redundancy, it seem to make sense in a multi-region setup to include 2 nodes from each region (presuming enough nodes and cluster configured to survive a region failure). Speaking more generally... the redundancy level is determined [by user] based on examination of all failure permutations/scenarios they want the cluster to tolerate. Shouldn't the --join list be constructed based on exactly same exercise? Pardon a trivial point - every operation in the cluster has to be at the same level of FT to eliminate the weakest link.

It's somewhat counter-intuitive why limiting #nodes in --join would be a good practice, perhaps it's our implementation caveat.

The latter, and more general question - what's a good order / pattern of entries in the --join list (doesn't matter, interleaving in some fashion, my-region first, other) - begs understanding how we walk the --join list. E.g. always start with first on the list or perhaps look at all entries an pick ones to connect first on the same subnet? And if fails to connect to an existing node in --join list - what's the timeout before next is tried?

bdarnell · 2020-07-23T18:35:40Z

Including two nodes per region is my recommendation as well (for multi-region clusters. For single-region, adding a few more nodes (3-5 total) might make sense.

Currently it looks like we walk the join list in a random order, trying one address per second. This is not good for the first-time startup of a large cluster - it takes a while for all the nodes to converge onto the same gossip network (we saw some real-world issues with this a long time ago). So this is why we recommend limiting the number of addresses used here (there are certainly things we could do here to make it work better with large join lists, but using fewer join addresses is a much simpler workaround)

taroface · 2020-07-30T20:57:30Z

Including two nodes per region is my recommendation as well (for multi-region clusters.

@bdarnell I think our multi-region Kubernetes docs currently set 3 --join addresses per cluster, corresponding to the 3 pods created by each StatefulSet. Will this be OK, or does the --join flag need to have just 2 addresses per region in this case?

bdarnell · 2020-07-31T16:31:42Z

There's not a precise numeric requirement here; 2 vs 3 nodes per region won't make a difference. The general guidelines are:

Use more than one node
Unless your cluster is small, don't list all the nodes
Pick nodes that are spread out across failure domains
Use the same list for all nodes' join flags

For a 9-node multi-region cluster, you're on the edge of "small". It would be fine to just put all 9 in the join flag as long as you stay that size, but as you scale you'll probably want to stop doing that so it's probably a good idea to limit yourself to 2 nodes per region at the start.

taroface · 2020-08-06T15:35:33Z

Closed with #7893.

jseldess changed the title ~~Clarify that specifying too many nodes in the --join flag is not good~~ Clarify the use and purpose of the --join flag Sep 21, 2018

This was referenced Sep 21, 2018

What happens if --join host goes down? #320

Closed

node join behaviour & restarts #3138

Closed

jseldess changed the title ~~Clarify the use and purpose of the --join flag~~ Clarify the use and behavior of the --join flag Sep 21, 2018

jseldess mentioned this issue Sep 21, 2018

Production recommendation: include multiple node addresses in --join #739

Closed

jseldess added unclear-info T-missing-info labels Sep 21, 2018

This was referenced Sep 21, 2018

Add more detail to "Initialize a Cluster" from the cockroach init RFC #2756

Closed

Manual Deployment (Insecure) Doc Update #2144

Closed

jseldess added T-incorrect-or-unclear-info and removed C-unclear-info labels Oct 4, 2018

jseldess added O-external Origin: Issue comes from external users. A-ops&tools labels Nov 9, 2018

lnhsingh added the P-3 Low priority; nice-to-have label Nov 14, 2018

rmloveland self-assigned this Jan 30, 2020

jseldess assigned taroface and unassigned rmloveland Jul 23, 2020

jseldess added A-deploy-ops and removed A-io P-3 Low priority; nice-to-have labels Jul 23, 2020

taroface mentioned this issue Jul 30, 2020

update --join flag documentation #7893

Merged

taroface mentioned this issue Aug 3, 2020

cloud: add statefulset config for EKS multi-region cockroachdb/cockroach#51775

Merged

taroface closed this as completed Aug 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify the use and behavior of the --join flag #3395

Clarify the use and behavior of the --join flag #3395

rmloveland commented Jul 17, 2018 •

edited by taroface

Loading

rmloveland commented Jul 17, 2018

bdarnell commented Jul 17, 2018

jseldess commented Jul 23, 2020

a-entin commented Jul 23, 2020 •

edited

Loading

bdarnell commented Jul 23, 2020

taroface commented Jul 30, 2020

bdarnell commented Jul 31, 2020

taroface commented Aug 6, 2020

Clarify the use and behavior of the --join flag #3395

Clarify the use and behavior of the --join flag #3395

Comments

rmloveland commented Jul 17, 2018 • edited by taroface Loading

rmloveland commented Jul 17, 2018

bdarnell commented Jul 17, 2018

jseldess commented Jul 23, 2020

a-entin commented Jul 23, 2020 • edited Loading

bdarnell commented Jul 23, 2020

taroface commented Jul 30, 2020

bdarnell commented Jul 31, 2020

taroface commented Aug 6, 2020

rmloveland commented Jul 17, 2018 •

edited by taroface

Loading

a-entin commented Jul 23, 2020 •

edited

Loading