What happens if --join host goes down? #320

Closed
jseldess opened this issue May 25, 2016 · 5 comments
Labels
O-external Origin: Issue comes from external users.

Comments

@jseldess
Contributor

From Gitter:

Lorenzo Boccaccia @LorenzoBoccaccia 03:29
Hi guys, I have some questions about deployment and disaster recovery. When you start instances, you --join them to some host; what happens if that particular host goes down? And what happens if the instances have to be restarted while the primary --join target is still down?

marc @mberhault 05:06
once a machine is connected to the gossip network (basically, once it has joined the cluster), it stores the addresses of other nodes seen, so you could restart without --join, and it would still be ok

Lorenzo Boccaccia @LorenzoBoccaccia 05:07
Ah, OK, perfect. And when one cluster element dies, how does replication react? At what point are the replicas in the impacted zone re-replicated to different nodes from the available copies?

marc @mberhault 05:09
we consider a node to be dead if it's been unreachable for five minutes. At that point, we'll add another node as a replica and start copying data over

Lorenzo Boccaccia @LorenzoBoccaccia 05:11
wonderful, cheers!
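
A minimal sketch of the five-minute rule described in the chat above, not the actual implementation. The names `DEAD_AFTER`, `dead_nodes`, and `pick_replacement` are hypothetical, and the replacement policy (first live node that does not already hold a replica) is an assumption made for illustration only.

```python
from datetime import datetime, timedelta

# Five minutes, per the chat above. The constant name is hypothetical.
DEAD_AFTER = timedelta(minutes=5)


def dead_nodes(last_seen, now):
    """Return the nodes that have been unreachable for at least DEAD_AFTER.

    last_seen maps a node name to the last time that node was reachable.
    """
    return sorted(n for n, t in last_seen.items() if now - t >= DEAD_AFTER)


def pick_replacement(replicas, dead, candidates):
    """Return the first live candidate that holds no replica yet, or None.

    Assumption: the real selection policy is not described in the thread.
    """
    for node in candidates:
        if node not in dead and node not in replicas:
            return node
    return None


if __name__ == "__main__":
    now = datetime(2016, 5, 25, 12, 0, 0)
    last_seen = {
        "n1": now,
        "n2": now - timedelta(minutes=6),  # past the threshold
        "n3": now - timedelta(minutes=4),  # still considered alive
        "n4": now,
    }
    dead = dead_nodes(last_seen, now)
    print(dead, pick_replacement(["n1", "n2", "n3"], set(dead), list(last_seen)))
```

In this sketch a node that was last seen four minutes ago is still treated as alive, so no data is copied for it until the threshold passes.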

@jseldess jseldess added the O-external Origin: Issue comes from external users. label May 25, 2016
@jseldess jseldess added this to the Q2 milestone May 25, 2016
@mberhault
Contributor

We should probably document some recommendations.

For now, the only requirements are:

  • a node with no data in its store will initialize a new cluster if --join is empty (or unspecified)
  • to join an existing cluster, a node must be given at least one other node in the --join flag. If the target is part of the cluster, the new node will join as well. It's perfectly acceptable to pass more than one target through the --join flag.
  • once a node has connected to an existing cluster, it will locally store the list of nodes already in the cluster and use that list on subsequent startups.

This means that the very first node is no longer special once another node has joined the cluster.
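
The three requirements above can be sketched as a small decision function. This is an illustrative model only, not the server's code: `startup_plan` and its return labels are hypothetical, and merging the stored node list with any --join targets on restart is an assumption (the comment above only says the stored list is used).

```python
def startup_plan(store_has_data, stored_peers, join_targets):
    """Model which action a node takes at startup.

    store_has_data: True if the node's store already contains data.
    stored_peers:   addresses saved locally from an earlier cluster membership.
    join_targets:   addresses passed through the --join flag (may be empty).
    """
    if not store_has_data:
        if not join_targets:
            # Empty store and no --join: start a brand-new cluster.
            return ("init-new-cluster", [])
        # Empty store with --join: contact the given targets to join.
        return ("join-existing-cluster", list(join_targets))

    # Store has data: reuse the locally stored node list.
    # Assumption: any --join targets are tried as well, after stored peers.
    contacts = list(stored_peers)
    for target in join_targets:
        if target not in contacts:
            contacts.append(target)
    return ("restart", contacts)


if __name__ == "__main__":
    print(startup_plan(False, [], []))
    print(startup_plan(False, [], ["node1:26257"]))
    print(startup_plan(True, ["node2:26257", "node3:26257"], []))
```

The last call models the case from the original question: a restarted node needs no --join flag because its stored list already names other cluster members. The host names and port are placeholders.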

@jseldess jseldess modified the milestones: Q2, Q3 Jul 6, 2016
@jseldess
Contributor Author

@mberhault, since each new node stores the list of nodes already in the cluster, why is it ever necessary to pass more than one address in the --join flag?

@mberhault
Contributor

We don't technically need to as long as the --join host is up the first time a new node connects.
However, for simplicity, I've been putting all hosts in the --join flag in the test clusters. That way, I don't have to worry about a single node being down, the network being funky, or anything else.

@bdarnell
Contributor

It's never necessary to give more than one --join address, but it's useful in environments where new nodes can be started automatically and it's inconvenient to change the command line. (This may sound like an unusual scenario, but Google's Borg works this way.) You can set up the command line with multiple hosts so that new nodes can find someone to talk to even if the original node has gone away.
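
A rough sketch of why a fixed list of several --join hosts helps, assuming the node simply tries targets in order until one answers. The function `first_reachable`, the reachability check, and the host names are all hypothetical; the real connection logic is not described in this thread.

```python
def first_reachable(join_targets, is_reachable):
    """Return the first join target that answers, or None if none do.

    is_reachable is a caller-supplied predicate, so the sketch stays
    independent of any real networking code.
    """
    for target in join_targets:
        if is_reachable(target):
            return target
    return None


if __name__ == "__main__":
    # Placeholder addresses; only node2 is up in this example.
    targets = ["node1:26257", "node2:26257", "node3:26257"]
    up = {"node2:26257"}
    print(first_reachable(targets, lambda t: t in up))
```

With a single target, the same call returns None whenever that one host is down, which is the failure mode the multi-host command line avoids.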

@jseldess jseldess removed this from the Q3 milestone May 13, 2017
@jseldess
Contributor Author

Closing in favor of #3395.
