Scaling from a single node to three nodes and keeping the data from the one node #1311

Closed
njbennett opened this issue Oct 16, 2015 · 10 comments

Comments

@njbennett

Should this be possible? When we scale from a single server node to three server nodes, we see the data disappear.

Our process for doing this is as follows:

  1. Stop the single node
  2. Change bootstrap_expect to 3 and change retry_join to the list of server addresses.
  3. Start the single node again.
  4. Start the second and then third nodes.

Once they have all started and synced, we try to read data and are unable to retrieve any data persisted before we scaled the cluster up.
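
For illustration, the change in step 2 amounts to a server config along these lines (the addresses and data_dir here are placeholders, not our actual values):

```json
{
  "server": true,
  "data_dir": "/var/lib/consul",
  "bootstrap_expect": 3,
  "retry_join": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
}
```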

@highlyunavailable
Contributor

You shouldn't need to do any of that.

Your process can be:

  1. Start the first node.
  2. Start the second and third nodes. Join them to the first node either via retry_join or just consul join.

You are now done. Check out the docs about adding/removing servers for more information.

bootstrap_expect is only needed when you're starting a cluster for the first time from a clean slate.
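
As a rough sketch (I'm assuming a JSON config file rather than CLI flags, and the address is a placeholder), the second and third servers only need something like this; since the cluster already exists, no bootstrap_expect is required on them:

```json
{
  "server": true,
  "data_dir": "/var/lib/consul",
  "retry_join": ["10.0.0.1"]
}
```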

When you stop the first node, are you sending SIGTERM or SIGINT (Ctrl-C) to it? If it's leaving due to SIGINT, it would be the last node to leave the cluster when you stop it that first time, which would of course delete all the data in the data dir since it's leaving the cluster.
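
If the graceful-leave behavior turns out to be the culprit, Consul also has agent options to control it; a minimal sketch (the defaults for these have varied across versions, so double-check your release):

```json
{
  "leave_on_terminate": false,
  "skip_leave_on_interrupt": true
}
```

With a combination like that, neither SIGTERM nor Ctrl-C triggers a graceful leave, so stopping the node should not remove it from the peer set.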

@jpalermo

I think that helps. Our deployments are automated, and each node knows which other nodes are supposed to exist, but not how many existed previously (or whether a cluster even existed).

I think we should be able to have our startup script:

  - Start consul and see if it connects to a previous cluster.
  - If not, try to join each of the known addresses in the cluster.
  - If that fails, it would mean this is a new cluster and we should set bootstrap_expect to initialize the cluster.

If there's any way to make that easier on ourselves, let us know.

We're using a SIGTERM to stop the nodes.

@highlyunavailable
Contributor

Okay, so a SIGTERM should leave data in the directory since it's a shutdown-without-leave. It'd be worth checking the data-dir after you shut down the first node (between steps 1 and 2) to see if the IP address of node1 is still in the raft/peers.json file. If it is, then it's probably something to do with bootstrap-expect. I'm thinking there might be some strangeness being exposed because you're going from a single-node cluster to a 3-node cluster, and if you change bootstrap_expect between them you might be causing a "re-bootstrap", which could clear all the data (normally a bootstrap should only happen on an empty data-dir/raft peer set).
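
For reference, in the Consul releases from that era raft/peers.json is just a JSON array of server RPC addresses, so after the SIGTERM a healthy single-node cluster's file should look roughly like this (the address is a placeholder):

```json
["10.0.0.1:8300"]
```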

As far as your startup script goes: it sounds like you're regularly ending up with 0 nodes? The thing is, bootstrap is only ever run if join fails, so if you set join addresses along with bootstrap_expect and the node can successfully join another server, it will never re-bootstrap. I think you could get the behavior you want by just pre-setting join addresses (without retry_join, so they can fail), setting bootstrap_expect, and nothing else.
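
A sketch of that shape of config, assuming start_join is the non-retrying "join addresses" setting meant here (the addresses are placeholders):

```json
{
  "server": true,
  "data_dir": "/var/lib/consul",
  "bootstrap_expect": 3,
  "start_join": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
}
```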

@slackpad
Contributor

slackpad commented Jan 8, 2016

Hi @njbennett, are you still having trouble?

@Amit-PivotalLabs

Hey @slackpad

Thanks for following up. Yes, we're still having this issue. We guarantee data persistence when scaling up from 3 to 5, whereas we only guarantee the cluster is functional after scaling up from 1 to 3. @ryanmoran and @zankich spoke with @mitchellh and some HashiCorp folks when they visited our office; it sounded like this is something we have to live with for now.

@slackpad
Contributor

Hi @Amit-PivotalLabs - I was one of those folks :-) In that case I thought it was an issue with a TLS upgrade where you were going from 3 to 1 and then back up, and in particular it was possibly shutting down multiple nodes at the same time. Going from 1 to 3 should not be causing any data loss, so I'd like to track that down if that's still a problem.

@Amit-PivotalLabs

Hey @slackpad

Those were two separate issues. We've had users running a 3-node cluster whom we've migrated to a 3-node TLS cluster via a scale-down => turn on TLS => scale-up. The scale-down can go south and requires manual intervention (usually blowing away data and starting from scratch once we're wedged in an "outage" state).

Separately, we have continuous integration for our consul service that tests that it can be scaled up, down, and rolled in various ways. One particular test deploys a 1-node cluster from scratch, then scales up to 3 by adding one node at a time. Given the way we orchestrate this, we can't guarantee data from the 1 node persists on scale-up.

@slackpad
Contributor

> Given the way we orchestrate this, we can't guarantee data from the 1 node persists on scale-up.

I'd be interested in hearing more about that case - that shouldn't happen.

@slackpad
Contributor

slackpad commented May 5, 2017

Closing this out against #2319, which added an automatic check for servers when they've got bootstrap-expect configured. They will reach out to each of the other servers and confirm that they are not part of an existing cluster before bootstrapping. This should prevent split brains due to automatic bootstrapping, and makes it safe to always leave bootstrap-expect configured.
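
In other words, with that check in place it should be safe to ship a uniform server config along these lines on every server (the hostnames here are placeholders), whether the cluster is being created or already exists:

```json
{
  "server": true,
  "bootstrap_expect": 3,
  "retry_join": ["consul-0.example.com", "consul-1.example.com", "consul-2.example.com"]
}
```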

@slackpad closed this as completed May 5, 2017
@sheshjee

sheshjee commented Feb 1, 2024

How do I add a node to an existing 3-node patroni-consul cluster?
