Scaling from a single node to three nodes and keeping the data from the one node #1311
Comments
You shouldn't need to do any of that. Your process can be: … You are now done. Check out the docs about adding/removing servers for more information.
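For context, a minimal sketch of the add-servers flow from the Consul docs referenced above (the address and data-dir below are placeholders, not values from this thread):

```sh
# Start each additional server and point it at the existing one;
# it joins the cluster and replicates the existing data.
consul agent -server \
  -data-dir=/var/consul/data \
  -retry-join=10.0.0.1   # placeholder: address of the already-running server

# From any node, confirm all servers have joined.
consul members
```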
When you stop the first node, are you sending SIGTERM or SIGINT (Ctrl-C) to it? If it's leaving due to SIGINT, it would be the last node leaving when you stop it the first time, which would of course delete all the data in the data dir since it's leaving the cluster.
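For reference, the signal behavior being discussed is controlled by two agent options. Here is a minimal sketch of a server config that keeps SIGTERM as a shutdown-without-leave and stops SIGINT from triggering a leave; the option names are real Consul settings, while the file path and values are placeholders chosen for this scenario:

```sh
# Write an agent config where neither SIGTERM nor SIGINT performs a
# graceful leave, so the data dir survives a stop/restart.
cat > /etc/consul.d/server.json <<'EOF'
{
  "server": true,
  "data_dir": "/var/consul/data",
  "leave_on_terminate": false,
  "skip_leave_on_interrupt": true
}
EOF
```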
I think that helps. Our deployments are automated, and each node knows what other nodes there are supposed to be, but not how many there were previously (or if a cluster even existed). I think we should be able to have our startup script: … If there's any way to make that easier on ourselves, let us know. We're using a SIGTERM to stop the nodes.
Okay, so a SIGTERM should leave data in the directory since it's a shutdown-without-leave. It'd be worth checking the …

As far as your startup script goes: it sounds like you're regularly ending up with 0 nodes? The thing is, bootstrap is only ever run if join fails, so if you set join addresses and …
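As one way to picture the join/bootstrap interplay described here, a hedged sketch of a uniform server startup command (addresses and paths are placeholders, not values from this deployment):

```sh
# Sketch: the same command on all three servers. -retry-join keeps trying
# the listed peers, and -bootstrap-expect=3 defers bootstrapping a new
# cluster until three servers are in contact.
consul agent -server \
  -data-dir=/var/consul/data \
  -bootstrap-expect=3 \
  -retry-join=10.0.0.1 -retry-join=10.0.0.2 -retry-join=10.0.0.3
```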
Hi @njbennett, are you still having trouble?
Hey @slackpad, thanks for following up. Yes, we're still having this issue. We guarantee data persistence when scaling up from 3 to 5, whereas we only guarantee the cluster is functional after scaling up from 1 to 3. @ryanmoran and @zankich spoke with @mitchellh and some HashiCorp folks when they visited our office; it sounded like this is something we have to live with for now.
Hi @Amit-PivotalLabs - I was one of those folks :-) In that case I thought it was an issue with a TLS upgrade where you were going from 3 to 1 and then back up, and in particular it was possibly shutting down multiple nodes at the same time. Going from 1 to 3 should not be causing any data loss, so I'd like to track that down if that's still a problem.
Hey @slackpad, those were two separate issues. We've had users running a 3-node cluster whom we've migrated to a 3-node TLS cluster via a scale-down => turn on TLS => scale-up. The scale-down can go south and requires manual intervention (usually blowing away data and starting from scratch, once we're wedged in an "outage" state).

Separately, we have continuous integration for our consul service that tests that it can be scaled up, down, and rolled in various ways. One particular test deploys a 1-node cluster from scratch, then scales up to 3 by adding one node at a time. Given the way we orchestrate this, we can't guarantee data from the 1 node persists on scale-up.
I'd be interested in hearing more about that case - that shouldn't happen.
Closing this out against #2319, which added an automatic check for servers when they've got bootstrap-expect configured. They will reach out to each of the other servers and confirm that they are not part of an existing cluster before bootstrapping. This should prevent split brains due to automatic bootstrapping, and makes it safe to always leave bootstrap-expect configured.
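After such an automatic bootstrap, one way to confirm that a single cluster formed (rather than two independently bootstrapped ones) is to check membership and the raft peer set; this sketch assumes a Consul version that ships the operator subcommand:

```sh
# All servers should appear as members, and the raft peer list should
# show one leader plus the full set of voters.
consul members
consul operator raft list-peers
```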
How do I add a node to an existing 3-node patroni-consul cluster?
Should this be possible? When we scale from a single server node to three server nodes, we see the data disappear.

Our process for doing this is as follows: we set bootstrap_expect to 3 and change retry_join to the list of server addresses. Once they have all started and synced, we try to read data and are unable to retrieve any data persisted before we scaled the cluster up.
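One way to make the reported symptom concrete is to check a marker key across the scale-up via the KV HTTP API; this is a hedged sketch, and the key name and addresses are placeholders rather than anything from the original report:

```sh
# Before scaling: write a marker key on the single node.
curl -s -X PUT -d 'before-scale-up' \
  http://127.0.0.1:8500/v1/kv/scale-test/marker

# After all three servers have started and synced: the same key should
# still be readable from any node if the data survived the scale-up.
curl -s 'http://127.0.0.1:8500/v1/kv/scale-test/marker?raw'
```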