[Feature] Validate and analyse restoration handling in etcd-backup-restore according to the multi-node ETCD proposal #323
Comments
Hi @amshuman-kr, @breuerfelix and I would like to tackle this issue. If we understand the proposal correctly, this is the last part of the series of cases explained in the decision table.
Thanks @brumhard and @breuerfelix for showing interest and going through the proposal, and also for offering to help with implementing parts of it. Unfortunately, there are some sequencing issues due to dependencies between the sub-tasks of the multi-node/HA story. For example, this task depends on #321 and #322, which is why it hasn't been picked up yet. I think @ishan16696 is working on #321, but #322 is probably not taken up yet because a couple of other topics (such as backup compaction) were picked up before the HA topic. #322 overlaps a bit with #321, but not too much, though I expect there will be some merge conflicts if they are done in parallel. @timuthy, @stoyanr, @abdasgupta, do you have any thoughts about which issues @brumhard and @breuerfelix could contribute to?
I'd say that contributions for gardener/etcd-druid#221 would be very much appreciated. It's a topic that doesn't require much coordination because it has few or no dependencies, and so far no one has picked up work in this area.
@amshuman-kr @timuthy OK, that sounds a bit tough. I just wanted to stress that the HA proposal is pretty high priority for us, and if we can help with anything we will put time and effort into it. Tbh gardener/etcd-druid#22 doesn't seem to be the most crucial thing to do for the HA proposal, or is it (apart from the Our main goal is to get this feature up and running asap.
This is definitely true for us as well 👍 Due to dependencies on backup compaction and CP migration (GEP-17), plenty of use cases were considered so that we can arrive at a well-functioning multi-node etcd feature.
For the reasons mentioned above, #221 was suggested. For us, it's an important topic, as it will help us get crucial insights when we eventually roll out/transition to multi-node. As of today, we plan to ship proper observability together with the multi-node features and don't consider it a nice-to-have for the future.
+1 here 🙂
We have already validated recovery from transient quorum loss (see gardener/etcd-druid#436). For a non-quorate cluster, we will need human intervention: the human operator will decide how to recover it, and we will provide a playbook for their guidance. Please follow gardener/etcd-druid#437 for more details. As the scope of this issue is complete, I am closing it.
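For readers unfamiliar with the term, the distinction between a quorate and a non-quorate cluster mentioned above can be sketched roughly as follows. This is only an illustration of the standard majority rule (more than half of the members must be healthy); the helper names `quorumSize` and `hasQuorum` are hypothetical and are not taken from etcd-backup-restore or etcd-druid.

```go
package main

import "fmt"

// quorumSize returns the minimum number of healthy members needed to form
// a majority in a cluster of the given size: floor(n/2) + 1.
// Hypothetical helper for illustration only.
func quorumSize(clusterSize int) int {
	return clusterSize/2 + 1
}

// hasQuorum reports whether the healthy members still form a majority.
// A cluster for which this returns false is "non-quorate".
func hasQuorum(clusterSize, healthy int) bool {
	return clusterSize > 0 && healthy >= quorumSize(clusterSize)
}

func main() {
	// A 3-member cluster keeps quorum with one member down, but not with two.
	fmt.Println("quorum size for 3 members:", quorumSize(3))
	fmt.Println("3 members, 2 healthy, quorate:", hasQuorum(3, 2))
	fmt.Println("3 members, 1 healthy, quorate:", hasQuorum(3, 1))
}
```

In the second case the cluster cannot agree on writes by itself, which is the situation the comment above leaves to a human operator.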
Feature (What you would like to be added):
Restoration handling for a single node is already implemented and released. However, we still need to evaluate and analyse restoration handling in etcd-backup-restore against the enhancements to the initialisation sequence in the multi-node ETCD proposal.
The enhancements should cover the following cases.
Motivation (Why is this needed?):
Pick individually executable pieces of the multi-node proposal.
Approach/Hint to implement the solution (optional):