Downgrade support from 3.5 to 3.4 #15878
Comments
I think it could be easily added. This would also help with kubernetes/kubernetes#117906 and with cleanup of the kubernetes migrate script for etcd. |
Please assign this to me; we already have a minimal internal patch to address this. In its current form it's a 3.4 patch that allows 3.4 to be deployed within a 3.5 cluster to avoid downtime and perform a rolling downgrade. |
Just a note: support for rolling update is out of scope for now. Let's start with the migrate script. |
Quick update - trying to get the POC to work. At the moment, running into an error because of v2store. |
For reference:
Related design docs
etcd 3.5 one node cluster log
Going to debug this more. |
Did we ever de-couple the etcd version from the data storage version? I vaguely recall multiple people pointing out that it is sort of silly that you can't automatically downgrade from 3.5 to 3.4 given that the file formats of the persisted data are identical, and that if we just gave data files a format version and only incremented it when we actually change how data is written to file, downgrades could be simpler. |
Version logic is a bit different between 3.4, 3.5 and 3.6. 3.5 added clusterVersionFromBackend, but I think the v2store path is still used by default. |
Another option is to snapshot using etcdctl 3.5, then stop the cluster and restore using etcdctl 3.4. Here are steps I've used to test this:
1. snapshot:
2. stop all nodes, remove the infra dirs, and restore:
3. then start the cluster using the 3.4 binary:
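A minimal sketch of these three steps, assuming a local 3-member cluster (infra1..infra3); the etcdctl-3.4 / etcd-3.4 binary names, endpoints, and data dirs are placeholders rather than the exact commands used here:

```bash
# 1. snapshot with the 3.5 etcdctl while the cluster is still serving
etcdctl --endpoints=http://127.0.0.1:2379 snapshot save snapshot.db

# 2. stop all members, remove the old data dirs, restore with the 3.4 etcdctl
rm -rf infra1.etcd infra2.etcd infra3.etcd
etcdctl-3.4 snapshot restore snapshot.db \
  --name infra1 --data-dir infra1.etcd \
  --initial-cluster infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380 \
  --initial-advertise-peer-urls http://127.0.0.1:12380
# ...repeat the restore for infra2 and infra3 with their own name/dir/peer URL

# 3. start every member from its restored data dir using the 3.4 binary
etcd-3.4 --name infra1 --data-dir infra1.etcd \
  --listen-peer-urls http://127.0.0.1:12380 \
  --initial-advertise-peer-urls http://127.0.0.1:12380 \
  --listen-client-urls http://127.0.0.1:2379 \
  --advertise-client-urls http://127.0.0.1:2379 \
  --initial-cluster infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380
```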
|
@serathius saw your comment on the PR. Duplicating my question here. I've also tried |
This is exactly what we need to support downgrades: remove the confState and term fields. To make it clear, removing the confState and term fields is crucial for downgrades and etcd correctness. You are right that etcd v3.4 will just start from v3.5 data. However, have you thought about what will happen with those fields later? One thing we can add in v3.4 is a safeguard for those fields: have etcd v3.4.27 reject the db file if it finds fields from v3.5. It should make it clear to the user that just loading data from v3.5 in v3.4 is unsupported and will break their cluster, maybe not immediately, but later. |
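For illustration, one way to check whether a db file already carries those 3.5-only fields, assuming the 3.5 layout (a meta bucket holding confState and term keys) and the bbolt CLI; run it against a copy of the db or with the member stopped:

```bash
# Placeholder path; adjust to the member's data dir.
DB=/var/lib/etcd/member/snap/db
bbolt keys "$DB" meta              # list keys in the meta bucket
bbolt get  "$DB" meta confState    # present only once a 3.5 binary has written the db
bbolt get  "$DB" meta term
```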
That actually was my main problem: without restoring from a snapshot, v3.4 will fail to start if you just point it at a 3.5 data dir. I've added fields to |
@serathius, updated PR #15994, I think it's ready for review. But I'd like to clarify a couple of things.
v3.4 PR #15990 does this. See Assuming we are going with |
Don't understand the statement. What is the code smell you see?
We should make it work though. Can you provide logs so I can understand the problem you are facing? |
I am not sure whether we should support downgrading 3.5 to 3.4.
Public Cloud
Private Cloud
Non-K8s use cases?
Any feedback please?
Online and offline migration
If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration. The offline approach is to backport & enhance the |
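For the offline direction, a sketch of what a per-member flow could look like if a 3.6-style etcdutl migrate were backported and enhanced for 3.5 -> 3.4; the flags, paths, and binary names below are illustrative, not a committed interface:

```bash
# Offline, per-member downgrade sketch.
systemctl stop etcd
etcdutl migrate --data-dir /var/lib/etcd --target-version 3.4   # hypothetical backport
install ./etcd-3.4 /usr/local/bin/etcd                          # swap in the 3.4 binary
systemctl start etcd                                            # member restarts on the migrated data dir
```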
Here is an example of the error when starting 3.4 with a 3.5 data-dir:
To remove this error, we need to remove mustDetectDowngrade.
That's because we are missing |
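A rough reproduction of that failure mode, with placeholder paths and binary names:

```bash
# Point an unpatched 3.4 binary at a data dir last written by 3.5.
/usr/local/bin/etcd-3.4 --name infra1 --data-dir /var/lib/etcd
# The member exits fatally in mustDetectDowngrade: the cluster version it
# determines from the store (3.5) is higher than its own server version (3.4).
```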
We have a 3.4 build with the patch #15990 in case there is a need to do a rollback during an incident, but we never had to do it. In terms of the downgrade workflow, I've tested it using a 3-node cluster and there are a couple of issues:
|
Yes, GKE is on v3.4. That's why Han is asking for downgrade support so they can feel safe to upgrade.
Don't agree. Online downgrade is totally broken in v3.4 and v3.5. The whole design was broken, and fixing it would be too disruptive to backport. Making sure that the v3.6 -> v3.5 downgrade works will already require a lot of qualification; we should not put more resources here. What I'm proposing is to just add offline support, so we avoid totally abandoning users and give them a subpar but working and tested path to rollback. We don't need the experience to be great. It just needs to work in case of disaster recovery, to ensure the most reluctant users of v3.4 feel safe to upgrade to v3.5. We don't need anything more than for |
@lavacat Please follow the thread in #11716 (comment) on how broken the |
I'm onboard with this. @ahrtr
after member starts
|
Not with any currently supported version. Just to give you some more data points here: to stay supported, customers had to upgrade. So many thousands of clusters have successfully upgraded from 3.4 to 3.5 already, plus all our e2e test pipelines that were testing this for many tens of thousands of runs previously. I'm not aware of a single issue a customer had. The recommended downgrade procedure IIRC has been to restore the entire control plane with a snapshot from before the upgrade was kicked off - but I don't think this was ever necessary. |
Yes. All the etcd clusters for supported k8s versions have upgraded to use 3.5. From my understanding, to solve the "upgrade failure triggers downgrade" issue from the k8s perspective.
|
Hi @ahrtr, sorry for the late reply.
Yes. And we are also using other versions depending on the cluster. For this issue, it seems reasonable to me if we can have a rollback solution with no downtime. |
Thanks all for the feedback. It seems that 3.4 is only used by a minority of users. A simple summary...
Backporting online downgrading from 3.5 to 3.4 would also require huge effort, and it might introduce additional risk of regression in 3.5. We should try to avoid adding any new feature to 3.5. In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at the minimum, it's acceptable to enhance the |
I disagree. GKE does not use and has not used 3.5, and they are a major cloud provider. Google's position is that the number of regressions in 3.5 has made upgrading to 3.5 unviable without a safe downgrade path. Therefore, my position is that it should indeed be prioritized. |
I'm on the side that this is just too much work and too risky. See the amount of work - all the tasks listed in #13168. Online is just much more complicated than offline support, as offline can be done by any external binary. Compare the amount of work: for offline downgrading etcd from v3.5 to v3.4, you can just pick the |
My view is that, thanks to the uptake of etcd 3.5.6+ in platforms like EKS, OCP and TKG and elsewhere, we can draw some confidence from the hundreds of thousands of clusters that have been running successfully for long periods of time now with these versions without issues. So my preference, fwiw, is to avoid any pathway involving extensive backports to 3.4 and to focus on a solid offline downgrade procedure. |
Talked with @logicalhan; I understand his argument that offline downgrade is not viable on a large fleet of etcds. It would be a disaster-recovery-level event. The fact is that downgrades were implemented broken in v3.5, and it took a big redesign to fix them for v3.6. This, however, means that we have left a broken API in v3.5. Online downgrades in v3.6 were implemented as a bare-bones feature; there are still a lot of places the downgrade mechanism needs to be plugged into. Having a v3.5 -> v3.4 online downgrade could help us finish that work. I would be supportive of fixing online v3.5 -> v3.4 downgrades as:
|
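For context on what "online downgrade" means here: the redesigned flow is driven through the downgrade API, sketched below with the etcdctl downgrade commands as they exist for the 3.6 -> 3.5 path; the open question in this thread is whether an analogous, working path can exist for 3.5 -> 3.4.

```bash
# Online downgrade flow as redesigned for 3.6 -> 3.5 (for comparison only).
etcdctl downgrade validate 3.5   # check the cluster can be downgraded
etcdctl downgrade enable 3.5     # cluster starts migrating data to be 3.5 compatible
# ...then replace binaries member by member with 3.5; if you must abort midway:
etcdctl downgrade cancel
```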
I was thinking etcd 3.4 was only used by a minority of K8s clusters for each cloud vendor, including private and public vendors, based on the feedback and my investigation. But that isn't the case for GKE; based on the feedback from @logicalhan a couple of days back, the fact is
I am curious how? |
We're hiring a person who will work on etcd (at least partially). |
The current version of the PR works fine, with the limitation that one has to use a snapshot to downgrade or remove the WAL files. See #15994 (comment). The problem is that the version is recorded in the WAL and has to be removed from the WAL. We don't have a mechanism to do that. Adding this mechanism is possible, but it increases the complexity of this change. @serathius @ahrtr For GKE, @logicalhan @serathius, I'm going to call out #15990 again. You can have a 3.4 internal build that you can roll back to as long as the WAL doesn't contain |
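One way to see the WAL problem, assuming the etcd-dump-logs tool from the etcd repo (the exact invocation can differ between versions): dump the WAL entries and look for the entry that sets the cluster version to 3.5.

```bash
# From a checkout of the etcd repo; the data dir path is a placeholder.
go run ./tools/etcd-dump-logs /var/lib/etcd | grep -i version
# If a cluster-version entry for 3.5 is still in the WAL, a rolled-back 3.4
# binary will replay it and trip over the 3.5 cluster version again.
```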
This seems to be the cheapest direction. Downgrading 3.5 to 3.4 is a special case; we don't have to backport the complete downgrading feature to 3.5. It's risky to do that, and it would also complicate the 3.5 code base. Proposed change for 3.4 (on top of @lavacat's #15990):
EDIT: We don't need to worry about
So we only need to take care of
More references:
Impact on users (e.g. GKE)
If they want to benefit from this solution, they can't upgrade from the old 3.4 to 3.5 directly. Instead, they must upgrade their clusters to a new 3.4.x version (which includes the change proposed above) in the first step, then upgrade to 3.5.x in the second step.
Do we still need #15994?
No, as long as the clusters were previously on a 3.4.x version with the change proposed above. |
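Read concretely, the two-step path for a fleet would look something like the sketch below; 3.4.y stands for a hypothetical 3.4 patch release containing the proposed change, and the member names and commands are placeholders.

```bash
# Step 1: roll every member from the old 3.4.x to 3.4.y (understands the 3.5 fields).
for m in infra1 infra2 infra3; do
  ssh "$m" 'systemctl stop etcd && install ./etcd-3.4.y /usr/local/bin/etcd && systemctl start etcd'
done
# Step 2: only once the whole cluster runs 3.4.y, repeat the same roll to 3.5.x.
```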
@ahrtr, in principle I agree with your approach. Making changes to 3.4 to support online downgrade seems more practical. I don't mind throwing away #15994, but it might be cleaner to perform the backend
Then in 3.4 we can have a flag
Let's discuss during the next community meeting, so everyone is in agreement on next steps. If more information or a POC is needed, let me know; I'll try to compose everything before the meeting. |
As discussed in the previous community meeting, the offline downgrade tool isn't the point. The point is whether or not, and how, to support online downgrade from 3.5 to 3.4. Usually it's common to make the new version (e.g. 3.6) backward compatible with the old version (e.g. 3.5), and that's exactly the principle the existing downgrade feature follows. For example, when downgrading from 3.6 to 3.5, the etcd 3.6 instance should migrate the data to be 3.5 compatible. But online downgrade is a big & complicated feature; it isn't feasible & safe to backport the complete feature from 3.6 to 3.5. Instead, we can treat the online downgrade from 3.5 to 3.4 as a special case. I think we can spend just minor or moderate effort to make the old version (3.4) forward compatible with the new version (3.5). Specifically, we just need to ensure the 3.4 binary can run on data generated by the 3.5 binary, roughly as I mentioned above in #15878 (comment). |
I have written a design doc regarding the path forward. Please take a look and provide feedback, thanks! |
Tracking work
|
What would you like to be added?
I would like to be able to safely downgrade from 3.5 to 3.4, and then safely reupgrade back to 3.5.
Why is this needed?
Given the vast number of data correctness issues we've unearthed in etcd 3.5 (many of them fixed by @ahrtr and @serathius), I have personal reservations about upgrading my k8s clusters to use 3.5. If there were a working rollback strategy (tested, of course), then I would be much more inclined to update my etcds to a more recent version.