Downgrade support from 3.5 to 3.4 #15878

Open
logicalhan opened this issue May 11, 2023 · 36 comments · May be fixed by #15994

@logicalhan

What would you like to be added?

I would like to be able to safely downgrade from 3.5 to 3.4, and then safely reupgrade back to 3.5.

Why is this needed?

Given the vast number of data correctness issues we've unearthed in etcd 3.5 (many of them fixed by @ahrtr and @serathius), I have personal reservations about upgrading my k8s clusters to use 3.5. If there were a working (and, of course, tested) rollback strategy, I would be much more inclined to update my etcds to a more recent version.

@serathius
Member

I think it could easily be added to the etcdutl migrate command, allowing for safe offline downgrade and upgrade operations.
Code: https://github.com/etcd-io/etcd/blob/main/etcdutl/etcdutl/migrate_command.go

This would also help with kubernetes/kubernetes#117906 and the cleanup of the kubernetes migrate script for etcd.

@lavacat

lavacat commented May 11, 2023

Please assign this to me; we already have a minimal internal patch to address this. In its current form it's a 3.4 patch that allows 3.4 to be deployed within a 3.5 cluster to avoid downtime and perform a rolling downgrade.
It's done by hacking the version and removing the confState and term keys.
But it would be great to make it part of migrate and add more testing around it.
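
Roughly, the idea looks like this — a minimal sketch only, not the actual patch. The bucket/key names ("meta"/"confState"/"term" and "cluster"/"clusterVersion") are assumed from this discussion and would need to be verified against the backend schema of the release you run:

// Offline sketch: strip the 3.5-only keys from a stopped member's backend
// (<data-dir>/member/snap/db) and rewrite the recorded cluster version.
// Bucket/key names are assumptions; verify against the actual storage schema.
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("infra1.etcd/member/snap/db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.Update(func(tx *bolt.Tx) error {
		// Drop the fields etcd 3.4 does not understand.
		if meta := tx.Bucket([]byte("meta")); meta != nil {
			if err := meta.Delete([]byte("confState")); err != nil {
				return err
			}
			if err := meta.Delete([]byte("term")); err != nil {
				return err
			}
		}
		// Rewrite the cluster version recorded in the backend.
		if cluster := tx.Bucket([]byte("cluster")); cluster != nil {
			return cluster.Put([]byte("clusterVersion"), []byte("3.4.0"))
		}
		return fmt.Errorf("cluster bucket not found")
	})
	if err != nil {
		log.Fatal(err)
	}
}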

@lavacat lavacat self-assigned this May 11, 2023
@serathius
Member

serathius commented May 11, 2023

Just a note: support for rolling update is out of scope for now. Let's start with the migrate script.

@lavacat

lavacat commented May 27, 2023

Quick update - trying to get the POC to work. The idea is to run etcdutl migrate --data-dir data-3.5 --target-version 3.4 and get a data dir that etcd 3.4 can be started with.
My understanding is that currently migrate only updates the MetaStorageVersionName key that was added in 3.6, but it won't update ClusterClusterVersionKeyName or the version in v2store.

At the moment, running into

etcdserver/membership: cluster cannot be downgraded (current version: 3.4.26 is lower than determined cluster version: 3.5).

because of the v2store version.

@lavacat

lavacat commented May 30, 2023

For reference, I tried running etcdctl downgrade from an etcd 3.6 build targeting a 3.5 cluster, but it didn't work.

Related design docs:
  • etcd Downgrades Design
  • etcd storage versioning

$ ./bin/etcdctl downgrade validate 3.4
Downgrade validate success, cluster version 3.5.0

$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-05-30T01:12:41.770844-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fc000/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded

etcd 3.5 one node cluster log

{"level":"info","ts":"2023-05-30T01:12:36.794018-0700","caller":"membership/cluster.go:890","msg":"The server is ready to downgrade","target-version":"3.4.0","server-version":"3.5.9"}
{"level":"warn","ts":"2023-05-30T01:12:36.88595-0700","caller":"etcdserver/cluster_util.go:459","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.5.0","target-version":"3.4.0"}
{"level":"warn","ts":"2023-05-30T01:12:41.77082-0700","caller":"etcdserver/v3_server.go:1047","msg":"reject downgrade request","error":"etcdserver: request timed out"}
{"level":"warn","ts":"2023-05-30T01:12:41.770895-0700","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2023-05-30T01:12:36.773319-0700","time spent":"4.997554901s","remote":"127.0.0.1:62022","response type":"/etcdserverpb.Maintenance/Downgrade","request count":-1,"request size":-1,"response count":-1,"response size":-1,"request content":""}
{"level":"warn","ts":"2023-05-30T01:12:41.886266-0700","caller":"etcdserver/cluster_util.go:459","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.5.0","target-version":"3.4.0"}

Going to debug this more.

@jpbetz
Contributor

jpbetz commented Jun 1, 2023

Did we ever de-couple the etcd version from the data storage version? I vaguely recall multiple people pointing out that it is sort of silly that you can't automatically downgrade from 3.5 to 3.4 given that the file formats of the persisted data are identical, and that if we just gave data files a format version and only incremented it when we actually changed how data is written to the file, downgrade could be simpler.

@lavacat

lavacat commented Jun 1, 2023

The version logic is a bit different between 3.4, 3.5 and 3.6.
In 3.4, the version is first decided in decideClusterVersion based on version.Version and then saved to v2store. In Recover we rely only on the version recorded in v2store; see clusterVersionFromStore. The version is also saved to the backend under cluster/clusterVersion, but it's never read.

3.5 added clusterVersionFromBackend, but I think the v2store path is still used by default. 3.5 also added downgradeInfoFromBackend. I don't fully understand downgrade, but I think the workflow is described here.

3.6 uses ClusterVersionFromBackend by default. It also added the meta/storageVersion key that's used in migrate.
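
For illustration, reading that backend copy of the cluster version (the cluster/clusterVersion key mentioned above) with bbolt looks roughly like this — a sketch only; the db path and the bucket/key names are assumptions based on the default data layout:

// Read-only peek at the cluster version recorded in the bolt backend.
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("infra1.etcd/member/snap/db", 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("cluster"))
		if b == nil {
			return fmt.Errorf("cluster bucket not found")
		}
		fmt.Printf("backend clusterVersion: %s\n", b.Get([]byte("clusterVersion")))
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}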

@lavacat

lavacat commented Jun 1, 2023

What do you think about adding a special flag to 3.4 to control version checks? See #15990. This will also allow a rolling downgrade.

Another option is to take a snapshot using etcdctl 3.5, then stop the cluster and restore using etcdctl 3.4. Here are the steps I've used to test this:
3.5 cluster

bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

snapshot

./bin/etcdctl snapshot save snap-3.5

stop all nodes, remove infra dirs and restore:

./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra1 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra2 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra3 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'

then start cluster using 3.4 binary:

bin-3.4/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin-3.4/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin-3.4/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

@lavacat

lavacat commented Jun 1, 2023

@serathius saw your comment on the PR. Duplicating my question here: migrate will only help with removing confState and term, correct? v2store will still have the 3.5 version. What is the process to complete the downgrade? The only way I've found was using a snapshot, and it requires stopping all nodes.

I've also tried the downgrade enable workflow, which still requires using a snapshot, but I was hoping there would be no need to stop the cluster. It didn't work for me.

@serathius
Member

serathius commented Jun 1, 2023

@serathius saw your comment on PR. Duplicating my #15990 (comment) here. migrate will only help with removing confState and term, correct? v2store will still have 3.5 version. What is the process to complete the downgrade? The only way I've found was using #15878 (comment) and it requires stopping all nodes.

This is exactly what we need to support downgrades: remove the confState and term fields. This is also exactly what downgrade enable does in v3.6, but there it also coordinates the change between members in a live cluster. We don't want to backport the coordination logic.

To make it clear, removing the confState and term fields is crucial for downgrades and etcd correctness. You are right that etcd v3.4 will just start from v3.5 data. However, have you thought about what will happen with the confState and term fields? Etcd v3.4 is unaware of those fields, so they will remain unchanged and ignored; then you decide to upgrade back to v3.5 and it goes BOOOM. Etcd v3.5 starts, finds those fields, assumes they come from a previous v3.5 run, and tries to use the outdated confState and term. See #13514.

One thing we can add in v3.4 is a safeguard for those fields: have etcd v3.4.27 reject the db file if it finds fields from v3.5. It should make it clear to the user that just loading v3.5 data in v3.4 is unsupported and will break their cluster, maybe not immediately, but later.
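
Rough shape of such a safeguard, just to illustrate the idea — not actual etcd code; the real check would live in the v3.4 bootstrap path, and the "meta"/"confState"/"term" names are assumptions based on the v3.5 schema:

// Sketch of a v3.4-side safeguard: refuse to start on a backend that still
// carries v3.5-only fields, pointing the user at an offline migration instead.
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func detectV35Fields(db *bolt.DB) error {
	return db.View(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return nil
		}
		for _, key := range []string{"confState", "term"} {
			if meta.Get([]byte(key)) != nil {
				return fmt.Errorf("backend contains v3.5 field %q; run an offline migration before starting v3.4", key)
			}
		}
		return nil
	})
}

func main() {
	db, err := bolt.Open("infra1.etcd/member/snap/db", 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := detectV35Fields(db); err != nil {
		log.Fatal(err)
	}
	fmt.Println("no v3.5-only fields found")
}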

@lavacat lavacat linked a pull request Jun 1, 2023 that will close this issue
@lavacat

lavacat commented Jun 1, 2023

You are right that etcd v3.4 will just start from v3.5 data.

That was actually my main problem: without restoring from a snapshot, v3.4 will fail to start if you just point it to a 3.5 data dir.

I've added fields to migrate in this PR #15994

@lavacat

lavacat commented Jun 2, 2023

@serathius, I updated PR #15994; I think it's ready for review. But I'd like to clarify a couple of things.

To make it clear, removing the confState and term fields is crucial for downgrades and etcd correctness.

The v3.4 PR #15990 does this; see downgradeMetaBucket.
Maybe I'm overthinking this, but operationally, having a 3.4 version that an SRE team can downgrade to without any other manipulations would be most desirable.
The problem is that this PR adds "code smell".

Assuming we are going with migrate, I'd like to document the steps for downgrade. Just pointing 3.4 at a 3.5 data-dir didn't work. I was able to perform the downgrade using a snapshot, and I had to stop the cluster. Am I missing something here? I can retest the procedure.

@serathius
Member

cc @ahrtr @ptabor to get feedback about adding downgrade support.

@serathius
Member

Maybe I'm overthinking this, but operationally, having a 3.4 version that an SRE team can downgrade to without any other manipulations would be most desirable.
The problem is that this PR adds "code smell".

I don't understand the statement. What is the code smell you see?

Assuming we are going with migrate, I'd like to document the steps for downgrade. Just pointing 3.4 at a 3.5 data-dir didn't work. I was able to perform the downgrade using #15878 (comment) and I had to stop the cluster. Am I missing something here? I can retest the procedure.

We should make it work, though. Can you provide logs so I can understand the problem you are facing?

@ahrtr
Member

ahrtr commented Jun 5, 2023

I am not sure whether we should support downgrading 3.5 to 3.4.

Public Cloud

  • EKS seems to have already upgraded to 3.5. cc @chaochn47 to double confirm
  • Is AKS still using 3.4? cc @fuweid to double confirm
  • Is GKE still using 3.4? It seemed yes a couple of months back. cc @serathius to double confirm.

Private Cloud

  • Is OpenShift still using 3.4? cc @tjungblu to double confirm
  • TKG isn't using 3.4 anymore. All current TKG versions are using 3.5.

Non-K8s use cases?

Any feedback please?

Online and offline migration

If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration. The offline approach is to backport & enhance the etcdutl migrate command in 3.5, as @serathius mentioned in #15878 (comment). But it seems that the etcdutl migrate implementation in the main branch doesn't update ClusterClusterVersionKeyName and ClusterDowngradeKeyName when migrating from 3.6 to 3.5?

The high-level workflow of online downgrading is shown in the attached diagram (downgrade_process).

@lavacat

lavacat commented Jun 5, 2023

@serathius

I don't understand the statement. What is the code smell you see?

Adding the 3.5.0 capability and downgradeMetaBucket in mvcc seems like a hack. But maybe that's just my personal perception :)

Here is an example of the error when starting 3.4 with a 3.5 data-dir:

$ bin-3.4/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

{"level":"fatal","ts":"2023-06-05T01:01:52.222568-0700","caller":"membership/cluster.go:795","msg":"invalid downgrade; server version is lower than determined cluster version","current-server-version":"3.4.26","determined-cluster-version":"3.5","stacktrace":"go.etcd.io/etcd/etcdserver/api/membership.mustDetectDowngrade\n\t/Users/bk/github/etcd-release-3-5/etcdserver/api/membership/cluster.go:795\ngo.etcd.io/etcd/etcdserver/api/membership.(*RaftCluster).SetVersion\n\t/Users/bk/github/etcd-release-3-5/etcdserver/api/membership/cluster.go:570\ngo.etcd.io/etcd/etcdserver.(*applierV2store).Put\n\t/Users/bk/github/etcd-release-3-5/etcdserver/apply_v2.go:97\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyV2Request\n\t/Users/bk/github/etcd-release-3-5/etcdserver/apply_v2.go:128\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntryNormal\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:2237\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).apply\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:2178\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntries\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1412\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyAll\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1136\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).run.func8\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1072\ngo.etcd.io/etcd/pkg/schedule.(*fifo).run\n\t/Users/bk/github/etcd-release-3-5/pkg/schedule/schedule.go:157"}

To remove this error, we need to remove mustDetectDowngrade.
etcd v3.4 will then start, but requests will fail with:

$ ./bin/etcdctl put foo bar --endpoints=http://127.0.0.1:2379
{"level":"warn","ts":"2023-06-05T01:07:07.103655-0700","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001ca000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: not capable"}
Error: etcdserver: not capable

That's because we are missing the 3.5.0 capability.

@lavacat

lavacat commented Jun 5, 2023

@ahrtr

I am not sure whether we should support downgrading 3.5 to 3.4.

We have a 3.4 build with the patch #15990 in case there is a need to roll back during an incident, but we have never had to do it.
I think this is useful operationally and makes SREs happy, but if 3.4 is declared EOL, everyone will upgrade without the patch.

In terms of the downgrade workflow, I've tested using a 3-node cluster and there are a couple of issues:

  1. The first call to downgrade enable fails, but the downgrade job is actually started. I'm using etcdctl downgrade built from main.
$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-06-05T01:20:28.807973-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000196780/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-06-05T01:20:31.260858-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fc000/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: cluster has a downgrade job in progress"}
Error: etcdserver: cluster has a downgrade job in progress
  2. After replacing 1st member binary, 2 other members fail with
{"level":"info","ts":"2023-06-05T01:21:14.489291-0700","caller":"membership/cluster.go:576","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"fd422379fda50e48","from":"3.5","to":"3.4"}
{"level":"fatal","ts":"2023-06-05T01:21:14.489323-0700","caller":"membership/downgrade.go:59","msg":"invalid downgrade; server version is not allowed to join when downgrade is enabled","current-server-version":"3.5.9","target-cluster-version":"3.4.0","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/api/membership.mustDetectDowngrade\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/downgrade.go:59\ngo.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).SetVersion\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/cluster.go:593\ngo.etcd.io/etcd/server/v3/etcdserver.(*applierV2store).Put\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply_v2.go:101\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyV2Request\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply_v2.go:135\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2228\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2151\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1384\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1199\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1122\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.5.9/schedule/schedule.go:157"}
  3. After starting 2 failed members with 3.4 binary, I still get
./bin/etcdctl put foo bar --endpoints=http://127.0.0.1:2379
{"level":"warn","ts":"2023-06-05T01:28:09.783384-0700","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000240000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: not capable"}
Error: etcdserver: not capable

@serathius
Member

serathius commented Jun 5, 2023

Is GKE still using 3.4? It seemed yes a couple of months back. cc @serathius to double confirm.

Yes, GKE is on v3.4. That's why Han is asking for downgrade support so they can feel safe to upgrade.

If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration.

I don't agree. Online downgrade is totally broken in v3.4 and v3.5. The whole design was broken, and fixing it would be too disruptive to backport. Making sure that v3.6 -> v3.5 downgrades work will already require a lot of qualification; we should not put more resources here.

What I'm proposing is to just add offline support, so we avoid totally abandoning users and give them a subpar but working and tested path to rollback. We don't need the experience to be great. It just needs to work in case of disaster recovery, to ensure the most reluctant users of v3.4 feel safe to upgrade to v3.5.

We don't need anything more than for etcdutl migrate to officially support v3.4.

@serathius
Member

@lavacat Please follow the thread in #11716 (comment) on how broken etcdctl downgrade enable is in v3.5.

@lavacat

lavacat commented Jun 5, 2023

What I'm proposing is to just add offline support, so we avoid totally abandoning users and give them a subpar but working and tested path to downgrade. We don't need the experience to be great. It just needs to work in case of disaster recovery, to ensure the most reluctant users of v3.4 feel safe to upgrade to v3.5.

I'm on board with this: migrate with PR #15994 + using a snapshot. No changes to 3.4.

@ahrtr ClusterClusterVersionKeyName in 3.4 is updated in SetVersion based on the decided cluster version; see comment.
During testing, after the snapshot is restored but before the member starts:

$ bbolt get infra1.etcd/member/snap/db cluster clusterVersion
3.5.0

after member starts

{"level":"info","ts":"2023-06-05T02:06:06.317983-0700","caller":"membership/cluster.go:547","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"91bc3c398fb3c146","from":"3.0","from":"3.4"}
{"level":"info","ts":"2023-06-05T02:06:06.318064-0700","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}

$ bbolt get infra1.etcd/member/snap/db cluster clusterVersion
3.4.0

ClusterDowngradeKeyName isn't present in 3.4. I can add it to migrate so it is removed when going 3.5 -> 3.4.

@tjungblu
Contributor

tjungblu commented Jun 5, 2023

Is OpenShift still using 3.4? cc @tjungblu to double confirm

Not with any currently supported version. Just to give you some more data points here: to stay supported, customers had to upgrade. So many thousands of clusters have successfully upgraded from 3.4 to 3.5 already, plus all our e2e test pipelines that were testing this for many tens of thousands of runs previously.

I'm not aware of a single issue a customer had. The recommended downgrade procedure IIRC has been to restore the entire control plane with a snapshot from before the upgrade was kicked off - but I don't think this was ever necessary.

@chaochn47
Member

chaochn47 commented Jun 5, 2023

EKS seems to have already upgraded to 3.5. cc @chaochn47 to double confirm

Yes. All the supported k8s versions' etcd clusters have been upgraded to 3.5.

From my understanding, to solve the "failed k8s upgrade triggers etcd downgrade" issue from the k8s perspective:

  1. decouple etcd upgrade and k8s upgrade, so even if k8s upgrade fails, it won't trigger etcd to downgrade. etcd stays at 3.5.
  2. etcd supports downgrade from v3.5 to v3.4 with no downtime.

@fuweid
Member

fuweid commented Jun 7, 2023

Hi, @ahrtr. Sorry for the late reply.

Is AKS still using 3.4?

Yes. And we are also using other versions depending on the cluster.

For this issue, it seems reasonable to me if we can have a rollback solution with no downtime.

@ahrtr
Member

ahrtr commented Jul 23, 2023

Thanks all for the feedback.

It seems that 3.4 is only used by a minority of users. A simple summary...

  • private cloud providers: neither TKG nor OpenShift is using 3.4 anymore. etcd 3.5.6+ has already been verified on (roughly) thousands of clusters in TKG. It has also been verified in OpenShift on lots of clusters, as @tjungblu mentioned.
  • public cloud providers
    • EKS: In all the supported k8s versions, etcd has been upgraded to 3.5.
    • GKE: Indeed K8s 1.21 (etcd 3.4.x) is still available, but the default version in the Stable channel has already been upgraded to K8s 1.22.12 (should be etcd 3.5.4?) on September 02, 2022. [In theory, K8s 1.22.x is still working on top of etcd 3.4.x]
    • AKS: It's still using etcd 3.4 based on feedback from @fuweid. But based on aks-kubernetes-release-calendar, AKS follows 12 months of support for a generally available (GA) Kubernetes version. So K8s 1.21 (etcd 3.4) should already be out of support.

Backporting online downgrading from 3.5 to 3.4 would also require a huge effort, and it might introduce additional risk of regression in 3.5. We should try to avoid adding any new features to 3.5.

In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at a minimum, it's acceptable to enhance the etcdutl tool to support offline downgrade in case of disaster recovery.

@logicalhan
Author

In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at a minimum, it's acceptable to enhance the etcdutl tool to support offline downgrade in case of disaster recovery.

I disagree. GKE does not and has not used 3.5, and they are a major cloud provider. Google's position is that the number of regressions in 3.5 has made upgrading to 3.5 unviable without a safe downgrade path. Therefore, my position is that it should indeed be prioritized.

@serathius
Member

serathius commented Jul 24, 2023

I'm on the side that this is just too much work and too risky. See the amount of work: all the tasks listed in #13168. Online is just much more complicated than offline support, as offline can be done by an external binary like etcdutl, but online needs to be built into the etcd binary.

Compare the amount of work: for offline downgrading of etcd from v3.5 to v3.4, you can just pick the etcdutl from v3.6 without a problem. It's just one PR, #15994, yet we have been working on it for almost a month now. Compare that to online support, which requires backporting multiple months of work.

@jmhbnz
Member

jmhbnz commented Jul 26, 2023

My view is that, thanks to the uptake of etcd 3.5.6+ in platforms like EKS, OCP, TKG and elsewhere, we can draw some confidence from the hundreds of thousands of clusters that have been running these versions successfully, without issues, for long periods of time now.

So my preference, fwiw, is to avoid any pathway involving extensive backports to 3.4 and focus on a solid offline downgrade procedure.

@serathius
Member

I talked with @logicalhan, and I understand his argument that offline downgrade is not viable on a large fleet of etcds; it would be a disaster-recovery-level operation. The fact is that downgrades were implemented broken in v3.5, and it took a big redesign to fix them for v3.6. This, however, means that we have left a broken API in v3.5. Online downgrades in v3.6 were implemented as a bare-bones feature; there are still a lot of places the downgrade mechanism needs to be plugged into. Having v3.5->v3.4 online downgrade could help us finish the work.

I would be supportive of fixing online v3.5 -> v3.4 downgrades as:

  • Backports will be to v3.5 and not v3.4.
  • It will fix the broken downgrade API in v3.5.
  • It will allow us to properly test the downgrade mechanism before v3.6.
  • It should not take many resources from the etcd community, as it will be fully funded by @logicalhan.

@ahrtr
Member

ahrtr commented Jul 26, 2023

large fleet of etcds

I was thinking etcd 3.4 was only used by a minority of K8s clusters for each cloud vendor, including private and public vendors, based on the feedback and my investigation. But that isn't the case for GKE: based on the feedback from @logicalhan a couple of days back, the fact is ALL existing K8s versions in GKE are using etcd 3.4.x. I was shocked. It's already been 2+ years since the release of 3.5.0, and 1+ years since the community fixed all known data inconsistency issues.

it will be fully funded by @logicalhan.

I am curious how?

@logicalhan
Author

it will be fully funded by @logicalhan.

I am curious how?

We're hiring a person who will work on etcd (at least partially).

@lavacat

lavacat commented Jul 26, 2023

The current version of the PR works fine, with the limitation that one has to use a snapshot to downgrade or remove the wal files. See #15994 (comment)
This means that the downgrade will require cluster downtime and potential data loss of entries in the wal that aren't in the snapshot yet.

The problem is that the version is recorded in the WAL and has to be removed from it. We don't have a mechanism to do that. Adding one is possible, but it increases the complexity of this change.

@serathius @ahrtr
Do you both support adding wal manipulation as part of migrate command?
Is the PR still relevant without online downgrade?

For GKE, @logicalhan @serathius I'm going to call out #15990 again. You can have a 3.4 internal build that you can roll back to as long as the wal doesn't contain ClusterMemberAttrSet, DowngradeInfoSetRequest, or AuthStatusRequest. I don't think this should be merged, but it can be a tradeoff if you want to do your 3.4 -> 3.5 upgrade sooner.

@ahrtr
Member

ahrtr commented Jul 27, 2023

For GKE, @logicalhan @serathius I'm going to call out #15990 again. You can have 3.4 internal build that you can rollback to as long as wal doesn't contain ClusterMemberAttrSet, DowngradeInfoSetRequest, AuthStatusRequest.

This seems to be the cheapest direction.

Downgrading 3.5 to 3.4 is a special case; we don't have to backport the complete downgrading feature to 3.5. It's risky to do that, and it would also complicate the 3.5 code base.

Proposed change for 3.4 (on top of @lavacat 's #15990)

  • Clean up the new fields added in 3.5 (excluding clusterVersion) on startup and on snapshot recovery, just as [WIP] *: support online downgrade from 3.5 to 3.4 #15990 does.
  • Add dummy support for the new protocol messages added in 3.5 (e.g. ClusterVersionSetRequest, ClusterMemberAttrSetRequest, DowngradeInfoSetRequest, AuthStatusRequest). Recognise them but ignore them. Note: NO CHANGE/manipulation OF THE WAL FILES AT ALL.

EDIT: We don't need to worry about ClusterVersionSetRequest, ClusterMemberAttrSetRequest, and DowngradeInfoSetRequest at all.

  • ClusterVersionSetRequest is only used by updateClusterVersionV3 (in 3.5), which isn't called at all in 3.5.
  • ClusterMemberAttrSetRequest is only used by publishV3, which again isn't called at all in 3.5.
  • DowngradeInfoSetRequest is supported by etcd 3.5, but there is no client-side command. Downgrade isn't a complete feature in 3.5, so we don't need to worry about it for 3.5.

So we only need to take care of AuthStatusRequest in 3.4.
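
Just to sketch what "recognise them but ignore them" could mean in code — purely hypothetical, not etcd code; in the real 3.4 apply path this would operate on etcdserverpb.InternalRaftRequest, and a placeholder struct stands in for it here. The idea: if a replicated entry unmarshals but sets none of the request fields 3.4 knows about, consume it as a no-op instead of failing, so the apply loop and the consistent index still advance.

// Hypothetical illustration of "recognise but ignore" for requests added after 3.4.
package main

import (
	"fmt"
	"reflect"
)

// knownRequest is a stand-in for the request fields a 3.4 server understands.
type knownRequest struct {
	Put         *struct{ Key, Value []byte }
	Range       *struct{ Key []byte }
	DeleteRange *struct{ Key []byte }
	// ... other request kinds known to 3.4 elided ...
}

// applyAsNoOp reports whether a decoded entry sets none of the fields this
// server understands (e.g. it only carried an AuthStatusRequest replicated by
// a 3.5 member), in which case it can be applied as a no-op.
func applyAsNoOp(r *knownRequest) bool {
	v := reflect.ValueOf(*r)
	for i := 0; i < v.NumField(); i++ {
		if f := v.Field(i); f.Kind() == reflect.Ptr && !f.IsNil() {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(applyAsNoOp(&knownRequest{}))                                    // true: nothing recognised, ignore
	fmt.Println(applyAsNoOp(&knownRequest{Put: &struct{ Key, Value []byte }{}})) // false: a normal 3.4 request
}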

More references:

Impact on users (e.g. GKE)

If they want to benefit from this solution, they can't upgrade from an old 3.4 to 3.5 directly. Instead, they must first upgrade their clusters to a new 3.4.x version (which includes the change proposed above), then upgrade to 3.5.x in a second step.

Do we still need #15994?

No, as long as the cluster was previously on a 3.4.x version with the change proposed above.

@lavacat

lavacat commented Jul 28, 2023

@ahrtr in principle I agree with your approach. Making changes to 3.4 to support online downgrade seems more practical.

I don't mind throwing away #15994, but it might be cleaner to perform a backend migrate instead of dealing with term and confState in 3.4. This way we also use the new migrate framework.

Then in 3.4 we can have a flag --experimental-downgrade-3-5 that allows 3.4 to start within a 3.5 cluster.

Let's discuss during the next community meeting, so everyone is in agreement on the next steps. If more information or a POC is needed, let me know; I'll try to compose everything before the meeting.

@ahrtr
Member

ahrtr commented Aug 15, 2023

As discussed in the previous community meeting, the offline downgrade tool isn't the point. The point is whether or not (and if so, how) to support online downgrade from 3.5 to 3.4.

Usually it's common to make the new version (e.g. 3.6) backward compatible with the old version (e.g. 3.5), and that's exactly the principle the existing downgrade feature follows. For example, when downgrading from 3.6 to 3.5, the etcd 3.6 instance should migrate the data to be 3.5 compatible.

But online downgrade is a big & complicated feature; it isn't feasible & safe to backport the complete feature from 3.6 to 3.5.

Instead, we can treat the online downgrade from 3.5 to 3.4 as a special case. I think we can spend just minor or moderate effort to make the old version (3.4) forward compatible with the new version (3.5). Specifically, we just need to ensure the 3.4 binary can run on data generated by the 3.5 binary, roughly as I mentioned above in #15878 (comment).

@siyuanfoundation
Contributor

I have written a design doc regarding the path forward. Please take a look and provide feedback, thanks!

cc @ahrtr @lavacat @serathius @logicalhan @fuweid

@siyuanfoundation
Contributor

siyuanfoundation commented Jan 25, 2024

Tracking work
