-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: add fields to migrate from 3.5 to 3.4 #15994
base: main
Are you sure you want to change the base?
Conversation
10f4667
to
387657e
Compare
The proposed approach looks good, please add e2e upgrade downgrade tests. We need v3.5 -> v3.4 -> v3.5 if we want to officially support this in migrate. Note; current e2e framework supports downloading only the last etcd minor version. You might need to rewrite some e2e code to support downloading multiple old releases. cc @ahrtr @ptabor to get feedback about adding downgrade support. |
server/storage/schema/schema.go
Outdated
addNewField(Meta, MetaTermKeyName, emptyValue), | ||
addNewField(Meta, MetaConfStateName, emptyValue), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add ClusterClusterVersionKeyName
and ClusterDowngradeKeyName
?
etcd/server/storage/schema/bucket.go
Lines 78 to 81 in 8cb3fab
MetaTermKeyName = []byte("term") | |
MetaConfStateName = []byte("confState") | |
ClusterClusterVersionKeyName = []byte("clusterVersion") | |
ClusterDowngradeKeyName = []byte("downgrade") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those fields were added in v3.4 not v3.5
Good point. We need e2e test for v3.6 -> v3.5 -> v3.6 as well. |
I think the current downgrade test already does this. It tests current main vs last release which now is v3.5. I would like to extend the current e2e release tests to test last 2 releases as that's our support window. |
// Since v3.5 | ||
MetaTermKeyName = []byte("term") | ||
MetaConfStateName = []byte("confState") | ||
ScheduledCompactKeyName = []byte("scheduledCompactRev") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you create section Since v3.4
to match the schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've searched release-3.3 branch and compared to 3.4 and I don't think any new fields were added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
name: "Downgrading v3.5 to v3.4 is supported", | ||
version: version.V3_5, | ||
targetVersion: version.V3_4, | ||
// note: 3.4 doesn't have storageVersion, this field was added in 3.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, could we however also test that DetectSchemaVersion
will properly recognize v3.4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this, but had to add another case to UnsafeDetectSchemaVersion
, otherwise this won't work for Upgrading v3.4 to v3.5 should succeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm little worried about implementing etcdutl migrate
from v3.4 -> v3.5.
Looks really promising! left couple of comments. |
@serathius main change since last review is that I had to add better defaults for fields to make sure they can be read on upgrade. |
Looks almost done. Would like to also have a offline downgrade test. Basically start etcd v3.5, do some requests, use etcdutl do downgrade it, do couple of requests, upgrade it using etcdutl, do couple of requests. Do you think you can add it? |
Yes, I was thinking about adding |
89a5dbf
to
8182657
Compare
tests/e2e/utl_migrate_test.go
Outdated
be.Close() | ||
|
||
if tc.delWal { | ||
// clear out v2store to avoid "etcdserver/membership: cluster cannot be downgraded (current version: X is lower than determined cluster version: X)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think this is correct. Reason etcdserver/membership: cluster cannot be downgraded
is written is because etcd server will still replay wal entries from since last snapshot. Proper solution would be to ensure that there is no SetClusterVersion
wal entries since last snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick update: working on adding snapshot save/restore for the case when /0/version
entry is in wal log. I think that's the best way to go about it, since that's how users will need to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick update: working on adding snapshot save/restore for the case when /0/version entry is in wal log
Don't understand. You mean to use snapshot save/restore to workaround the issue. I don't think it's correct, the only reason it works is fact that it throws out the WAL completely. Don't think this is correct in 3 node clusters as there is no guarantee that snapshot will save the same data on different nodes.
Proper solution would be to modify/remove the SetClusterVersion WAL entry. Generally it can be risky, as this entry protects etcd from writing incompatible WAL entries that would be interpreted wrongly. There were some new fields introduced into etcd v3.5 WAL. List is available in https://github.com/etcd-io/etcd/blob/main/scripts/etcd_version_annotations.txt.
For v3.6 downgrades I implemented a mechanism to detect WAL version based on proto annotations https://github.com/etcd-io/etcd/blob/main/server/storage/wal/version.go. However maybe we could check what proto versions were between v3.4 and v3.4, maybe the differences are not critical so we could yolo it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@serathius you are correct, one has to use same snapshot. I was thinking of something like this for 3 node cluster. There will be cluster downtime if using this approach. There is probably a chance of data loss for entries that are in wal but not applied to backend (maybe there is a way to mitigate this).
Updating WAL is probably ideal. Here is the diff of raft_internal. From the first glance all new requests can be removed during migrate.ClusterVersionSetRequest
, ClusterMemberAttrSetRequest
, DowngradeInfoSetRequest
and AuthStatusRequest
. I still need to check every existing request for possible new non-optional fields.
I'll give it a shot and try to add wal update to migrate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that each member could be on different stage of applying WAL and storing db. We need to make sure that applying those entries works the same on both v3.5/v3.4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI. #15878 (comment)
ping @lavacat |
7fdfd99
to
adb503f
Compare
Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
adb503f
to
859c0ee
Compare
Hey @lavacat - When you have a moment can you please resolve conflicts on this? Marek, Benjamin, Wenjia and I discussed this pr today as still being key for upcoming release. |
related doc Downgrade Support 3.5 to 3.4 |
/retest |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@lavacat: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Tested migrate from 3.5 to 3.4 manually.
fixes: #15878