
Fix for v3.5 Ensure that cluster members stored in v2store and backend are in sync #13348

Merged: 1 commit into etcd-io:release-3.5 on Sep 25, 2021

Conversation

@serathius (Member) commented Sep 14, 2021

As discussed in #13196, there is a chance that between v3.1 and v3.4 the state of bbolt and v2store diverged. This is only noticeable after upgrading to v3.5, as the authoritative storage was changed from v2store to bbolt.

This PR is a direct fix for this problem in v3.5. It is not a cherry-pick from master, as v3.6 plans to remove v2store entirely. The current fix is designed to work assuming that users run v3.5.1 before upgrading to v3.6. This approach was picked to avoid delaying the removal of v2store in v3.6 (discussed more on the original issue).

What this PR does:

  • AddMember/RemoveMember operations will execute on both storeV2 and the backend before returning. If they have diverged and storeV2 has a member that is not present in the backend, re-adding this member will succeed (mirrored for removal). Adding an existing member will only fail if it is present in both storages (mirrored for removal). As writing to two different storages is not transactional, they might still diverge during runtime, so users can repeat the Add/Remove Member call to bring the storages back in sync. This should be enough to fix the problem, as member changes are not done often (see the sketch after this list).
  • When etcd bootstraps, RaftCluster Recover will sync the contents of the backend to match v2store. This operation is always correct, and it should be cheap, as the number of cluster members is usually 3-5 and historically should not grow beyond tens. This should fix the problem of zombie members appearing after the upgrade to v3.5. I resigned from implementing this, as during Recover it is expected that v2store and the backend are out of sync: Recover is called during member bootstrap before applying the WAL, and storeV2 and the backend are persisted at different times (storeV2 only after a snapshot, the backend every 5 seconds).
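
A minimal, self-contained Go sketch of the intended Add/Remove behavior (plain maps stand in for v2store and the bbolt backend; memberStores and its methods are hypothetical names, not this PR's actual API): the call only fails when both storages agree, so repeating it heals a diverged member.

```go
package main

import "fmt"

// Hypothetical sketch: two independent member registries (v2store and
// backend) that may diverge. Add/Remove only fail when the member is
// present/absent in BOTH stores, so repeating the command re-syncs them.
type memberStores struct {
	v2store map[uint64]bool
	backend map[uint64]bool
}

func (s *memberStores) AddMember(id uint64) error {
	if s.v2store[id] && s.backend[id] {
		return fmt.Errorf("member %x already exists in both stores", id)
	}
	// Writes are not transactional across stores; each is applied independently.
	s.v2store[id] = true
	s.backend[id] = true
	return nil
}

func (s *memberStores) RemoveMember(id uint64) error {
	if !s.v2store[id] && !s.backend[id] {
		return fmt.Errorf("member %x not found in either store", id)
	}
	delete(s.v2store, id)
	delete(s.backend, id)
	return nil
}

func main() {
	// Simulated divergence: the member exists only in v2store.
	s := &memberStores{
		v2store: map[uint64]bool{0xabc: true},
		backend: map[uint64]bool{},
	}
	fmt.Println(s.AddMember(0xabc)) // <nil>: succeeds and brings both stores in sync
	fmt.Println(s.AddMember(0xabc)) // error: now present in both stores
}
```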

cc @ptabor

tx.UnsafePut(buckets.MembersRemoved, mkey, []byte("removed"))
if !unsafeMemberExists(tx, mkey) {
Contributor:

Can we document this logic? So, we "don't" delete the "mkey" from the "Members" bucket, and expect "mkey" does not exist?

Contributor:

I think that the deletion of mkey was moved to line 84. It's just not needed if it's not found.

But +1 to adding docs to this and other methods about which errors are reported.
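
For illustration, a hedged sketch of the ordering being discussed, using a mock transaction instead of the real backend.BatchTx and bucket APIs (mockTx and deleteMember are made-up names): the "removed" tombstone is written first, then the member's presence is checked so that a missing member is reported as an error rather than a silent success.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMemberNotFound is reported when the member key is absent from the
// mock "members" bucket, mirroring the existence check in the diff above.
var ErrMemberNotFound = errors.New("member not found")

// mockTx stands in for a backend transaction with two buckets.
type mockTx struct {
	members        map[string][]byte
	membersRemoved map[string][]byte
}

// deleteMember records the "removed" tombstone first, then checks whether
// the member actually exists; only an existing member is deleted, and a
// missing one is surfaced as an error instead of a silent success.
func (tx *mockTx) deleteMember(mkey string) error {
	tx.membersRemoved[mkey] = []byte("removed")
	if _, ok := tx.members[mkey]; !ok {
		return ErrMemberNotFound
	}
	delete(tx.members, mkey)
	return nil
}

func main() {
	tx := &mockTx{
		members:        map[string][]byte{},
		membersRemoved: map[string][]byte{},
	}
	// Diverged state: the member was never written to the backend.
	fmt.Println(tx.deleteMember("8e9e05c52164694d")) // member not found
}
```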

@serathius (Member, Author) commented:

Found one issue when running integration tests. It looks like the current implementation breaks the TestV3HashRestart test. This test starts etcd, reads the DB hash, restarts etcd, reads the hash again, and compares whether there was a difference. I found that the current implementation, with v2store overriding db members, breaks this assumption.
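
For context, a rough, self-contained sketch of the invariant that test relies on (a plain map plus crc32 stands in for the real backend and its hash; this is not the actual TestV3HashRestart code): the backend hash must be identical before and after a restart, so any Recover-time rewrite of backend members from v2store breaks the assertion.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// backendHash computes a deterministic hash over the mock backend contents.
func backendHash(db map[string]string) uint32 {
	keys := make([]string, 0, len(db))
	for k := range db {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := crc32.NewIEEE()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte(db[k]))
	}
	return h.Sum32()
}

func main() {
	db := map[string]string{"members/8e9e05c52164694d": `{"id":"8e9e05c52164694d"}`}
	before := backendHash(db)

	// In the dropped approach, a Recover-time sync would rewrite backend
	// members from v2store at this point, mutating db and changing the hash.
	after := backendHash(db)

	fmt.Println(before == after) // true only if nothing rewrote the data in between
}
```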

@serathius (Member, Author) commented:

Decided to drop syncing during Recover call. Updated PR description to provide the reasoning.

@ptabor (Contributor) commented Sep 17, 2021

How about syncing StoreV2->Backend just before the storeV2 snapshot operation?

@serathius (Member, Author) commented:

By default a snapshot happens only every 10,000 WAL entries. I'm worried it might complicate and obfuscate things even more. I'm a little scared about using a StoreV2->backend sync: it's an internal change that is not visible in the WAL and not tied to any consistent index value. This means it has user-visible impact (it changes the hash of the data) without user interaction.

The currently proposed fix, limited to the AddMember/RemoveMember commands, will not resolve the corrupted state automatically (it will require users to notice the problem and delete/add members themselves). However, it gives the user full visibility into what is happening. Would love to get some opinions from @gyuho or @hexfusion.

@ptabor (Contributor) commented Sep 25, 2021

Thank you. Merging as the discussion is related to a potential next step.

I think we should have either an automated 'fix' process or a fix process that requires issuing a single command to bring the stores in sync (independently of the number of out-of-sync members).

@ptabor ptabor merged commit 4312298 into etcd-io:release-3.5 Sep 25, 2021
@serathius serathius mentioned this pull request Sep 29, 2021
hasbro17 pushed a commit to hasbro17/etcd that referenced this pull request Feb 2, 2022
Fix for v3.5 Ensure that cluster members stored in v2store and backend are in sync
@serathius serathius deleted the sync branch June 15, 2023 20:39
serathius added a commit to serathius/etcd that referenced this pull request Sep 26, 2023
… on v2 store.

Migration off v2 store was unfinished during v3.5 development (etcd-io#12914), leaving RaftCluster broken and half migrated.
This was found in v3.5 and fixed in etcd-io#13348; however, this was not patched on the main branch.

This PR makes v3.6 RaftCluster consistent with v3.5 so we can cleanly
migrate off v2 store.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>