Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 15271 #15492

Merged
merged 3 commits into from
Mar 20, 2023
Merged

Fix 15271 #15492

merged 3 commits into from
Mar 20, 2023

Conversation

serathius
Copy link
Member

@serathius serathius commented Mar 16, 2023

Fixes #15271
cc @ahrtr @ptabor

@serathius
Copy link
Member Author

cc @lavacat

@serathius serathius force-pushed the fix-issue15271 branch 3 times, most recently from b8a710e to 381b4f9 Compare March 16, 2023 22:31
@serathius
Copy link
Member Author

Proposed followup: ensure that watchableStorage tests don't access storage method.

@serathius
Copy link
Member Author

Another possible improvement is to consider using mutative testing to verify existing unit testing for issues like this.

@ahrtr
Copy link
Member

ahrtr commented Mar 17, 2023

Found problem with watchableStore tests. They don't call their own methods, but ones from underlying storage. As result they don't trigger same watch notification code as ones done by apply flow.

The second commit makes no sense to me. Note the watchableStore just adds a watchable layer on top of the underlying store, why do we encapsulate the KV API again? If you see any issue in unit test, shouldn't we fix test, not the production code?

@serathius
Copy link
Member Author

Found problem with watchableStore tests. They don't call their own methods, but ones from underlying storage. As result they don't trigger same watch notification code as ones done by apply flow.

The second commit makes no sense to me. Note the watchableStore just adds a watchable layer on top of the underlying store, why do we encapsulate the KV API again? If you see any issue in unit test, shouldn't we fix test, not the production code?

Problem is that watchableStore exposes two ways to do writes via s.Put(...) and via s.Write(...).Put(...) (symmetrical for delete). Only the second case goes through proper watchableStore code. First case goes directly to storage instance underneath so it doesn't trigger proper watch notification.

Finding the issue required two things:

  • Watching for all events within 1 second, instead of first event. Thus we can find duplication in event.
  • Changing the Put call to go to watchableStore and not store underneath. Reason was that notify function that had the bug, was not called in tests at all because all Put requests went to store.

I would argue that this is not a unit test as it tests multiple layers (watchableStore, store and backend) working together. I consider it a bug that testing of watchableStore called Put method not from watchableStore, but from store.

Aside of the issue with test, I find it confusing and weird that calling watchableStore.Put(...) would not notify watchers.
cc @mitake

@ahrtr
Copy link
Member

ahrtr commented Mar 20, 2023

Note neither watchableStore nor the underlying store expose the DeleteRange & Put method. Instead, only interface WriteView exposes them.

  • Watching for all events within 1 second, instead of first event. Thus we can find duplication in event.

This is a valid point.

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lavacat
Copy link

lavacat commented Mar 20, 2023

Thanks for adding tests.

I think @ahrtr has a point. When I tried running your updated test without new watchableStore.Put, it passed. When stepping through in debug mode, it goes to writeView.Put and notify is triggered in watchableStoreTxnWrite.End()

@serathius
Copy link
Member Author

Problem is that writeView is a sub-struct of store and store is sub-struct watchableStore. This means that watchableStore can be working even though it was not incorrectly initialized. For correct functioning writeView needs to be reconfigured to point to watchableStore.

This happens in tests where we create watchableStore struct without redirecting writeView to it. This is still very error prone because mvcc package does a lot of usage of private fields/methods. I agree that proposed change in this PR is not optimal, so I will create separate PR to clean up the golang api.

Bogdan Kanivets and others added 3 commits March 20, 2023 11:11
Problem: during restore in watchableStore.Restore, synced watchers are moved to unsynced.
minRev will be behind since it's not updated when watcher stays synced.

Solution: update minRev

fixes: etcd-io#15271
Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
@serathius
Copy link
Member Author

PTAL @ptabor @ahrtr

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks

@serathius serathius merged commit 6d01ab6 into etcd-io:main Mar 20, 2023
This was referenced Mar 20, 2023
@serathius serathius deleted the fix-issue15271 branch June 15, 2023 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Watch response traveling back in time when reconnecting member downloads snapshot from the leader
3 participants