Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mvcc: avoid negative watcher count metrics #11882

Merged
merged 1 commit into from
May 12, 2020

Conversation

jackkleeman
Copy link
Contributor

The watch count metrics are not robust to duplicate cancellations. These
cause the count to be decremented twice, leading eventually to negative
counts. We are seeing this in production. The duplicate cancellations
themselves are not themselves a big problem (except performance), but
they are caused by the new proactive cancellation logic (#11850) which
cancels proactively even immediately before initiating a Close, thus
nearly guaranteeing a Close-cancel race, as discussed in
watchable_store.go. We can avoid this in most cases by not sending a
cancellation when we are going to Close.

The client change is simply for the benefit of those using the 3.5 client against an old cluster - we could remove it if desired. The metrics fix on the server side is sufficient, and that's the real bug here arguably.

The watch count metrics are not robust to duplicate cancellations. These
cause the count to be decremented twice, leading eventually to negative
counts. We are seeing this in production. The duplicate cancellations
themselves are not themselves a big problem (except performance), but
they are caused by the new proactive cancellation logic (etcd-io#11850) which
cancels proactively even immediately before initiating a Close, thus
nearly guaranteeing a Close-cancel race, as discussed in
watchable_store.go. We can avoid this in most cases by not sending a
cancellation when we are going to Close.
@jackkleeman jackkleeman force-pushed the dup-cancel-metrics branch from 08b078f to c0a2e09 Compare May 11, 2020 16:47
@jackkleeman
Copy link
Contributor Author

cc @gyuho

@xiang90
Copy link
Contributor

xiang90 commented May 12, 2020

lgtm

@xiang90 xiang90 merged commit ec13797 into etcd-io:master May 12, 2020
@jackkleeman jackkleeman deleted the dup-cancel-metrics branch May 13, 2020 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants