release-23.2: kvserver: remove changed replicas in purgatory from replica set #115037
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this
jbowens left a comment
did not review for correctness but secondary backport policy review lgtm
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, and @nvanbenschoten)
TYFTRs! The TestSQLStatsFlush stress race run timed out waiting for full replication; it doesn't look related.
Backport 1/1 commits from #114365 on behalf of @kvoli.
/cc @cockroachdb/release
It was possible for a replica to become stuck in the processing state in
a queue's replica set. This could occur when a replica had recently been
removed from purgatory for processing but was destroyed, or had its
replica ID changed, before being processed.
When this occurred, the queue could never process the replica again,
potentially leading to decommission stalls, constraint violations, or
under- or over-replication.
Remove the replica from the queue set when purgatory processing
encounters a replica that was destroyed or whose replica ID changed.
This prevents the replica from becoming stuck in a processing state in
the queue set.
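The failure mode and the fix can be illustrated with a small, self-contained Go sketch. This is not the kvserver code: the `queue`, `queueItem`, and `replica` types, the `addToPurgatory`, `maybeAdd`, and `processPurgatory` methods, and the `removeStale` flag are all hypothetical names invented for this example, and the model assumes only what the description above states (a replica set that rejects re-adds while an entry exists, and a purgatory pass that skips destroyed or ID-changed replicas).

```go
package main

import "fmt"

// Hypothetical, simplified model of a store queue. None of these names
// are taken from the kvserver package.
type rangeID int
type replicaID int

type replica struct {
	rangeID   rangeID
	replicaID replicaID
	destroyed bool
}

// queueItem is an entry in the queue's replica set.
type queueItem struct {
	processing bool
}

type queue struct {
	set       map[rangeID]*queueItem // the queue's replica set
	purgatory map[rangeID]replicaID  // replica ID recorded on entering purgatory
	store     map[rangeID]*replica   // replicas currently on the store
}

func newQueue() *queue {
	return &queue{
		set:       map[rangeID]*queueItem{},
		purgatory: map[rangeID]replicaID{},
		store:     map[rangeID]*replica{},
	}
}

// addToPurgatory parks a replica for a later retry. It stays in the set.
func (q *queue) addToPurgatory(r *replica) {
	q.set[r.rangeID] = &queueItem{}
	q.purgatory[r.rangeID] = r.replicaID
}

// maybeAdd enqueues a replica unless its range is already tracked.
func (q *queue) maybeAdd(r *replica) bool {
	if _, ok := q.set[r.rangeID]; ok {
		return false // already queued, processing, or in purgatory
	}
	q.set[r.rangeID] = &queueItem{}
	return true
}

// processPurgatory retries every replica in purgatory. removeStale
// switches between the buggy behavior (false) and the fix (true).
func (q *queue) processPurgatory(removeStale bool) []rangeID {
	var processed []rangeID
	for id, wantID := range q.purgatory {
		delete(q.purgatory, id) // deleting during range is safe in Go
		if item := q.set[id]; item != nil {
			item.processing = true
		}
		r, ok := q.store[id]
		if !ok || r.destroyed || r.replicaID != wantID {
			if removeStale {
				delete(q.set, id) // the fix: drop the stale entry
			}
			// Without the fix the entry stays marked as processing forever.
			continue
		}
		processed = append(processed, id)
		delete(q.set, id) // normal completion
	}
	return processed
}

func main() {
	for _, fix := range []bool{false, true} {
		q := newQueue()
		old := &replica{rangeID: 1, replicaID: 1}
		q.store[1] = old
		q.addToPurgatory(old)
		// The replica is removed and re-added under a new replica ID.
		q.store[1] = &replica{rangeID: 1, replicaID: 2}
		q.processPurgatory(fix)
		fmt.Printf("fix=%v re-added=%v\n", fix, q.maybeAdd(q.store[1]))
	}
}
```

In this toy model the run prints `fix=false re-added=false` followed by `fix=true re-added=true`: without the removal, the stale entry blocks every later attempt to enqueue the range.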
Fixes: #112761
Fixes: #110761
Release note (bug fix): The store queues will no longer leave purgatory
replicas that have changed replica IDs or have been destroyed stuck and
unable to be processed by the respective queue again if re-added.
Release justification: Fixes serious bug.