release-22.2: kvserver: remove changed replicas in purgatory from replica set #115035
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this
Changes look good, but I'm not sure that we should backport this far back. I'll let @nvanbenschoten comment.
The bug has been around since at least 20.x. It is serious and hard enough to diagnose that it seems worthwhile going back as far as supported.
TYFTRs!
Backport 1/1 commits from #114365 on behalf of @kvoli.
/cc @cockroachdb/release
It was possible for a replica to become stuck in a processing state in a
queue's replica set. This could occur when a replica had recently been
removed from purgatory for processing but was destroyed, or had its
replica ID changed, before being processed.
When this occurred, the replica could never be processed by the queue
again, potentially leading to decommission stalls, constraint violations,
or under- or over-replication.
Remove the replica from the queue set upon encountering a replica which
was destroyed, or whose replica ID changed, when processing purgatory.
This prevents the replica from becoming stuck in a processing state in
the queue set.
Fixes: #112761
Fixes: #110761
Release note (bug fix): Store queues no longer leave purgatory replicas
that have changed replica IDs or been destroyed stuck in a processing
state, unable to be processed by the respective queue again if re-added.
Release justification: Fixes serious bug.
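
For readers unfamiliar with the queue set, the failure mode and the fix described above can be sketched roughly as follows. This is a simplified illustration, not the kvserver code: every name here (`queue`, `replicaItem`, `maybeAdd`, `processPurgatory`, `lookup`) is hypothetical, and the `dropStale` flag exists only so the behavior before and after the change can be compared side by side.

```go
package main

// NOTE: all types and functions in this sketch are hypothetical stand-ins
// used for illustration; they are not the actual kvserver implementation.

// replica models the store's current view of a range's replica.
type replica struct {
	rangeID   int
	replicaID int
	destroyed bool
}

// replicaItem is the queue set's bookkeeping entry for one range.
type replicaItem struct {
	replicaID  int  // replica ID at the time the range was enqueued
	processing bool // set once the queue has picked the entry up
}

type queue struct {
	set       map[int]*replicaItem // the queue's replica set, keyed by range ID
	purgatory map[int]struct{}     // range IDs waiting to be retried
	dropStale bool                 // true models the fix, false the old behavior
}

// maybeAdd enqueues repl unless the set already tracks its range.
func (q *queue) maybeAdd(repl *replica) bool {
	if _, ok := q.set[repl.rangeID]; ok {
		return false // already tracked, possibly by a stuck entry
	}
	q.set[repl.rangeID] = &replicaItem{replicaID: repl.replicaID}
	return true
}

// processPurgatory retries every range in purgatory. lookup returns the
// store's current replica for a range, or nil if there is none.
func (q *queue) processPurgatory(lookup func(rangeID int) *replica, process func(*replica)) {
	for rangeID := range q.purgatory {
		delete(q.purgatory, rangeID)
		item, ok := q.set[rangeID]
		if !ok {
			continue
		}
		item.processing = true
		repl := lookup(rangeID)
		if repl == nil || repl.destroyed || repl.replicaID != item.replicaID {
			// The replica was destroyed or its ID changed after it left
			// purgatory. Without the fix, the entry stays in the set
			// marked as processing forever.
			if q.dropStale {
				delete(q.set, rangeID)
			}
			continue
		}
		process(repl)
		delete(q.set, rangeID) // done; the range may be enqueued again later
	}
}
```

In this sketch, with `dropStale` set to false, a later `maybeAdd` for the same range returns false because the stale entry still occupies the set. With `dropStale` set to true, the stale entry is removed and the re-added replica can be enqueued and processed again.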