-
Notifications
You must be signed in to change notification settings - Fork 4k
kvserver: invoke callback when SetMaxSize drops replicas #153008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2b17d01 to
22e614c
Compare
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg reviewed 2 of 2 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @sumeerbhola)
-- commits line 13 at r1:
This isn't a backport, but it is on master. I guess you're saying that this PR is intended to be backported, and we might do something "more complex" on master in addition to this PR?
pkg/kv/kvserver/queue.go line 655 at r1 (raw file):
bq.updateMetricsOnDroppedDueToFullQueue() for _, cb := range item.callbacks { cb.onEnqueueResult(-1 /*indexOnHeap*/, errDroppedDueToFullQueueSize)
in addInternal, we do call it under the lock. It would be good to be consistent if possible, but I'll also take a comment on the callback field that explains that it's sometimes called with and sometimes without the lock and that it's important to be mindful of that.
Previously, when SetMaxSize shrank the queue, replicas were dropped from the priority queue without invoking their callbacks. This commit ensures callbacks are properly invoked when SetMaxSize drops replicas. Replicas removed via removeFromReplicaSetLocked (such as when a replica is destroyed) still don’t always have their callbacks invoked. While the natural place to invoke the callback would be at removeFromReplicaSetLocked, invoking callbacks while holding a lock risks blocking for too long. (We are doing this already for addInternal though.) This PR focuses specifically on handling the SetMaxSize case since this PR is intended to be backported. We can follow up with a more complex but more principled approach on master if needed in the future.
22e614c to
7d9f922
Compare
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @sumeerbhola, and @tbg)
Previously, tbg (Tobias Grieger) wrote…
This isn't a backport, but it is on master. I guess you're saying that this PR is intended to be backported, and we might do something "more complex" on master in addition to this PR?
Clarified the language. I don’t plan to revisit this on master unless needed in the future, but lmk if you think we should.
pkg/kv/kvserver/queue.go line 655 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
in
addInternal, we do call it under the lock. It would be good to be consistent if possible, but I'll also take a comment on the callback field that explains that it's sometimes called with and sometimes without the lock and that it's important to be mindful of that.
Added comments to onEnqueueResult and onProcessResult.
|
TFTR! bors r=tbg |
|
bors didn't pick this up for some reason, asked here https://cockroachlabs.slack.com/archives/CJ0H8Q97C/p1757347843104249. bors retry |
|
bors still seems to be stuck - lets overwhelm it more. bors r+ |
|
Revived : ) |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating backport branch refs/heads/blathers/backport-release-24.3.20-rc-153008: POST https://api.github.com/repos/wenyihu6/cockroach/git/refs: 403 Resource not accessible by integration [] Backport to branch 24.3.20-rc failed. See errors above. error creating backport branch refs/heads/blathers/backport-release-25.2.6-rc-153008: POST https://api.github.com/repos/wenyihu6/cockroach/git/refs: 403 Resource not accessible by integration [] Backport to branch 25.2.6-rc failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Epic: none
Release note: none
kvserver: invoke callback when dropping replicas in bq.SetMaxSize
Previously, when SetMaxSize shrank the queue, replicas were dropped from the
priority queue without invoking their callbacks. This commit ensures callbacks
are properly invoked when SetMaxSize drops replicas. Replicas removed via
removeFromReplicaSetLocked (such as when a replica is destroyed) still don’t
always have their callbacks invoked. While the natural place to invoke the
callback would be at removeFromReplicaSetLocked, invoking callbacks while
holding a lock risks blocking for too long. (We are doing this already for
addInternal though.) This PR focuses specifically on handling the SetMaxSize
case since this PR is intended to be backported. We can follow up with a more
complex but more principled approach on master if needed in the future.