rac2: introduce cluster setting to reset token counters #133202
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)
pkg/kv/kvserver/kvflowcontrol/kvflowcontrol.go
line 129 at r1 (raw file):
// such work from unblocking, this setting may be useful.
var TokenCounterResetEpoch = settings.RegisterIntSetting(
	settings.SystemOnly, "kvadmission.flow_controller.token_reset_epoch",
nit: haven't seen more than one arg on a line for a setting declaration before, consider moving the name to a new line.
Branch updated: d08ac65 to d25ceaf.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli and @pav-kv)
pkg/kv/kvserver/kvflowcontrol/kvflowcontrol.go
line 129 at r1 (raw file):
Previously, kvoli (Austen) wrote…
nit: haven't seen more than one arg on a line for a setting declaration before, consider moving the name to a new line.
Done
bors r=kvoli
kvadmission.flow_controller.token_reset_epoch is an escape hatch for cluster operators to reset RACv2 token counters to their full state.
To trigger a reset, the operator should increment this epoch (or change it to any value different from the previous one). This can be used to counteract a token leakage bug, but if such a bug exists, the leakage may resume and tokens may again be exhausted. So this setting is expected to be used together with disabling replication admission control by setting kvadmission.flow_control.enabled=false. Disabling replication admission control alone should be sufficient, since it should unblock work that is waiting for evaluation (waiting-for-eval). However, if another bug prevents such work from unblocking, this setting may be useful.
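The reset-on-change semantics described above can be sketched in Go. This is a minimal, hypothetical model: the `tokenCounter` type, its methods, and the clamping of returned tokens at the limit are illustrative assumptions, not the CockroachDB implementation. Each counter remembers the last epoch value it observed and refills to its limit whenever the setting's value differs, whether the operator incremented it or changed it to any other value. The `main` function also shows why a reset alone does not cure a leak.

```go
package main

import (
	"fmt"
	"sync"
)

// tokenCounter is a hypothetical, simplified stand-in for a RACv2 token
// counter. It is NOT the CockroachDB type; names are illustrative only.
type tokenCounter struct {
	mu        sync.Mutex
	limit     int64 // full (initial) token count
	tokens    int64 // currently available tokens
	lastEpoch int64 // reset-epoch setting value last observed
}

func newTokenCounter(limit, epoch int64) *tokenCounter {
	return &tokenCounter{limit: limit, tokens: limit, lastEpoch: epoch}
}

// maybeReset refills the counter to its limit if epoch differs from the last
// observed value. Any change triggers a reset, not only an increment.
// It reports whether a reset happened.
func (c *tokenCounter) maybeReset(epoch int64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if epoch == c.lastEpoch {
		return false
	}
	c.lastEpoch = epoch
	c.tokens = c.limit
	return true
}

// deduct removes n tokens, e.g. when work is admitted.
func (c *tokenCounter) deduct(n int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tokens -= n
}

// returnTokens gives back n tokens. Clamping at the limit (an assumption of
// this sketch) keeps tokens for work admitted before a reset from pushing
// the counter above its full state.
func (c *tokenCounter) returnTokens(n int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tokens += n
	if c.tokens > c.limit {
		c.tokens = c.limit
	}
}

func (c *tokenCounter) available() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.tokens
}

func main() {
	c := newTokenCounter(16, 0)
	c.deduct(16) // simulated leak: tokens deducted but never returned
	fmt.Println("after leak:", c.available())

	c.maybeReset(1) // operator bumps the epoch
	fmt.Println("after reset:", c.available())

	c.deduct(16) // if the leak persists, tokens are exhausted again
	fmt.Println("after renewed leak:", c.available())
}
```

Comparing the epoch for inequality, rather than checking that it grew, is what lets an operator change it to any different value. Storing the last-seen epoch per counter means each counter resets exactly once per change, however often the setting is polled.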
Epic: CRDB-37515
Release note (ops change): The cluster setting
kvadmission.flow_controller.token_reset_epoch is an advanced setting that can be used to refill replication admission control v2 tokens. It should only be used after consultation with an expert.