troubleshoot excessive snapshot rebalance/recovery rates #14061
Everything LGTM! Keeping this very general is IMO the right move.
The only specific thing I would want to see before merging depends on whether we can get engineering to commit to a "maximum" threshold above the default. The settings depend heavily on workload, background processes, and available resources, but it would be great to say something like:
"We recommend not increasing these values by more than 4x their default values without explicit approval from Cockroach Labs."
If we can get someone from KV to approve a safe threshold like that, with the caveat that it varies per cluster, it would be nice to have that in the doc and then merge. If we can't get consensus on any sort of threshold, I support merging this as is.
Hi @irfansharif, would you be able to provide any input on @kevinkokomani's question above (recommended max values of |
It's difficult to give a blanket number for those values. The defaults today are somewhat conservative, but we don't like them very much as knobs and would like to remove them altogether in favor of automatic controls within CRDB: cockroachdb/cockroach#80607. Perhaps we can say that these rates may or may not be destabilizing depending on your environment, and that our current recommendation is to stay within 2x of the default setting (pulled entirely out of thin air).
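For illustration only, the "stay within 2x of the default" suggestion above could be expressed as a small check. This is a sketch, not anything from the PR: the function name, the `max_multiple` parameter, and the 32 MiB/s figure are all assumptions made for the example, and the real default should be read from the cluster being tuned.

```python
MIB = 1024 * 1024


def within_recommended_range(proposed_rate: int, default_rate: int,
                             max_multiple: float = 2.0) -> bool:
    """Return True if proposed_rate is at most max_multiple times default_rate.

    Rates are in bytes per second. The 2x multiple mirrors the informal
    guidance in this thread; it is not an enforced limit.
    """
    if default_rate <= 0:
        raise ValueError("default_rate must be positive")
    return proposed_rate <= max_multiple * default_rate


# Hypothetical default used only for this example; check your own cluster.
ASSUMED_DEFAULT_RATE = 32 * MIB

print(within_recommended_range(64 * MIB, ASSUMED_DEFAULT_RATE))   # exactly 2x
print(within_recommended_range(128 * MIB, ASSUMED_DEFAULT_RATE))  # 4x
```

The check only covers the upper bound, since the thread is concerned with rates that are set too high.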
Thanks for this context, @irfansharif! I'll use this wording:
|
Reviewed 4 of 5 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mattcrdb)
Fixes DOC-2409.
@kevinkokomani @mattcrdb Note that I've kept this vague in terms of what it means for settings to be "too high". I'd love to include more detailed guidance than this, if it's at all possible, but I realize it's a more complex issue.