troubleshoot excessive snapshot rebalance/recovery rates #14061

Merged
taroface merged 2 commits into master from kb-rebalance-recovery
Jun 22, 2022

Contributor

@taroface taroface commented Jun 7, 2022

Fixes DOC-2409.

@kevinkokomani @mattcrdb Note that I've kept this vague in terms of what it means for settings to be "too high". I'd love to include more detailed guidance than this, if it's at all possible, but I realize it's a more complex issue.

@netlify

netlify bot commented Jun 7, 2022

Netlify Preview

Name Link
🔨 Latest commit 7550944
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-docs/deploys/62aa5376815e10000801994e
😎 Deploy Preview https://deploy-preview-14061--cockroachdb-docs.netlify.app

@kevinkokomani kevinkokomani left a comment

Everything LGTM! Keeping this very general is IMO the right move.

The only specific thing I would want to see before merging this depends on whether we can get engineering to commit to a "maximum" threshold above default. The settings depend heavily on workload, background processes, and resources available, but it would be great to say something like:

"We recommend not increasing these values by more than 4x their default values without explicit approval from Cockroach Labs."

If we can get someone from KV to approve a safe threshold like that, with the caveat that it varies per cluster, it would be nice to have that in the doc and then merge. If we can't get consensus on any sort of threshold, I support merging this as is.

@taroface
Contributor Author

Hi @irfansharif, would you be able to provide any input on @kevinkokomani's question above (recommended max values of kv.snapshot_rebalance.max_rate and kv.snapshot_recovery.max_rate)?

@irfansharif
Contributor

It's difficult to give a blanket number for those values. The defaults today are somewhat conservative, but as knobs we don't like them very much and would like to remove them altogether in favor of automatic controls with CRDB: cockroachdb/cockroach#80607. Perhaps we can say that these rates may or may not be destabilizing, depending on your environment, and that our current recommendation is to stay within 2x of the default setting (pulled entirely out of thin air).

@taroface
Contributor Author

taroface commented Jun 15, 2022

Thanks for this context, @irfansharif! I'll use this wording:

However, if the settings are too high when nodes are added to the cluster, this can cause degraded performance and node crashes. We recommend **not** increasing these values by more than 2 times their [default values](cluster-settings.html) without explicit approval from Cockroach Labs.
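For anyone sanity-checking a planned change against that wording, the 2x guideline can be sketched as a small check. This is only an illustration: the function name `within_recommended_rate` and the 32 MiB default used in the usage note are assumptions, not values taken from this PR; look up the actual defaults for your version on the cluster settings page.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def within_recommended_rate(proposed_bytes: int,
                            default_bytes: int,
                            max_factor: float = 2.0) -> bool:
    """Return True if a proposed snapshot rate stays within max_factor
    times the default rate (hypothetical helper, not part of CockroachDB)."""
    if default_bytes <= 0:
        raise ValueError("default_bytes must be positive")
    # The rate must be positive and no more than max_factor x the default.
    return 0 < proposed_bytes <= max_factor * default_bytes
```

With an assumed default of `32 * MIB`, `within_recommended_rate(64 * MIB, 32 * MIB)` passes the check and `within_recommended_rate(128 * MIB, 32 * MIB)` does not. The same comparison would apply to both `kv.snapshot_rebalance.max_rate` and `kv.snapshot_recovery.max_rate`.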

Contributor

@MichaelTrestman MichaelTrestman left a comment

Reviewed 4 of 5 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mattcrdb)

@taroface taroface merged commit b5f365f into master Jun 22, 2022
@taroface taroface deleted the kb-rebalance-recovery branch June 22, 2022 22:25
5 participants