Node-local core assignment: rebalancing #19864
Added one clarifying question
src/v/config/configuration.cc
, shard_balancing_on_core_count_change(
    *this,
    "shard_balancing_on_core_count_change",
    "If enabled, and if after a restart the number of cores changes, "
Suggested change:
- "If enabled, and if after a restart the number of cores changes, "
+ "If 'true', Redpanda moves partitions between shards if the number of cores changes after a restart, "
@ztlpn I see that our docs currently define "shard" as "core" or "logical CPU core". Is there a distinction in this case? I'm not sure whether the terms are interchangeable here, e.g. "Redpanda moves partitions between shards if the number of shards changes after a restart".
@kbatuigas They are mostly interchangeable. To be very precise, "shard" is a Seastar concept, referring to a compartmentalized part of the Redpanda application that runs as a single thread. You can actually run Redpanda with fewer or more shards than CPU cores, but the most common and sensible configuration is a 1-to-1 mapping. That's why we say that they are interchangeable.
I've been mostly using the word "core" for user-visible stuff such as configuration property names (because it is a concept that is more familiar to users), but switching to "shard" for precise and/or internal wording. Let me know if it makes sense or if you want to change it.
Though for consistency I should probably rename the properties to core_balancing_* then :)
I think consistency is good. If "shard" is mostly an implementation detail and does not affect how customers should understand or use the property, it makes sense to stick to core_. Thank you!
LGTM, great PR
Save the core count of the last successful rebalance in the kvstore, and if it doesn't match the current core count, trigger a rebalance on startup.
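A minimal sketch of the startup check this commit describes, assuming a key-value store with simple get/put semantics. `fake_kvstore`, the key name, `record_successful_rebalance`, and `needs_rebalance_on_startup` are hypothetical stand-ins, not Redpanda's kvstore API. Treating a missing record as a mismatch is also an assumption.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory stand-in for the node-local kvstore; the real
// Redpanda kvstore API is not shown in this PR excerpt.
struct fake_kvstore {
    std::map<std::string, std::uint32_t> data;

    std::optional<std::uint32_t> get(const std::string& key) const {
        auto it = data.find(key);
        if (it == data.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    void put(const std::string& key, std::uint32_t value) { data[key] = value; }
};

// Hypothetical key name.
const std::string last_rebalance_core_count_key = "last_rebalance_core_count";

// Called after a rebalance completes successfully: remember the core count.
void record_successful_rebalance(fake_kvstore& kvs, std::uint32_t core_count) {
    kvs.put(last_rebalance_core_count_key, core_count);
}

// Called on startup: rebalance if the saved core count differs from the
// current one. Treating a missing record as a mismatch is an assumption.
bool needs_rebalance_on_startup(const fake_kvstore& kvs,
                                std::uint32_t current_core_count) {
    std::optional<std::uint32_t> saved = kvs.get(last_rebalance_core_count_key);
    return !saved.has_value() || *saved != current_core_count;
}
```

Saving the count only after a *successful* rebalance means an interrupted rebalance is retried on the next startup, because the stored value still differs from the current core count.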
To keep per-shard partition counts balanced, we need to rebalance after partitions are moved away from the node, because the remaining partitions may no longer be evenly distributed across shards.
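A minimal sketch of what "rebalance" means in terms of per-shard partition counts, assuming a simple greedy strategy: repeatedly move one partition from the most-loaded shard to the least-loaded one until no two shards differ by more than one. `plan_rebalance` and `shard_move` are hypothetical names; this is an illustration, not the PR's actual balancer, and it ignores the work of moving partition state between shards.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record of a single planned move: one partition replica
// leaves `from_shard` and is placed on `to_shard`.
struct shard_move {
    std::size_t from_shard;
    std::size_t to_shard;
};

// Greedy sketch: repeatedly move one partition from the most-loaded shard to
// the least-loaded shard until per-shard counts differ by at most one.
// `counts[i]` is the number of partitions on shard i; it is updated in place.
std::vector<shard_move> plan_rebalance(std::vector<int>& counts) {
    std::vector<shard_move> moves;
    if (counts.empty()) {
        return moves;
    }
    while (true) {
        auto [min_it, max_it] = std::minmax_element(counts.begin(), counts.end());
        if (*max_it - *min_it <= 1) {
            break;  // balanced: no move can reduce the spread further
        }
        --*max_it;
        ++*min_it;
        moves.push_back(shard_move{
            static_cast<std::size_t>(max_it - counts.begin()),
            static_cast<std::size_t>(min_it - counts.begin())});
    }
    return moves;
}
```

For example, a node left with counts {5, 1, 0} after other partitions were moved away ends at {2, 2, 2} after three moves. The loop always terminates because each move strictly reduces the sum of squared counts.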
Introduce rebalancing of partitions across cores, triggered by the following events:
Backports Required
Release Notes
Features
If the node_local_core_assignment flag is enabled, Redpanda tries to maintain a balanced distribution of partition replicas across cores.
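The release note does not say how new replicas are placed between rebalances. One simple way to keep the distribution balanced, and purely an assumption here, is to assign each new replica to the core that currently hosts the fewest replicas. `place_new_replica` is a hypothetical helper, not a Redpanda function.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical placement rule: pick the shard hosting the fewest replicas
// (lowest index on ties), record the new replica there, and return its index.
std::size_t place_new_replica(std::vector<int>& replicas_per_shard) {
    if (replicas_per_shard.empty()) {
        throw std::invalid_argument("no shards available");
    }
    auto it = std::min_element(replicas_per_shard.begin(), replicas_per_shard.end());
    ++*it;
    return static_cast<std::size_t>(it - replicas_per_shard.begin());
}
```

Ties go to the lowest-indexed shard because `std::min_element` returns the first smallest element. Least-loaded placement alone cannot fix imbalance caused by removals or core count changes, which is why an explicit rebalance is still needed for those events.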