Node-local shard assignments for created/moved partitions #18581
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49586#018fbafb-008c-4b8e-bad4-2cd62a2543a4
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49586#018fbafb-4ce5-499a-9005-c880a83417fd
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49658#018fc6ce-5b06-4e0c-8b23-9e4b93e34933
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49658#018fc6e7-a4f8-4ec7-bdb1-110c52ad569c
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/50021#018ff40a-bea8-4a68-a798-8fabe1518c86
We are going to use this feature flag to mark the transition to node-local core assignment, so rename it. Enabling it will also require a migration (during which we wait for all nodes to persist their shard placement tables), so mark it as requires_migration.
looks good for doc!
this is great, mostly nit comments
It is inconvenient to override the start/stop methods because they have to do some common bookkeeping (such as setting the abort source). Instead, let derived classes override the do_migrate() method when the migration is more complicated than executing some action on the controller leader (for those simple cases they can override do_mutate).
Enabling node-local shard placement takes 2 stages: first, each node persists the current state of its shard_placement_table to kvstore; then, after *all* nodes have finished this, the shard values in topic_table become obsolete. Implement this 2-stage process using a feature migrator and a feature barrier.
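The two stages can be modeled with a toy sketch. This is not the Redpanda implementation: `Node`, `Barrier`, and `migrate` are hypothetical names, and the barrier is simplified to an in-process set rather than a cluster-wide protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node."""
    node_id: int
    placement_table: dict = field(default_factory=dict)  # partition -> shard
    kvstore: dict = field(default_factory=dict)

    def persist_placement(self) -> None:
        # Stage 1: copy the in-memory placement table into local storage.
        self.kvstore["shard_placement"] = dict(self.placement_table)


class Barrier:
    """Toy barrier: passed only once every expected node has entered."""

    def __init__(self, expected):
        self.expected = set(expected)
        self.entered = set()

    def enter(self, node_id: int) -> None:
        self.entered.add(node_id)

    def passed(self) -> bool:
        return self.entered >= self.expected


def migrate(nodes) -> bool:
    """Return True once the globally stored shard values may be ignored."""
    barrier = Barrier(n.node_id for n in nodes)
    for node in nodes:
        node.persist_placement()
        barrier.enter(node.node_id)
    # Stage 2: the switch happens only after *all* nodes have persisted.
    return barrier.passed()
```

The point of the barrier is ordering: no node may start treating the global shard values as obsolete while another node could still restart without a persisted local table.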
For the purpose of balancing assignments we need to maintain overall and per-topic partition shard counts.
Assign shards based on local information, not on assignments from topic_table.
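A minimal sketch of how overall and per-topic counts could drive a local shard choice. The class name `ShardCounts` and the tie-break order (topic count first, then overall count, then shard id) are assumptions made for illustration; the commit messages do not specify the exact policy.

```python
from collections import Counter


class ShardCounts:
    """Hypothetical bookkeeping for one node with `shard_count` shards."""

    def __init__(self, shard_count: int):
        self.shard_count = shard_count
        self.total = Counter()   # shard -> partitions of any topic
        self.per_topic = {}      # topic -> Counter(shard -> partitions)

    def choose_shard(self, topic: str) -> int:
        topic_counts = self.per_topic.setdefault(topic, Counter())
        # Assumed policy: prefer the shard with the fewest partitions of this
        # topic, break ties by overall load, then by lowest shard id.
        shard = min(
            range(self.shard_count),
            key=lambda s: (topic_counts[s], self.total[s], s),
        )
        topic_counts[shard] += 1
        self.total[shard] += 1
        return shard


counts = ShardCounts(shard_count=2)
placements = [counts.choose_shard(t) for t in ("a", "b", "a", "b")]
```

With two shards and the calls above, each topic ends up with one partition per shard, so both the per-topic and the overall counts are even.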
The method allows changing a partition replica shard assignment from any cluster node (not just the node hosting that replica).
No functional changes in this commit.
No functional changes in this commit.
If node-local core assignment is active, we stop doing per-shard accounting and assign an invalid/zero shard when allocating new partition replicas.
If node-local core assignment is enabled, we can ignore replica core values passed by the client.
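The allocator-side behavior described in the last two commits might look like the following sketch. `allocate_replica`, `PLACEHOLDER_SHARD`, and the least-loaded fallback are hypothetical and only illustrate the branching, not the real partition_allocator.

```python
PLACEHOLDER_SHARD = 0  # assumption: the "invalid/zero" value from the commit message


def allocate_replica(node_id, shard_load, node_local, requested_core=None):
    """Return (node_id, shard) for a new replica.

    shard_load is a mutable list of per-shard replica counts for the node.
    """
    if node_local:
        # Node-local mode: no per-shard accounting, and any core passed by
        # the client is ignored; the hosting node picks the real shard later.
        return (node_id, PLACEHOLDER_SHARD)
    if requested_core is not None:
        shard = requested_core
    else:
        # Assumed legacy behavior for this sketch: least-loaded shard wins.
        shard = min(range(len(shard_load)), key=lambda s: (shard_load[s], s))
    shard_load[shard] += 1
    return (node_id, shard)
```

In node-local mode the returned shard carries no meaning for placement, which is why the load list is left untouched.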
new failures in https://buildkite.com/redpanda/redpanda/builds/50000#018ff29e-4606-40c5-9480-b17765609ed1:
new failures in https://buildkite.com/redpanda/redpanda/builds/50000#018ff29e-4608-49f3-adc2-acceb1e8db8a:
Add a test checking basic functionality of node-local shard placement after upgrade.
In tests we need to support both the old way of dispatching x-core movements (via admin.set_partition_replicas) and the new way (via admin.set_partition_replica_core), depending on whether the cluster is new or in the process of being upgraded. To allow that, change the functions in PartitionMovementMixin:
1) allow omitting the "core" field in replica assignment dicts (an absent field means that the core doesn't matter);
2) add a node_local_core_assignment flag to switch between the old and the new way.
Also, modify available_policy for the feature flag to ensure that node-local core assignment is enabled for most tests.
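A hedged sketch of the switch described above. It does not call the real admin client: `plan_admin_calls` is a hypothetical helper that only returns which admin method would be used, and the keyword names, the default core of 0 for an absent "core" field, and the choice to still send node changes through set_partition_replicas in the new mode are all assumptions.

```python
def plan_admin_calls(topic, partition, assignments, node_local_core_assignment):
    """Return a list of (method_name, kwargs) tuples describing the calls."""
    # An absent "core" means the core doesn't matter; 0 is an assumed filler.
    replicas = [
        {"node_id": a["node_id"], "core": a.get("core", 0)} for a in assignments
    ]
    if not node_local_core_assignment:
        # Old way: cores travel together with the replica set.
        return [("set_partition_replicas",
                 {"topic": topic, "partition": partition, "replicas": replicas})]
    # New way: one per-replica core request for each explicitly given core.
    calls = [("set_partition_replicas",
              {"topic": topic, "partition": partition, "replicas": replicas})]
    for a in assignments:
        if "core" in a:
            calls.append(("set_partition_replica_core",
                          {"topic": topic, "partition": partition,
                           "replica": a["node_id"], "core": a["core"]}))
    return calls
```

For example, `plan_admin_calls("t", 0, [{"node_id": 1, "core": 2}, {"node_id": 2}], True)` yields one replica-set call followed by a single core call for node 1, since node 2 left its core unspecified.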
Test failures triage:
This PR implements node-local core assignment: nodes decide themselves which core to put partitions on instead of using global assignments. Partitions are placed so that topic-aware core counts are balanced. No online balancing is performed yet (i.e. cores are chosen only for newly appearing partitions).
More detailed changes rundown:
- shard_balancer. For this it needs to maintain per-topic core counts.
- partition_allocator to stop tracking and assigning cores
Backports Required
Release Notes
Improvements