-
Notifications
You must be signed in to change notification settings - Fork 601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the broker round robin selection in task execution configurable #2223
base: main
Are you sure you want to change the base?
Conversation
...se-control/src/main/java/com/linkedin/kafka/cruisecontrol/executor/ExecutionTaskManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change @allenxwang ! Let's get this tested before merging it.
...se-control/src/main/java/com/linkedin/kafka/cruisecontrol/executor/ExecutionTaskPlanner.java
Show resolved
Hide resolved
...se-control/src/main/java/com/linkedin/kafka/cruisecontrol/executor/ExecutionTaskPlanner.java
Show resolved
Hide resolved
@@ -830,6 +831,22 @@ public synchronized void executeProposals(Collection<ExecutionProposal> proposal | |||
requestedIntraBrokerPartitionMovementConcurrency, requestedClusterLeadershipMovementConcurrency, | |||
requestedBrokerLeadershipMovementConcurrency, requestedExecutionProgressCheckIntervalMs, replicaMovementStrategy, | |||
isTriggeredByUserRequest, loadMonitor); | |||
if (removedBrokers != null && !removedBrokers.isEmpty()) { | |||
int count = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe extract this snippet as a separate function.
} | ||
} | ||
} | ||
LOG.info("User task {}: {} of partition move proposals are related to removed brokers.", uuid, ((float) count) / totalCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to actually log count and total count here, to observe whether this only happens for large scale rebalance or not.
Summary
We found that the rebalance operation significantly slows down towards the latter half of the execution. It happens when most of the tasks in the proposal involve a single broker. The current behavior is that the ExecutionTaskPlanner would sort the brokers according to the ReplicaMoveStrategy, continuously pick a task from the first broker until it reaches the broker's max concurrency. However, it would skip the broker in the same round if the broker is already involved in a picked task. The effect is that almost all tasks towards the end of the execution are related to the broker that has the most tasks, thus the execution is bottlenecked by the single broker's max concurrency.
We will make this round robin behavior configurable and true by default. When it is set to false, the ExecutionTaskPlanner can keep picking the tasks from same broker that has the highest priority when it is asked to find the next set of tasks to execute. The ExecutionTaskPlanner would compare the broker to pick in the following order:
Expected Behavior
…
Actual Behavior
…
Steps to Reproduce
Known Workarounds
Additional evidence
Categorization
This PR resolves # if any.