Prevent coordinator from getting stuck if leadership changes during coordinator run #14385

kfaraz · 2023-06-07T16:41:33Z

Description

If the current leader coordinator is asked to stop being leader, the following happens:

The DruidCoordinator.balancerExec (used for strategy cost computations) is shut down
The currently running duty finishes execution normally and no more duties are executed
An exception to this is the BalanceSegments duty, which can exit abnormally or even get stuck in the race conditions explained below.

✅ Case 1: `balancerExec.submit()` after `balancerExec.shutdown()`, `BalanceSegments` exits abnormally

Typical sequence of events:

Current coordinator stops being leader and balancerExec is shut down
CostBalancerStrategy.findNewSegmentHomeBalancer() or any other strategy method is invoked
balancerExec.submit() is invoked with computeCost() tasks
Since the executor has already been shut down, submission of new tasks throws a RejectedExecutionException and ends the coordinator run as desired
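The rejection in Case 1 can be reproduced with a plain `java.util.concurrent` executor. This is a minimal sketch, not Druid code: the class name `SubmitAfterShutdownDemo`, the pool size, and the constant-returning task standing in for `computeCost()` are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class, not part of Druid.
final class SubmitAfterShutdownDemo
{
  /** Returns true if submitting to an already shut-down executor is rejected. */
  static boolean isSubmitRejectedAfterShutdown()
  {
    // Stand-in for balancerExec; the pool size is arbitrary.
    ExecutorService exec = Executors.newFixedThreadPool(2);
    exec.shutdown();
    try {
      // Stand-in for a computeCost() task.
      exec.submit(() -> 1.0);
      return false;
    }
    catch (RejectedExecutionException e) {
      // The default rejection policy of the pool throws on submission after shutdown.
      return true;
    }
  }

  public static void main(String[] args)
  {
    System.out.println("rejected = " + isSubmitRejectedAfterShutdown());
  }
}
```

Because the exception is thrown on the submitting thread, the caller never ends up waiting on a future, which is why this case ends the run instead of hanging.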

❌ Case 2: `balancerExec.submit()` before `balancerExec.shutdown()`, `BalanceSegments` gets stuck

Typical sequence of events:

BalanceSegments duty is in progress
CostBalancerStrategy.findNewSegmentHomeBalancer() is invoked for some segment
computeCost() tasks for, say, 5 servers are submitted to the executor
Executor picks up 3 of these tasks and starts executing them
Coordinator stops being leader and balancerExec is shut down
Since the computeCost() tasks do not handle interrupts, the 3 picked-up tasks finish execution normally
But the 2 remaining tasks are never picked up by the executor as it is already shut down
The method findNewSegmentHomeBalancer() waits indefinitely for the futures to finish
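The sequence above can be sketched outside Druid as follows. The sketch assumes the shutdown is performed with `shutdownNow()`, which the mention of interrupts suggests but the description does not state. `QueuedTaskAfterShutdownDemo`, the single worker thread, and the latch-based tasks are hypothetical stand-ins, and a bounded `get()` is used only so that the demo itself terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class, not part of Druid.
final class QueuedTaskAfterShutdownDemo
{
  /** Returns true if a task still queued at shutdownNow() never completes its future. */
  static boolean queuedTaskNeverCompletes()
  {
    // One worker thread, so the second task has to wait in the queue.
    ExecutorService exec = Executors.newSingleThreadExecutor();
    CountDownLatch firstStarted = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    try {
      // Already running at shutdown; swallows the interrupt and finishes normally.
      Future<Double> running = exec.submit(() -> {
        firstStarted.countDown();
        while (true) {
          try {
            release.await();
            break;
          }
          catch (InterruptedException e) {
            // Ignored on purpose, mimicking a task that does not handle interrupts.
          }
        }
        return 1.0;
      });
      // Still queued when the executor is shut down.
      Future<Double> queued = exec.submit(() -> 2.0);

      firstStarted.await();
      exec.shutdownNow();   // interrupts the running task and drains the queue
      release.countDown();

      running.get(5, TimeUnit.SECONDS);   // the running task still completes
      try {
        queued.get(200, TimeUnit.MILLISECONDS);
        return false;
      }
      catch (TimeoutException e) {
        // The drained task is never run, so an unbounded get() would block forever.
        return true;
      }
    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println("queued task never completes = " + queuedTaskNeverCompletes());
  }
}
```

The queued task is removed from the queue without being cancelled, so its future stays pending and nothing ever signals the waiting caller.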

✅ Case 3: Change in `balancerComputeThreads` dynamic config

A change in this config also results in a shutdown of the balancerExec. But this shutdown is never done concurrently with the coordinator duties and thus doesn't cause the coordinator to get stuck.

Changes

Add a timeout of 1 minute to resultFuture.get(). 1 minute is the typical duration of a full coordinator run and is more than enough time for the cost computations of a single segment.
Raise an alert if an exception is encountered while computing costs, but only if the executor has not been shut down. A shutdown is intentional and does not require an alert.
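The two changes can be illustrated together with a rough sketch. It is not the code from this PR: `CostFutureWaitSketch`, `awaitCosts`, and the `Outcome` enum are hypothetical names, and the sketch returns a value where the PR describes raising an alert.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch class, not part of Druid.
final class CostFutureWaitSketch
{
  /** Hypothetical result type standing in for "alert" versus "stay quiet". */
  enum Outcome { COMPLETED, FAILED_ALERT, FAILED_NO_ALERT }

  /** Waits for the cost futures with a bound and decides whether a failure deserves an alert. */
  static Outcome awaitCosts(Future<?> resultFuture, ExecutorService exec, long timeout, TimeUnit unit)
  {
    try {
      resultFuture.get(timeout, unit);
      return Outcome.COMPLETED;
    }
    catch (InterruptedException | ExecutionException | TimeoutException e) {
      if (e instanceof InterruptedException) {
        // Preserve the interrupt status for the caller.
        Thread.currentThread().interrupt();
      }
      // A failure after an intentional shutdown is expected and needs no alert.
      return exec.isShutdown() ? Outcome.FAILED_NO_ALERT : Outcome.FAILED_ALERT;
    }
  }

  public static void main(String[] args)
  {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    // A future that never completes, standing in for cost tasks that were never picked up.
    Future<Double> neverDone = new CompletableFuture<>();
    System.out.println(awaitCosts(neverDone, exec, 100, TimeUnit.MILLISECONDS));
    exec.shutdownNow();
    System.out.println(awaitCosts(neverDone, exec, 100, TimeUnit.MILLISECONDS));
  }
}
```

The short timeouts in `main` are only there to keep the demo fast; the PR uses 1 minute.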

kfaraz added 2 commits June 7, 2023 17:55

Get future value with timeout in balancer strategy

6862ff5

Fix timeout in CostBalancerStrategy

a7866ce

kfaraz added Bug Area - Segment Balancing/Coordination labels Jun 7, 2023

kfaraz requested a review from AmatyaAvadhanula June 8, 2023 02:17

AmatyaAvadhanula approved these changes Jun 8, 2023


kfaraz merged commit 12e8fa5 into apache:master Jun 8, 2023

kfaraz deleted the fix_balancer_future branch June 8, 2023 09:59

abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023

AmatyaAvadhanula mentioned this pull request Aug 6, 2023

[DRAFT] 27.0.0 release notes #14761

Closed
