KAFKA-14296; Partition leaders are not demoted during kraft controlled shutdown#12741
KAFKA-14296; Partition leaders are not demoted during kraft controlled shutdown#12741hachikuji merged 1 commit intoapache:trunkfrom
Conversation
| ): Unit = { | ||
| stateChangeLogger.info(s"Transitioning ${localFollowers.size} partition(s) to " + | ||
| "local followers.") | ||
| val shuttingDown = isShuttingDown.get() |
There was a problem hiding this comment.
This code is a bit brittle, can we have something like the following as the field instead?
private Supplier brokerState = null;
Then we don't have to make sure to update isShuttingDown when the logic changes in KafkaServer.
There was a problem hiding this comment.
I suppose that you are saying that relying on isShuttingDown is brittle, right? If that is the case, I do agree that relying on the broker state is better. I think that we can refactor the remaining usages as a follow-up.
| val state = info.partition.toLeaderAndIsrPartitionState(tp, isNew) | ||
| val isNewLeaderEpoch = partition.makeFollower(state, offsetCheckpoints, Some(info.topicId)) | ||
|
|
||
| if (isInControlledShutdown && (info.partition.leader == NO_LEADER || |
There was a problem hiding this comment.
Similarly, why do we need this special isInControlledShutdown field versus relying on the broker state as I mentioned above? This kind of approach is extremely brittle in my opinion.
There was a problem hiding this comment.
We want to do this only when we are in controlled shutdown and it could be disabled. If you look at the caller side, we aligned this on how we do it for the lifecycleManager as both go together. We could revise this though.
| mockReplicaFetcherManager: Option[ReplicaFetcherManager] = None, | ||
| mockReplicaAlterLogDirsManager: Option[ReplicaAlterLogDirsManager] = None | ||
| mockReplicaAlterLogDirsManager: Option[ReplicaAlterLogDirsManager] = None, | ||
| isShuttingDown: AtomicBoolean = new AtomicBoolean(false) |
There was a problem hiding this comment.
I found the use of the field a bit confusing. We don't necessarily have to fix it here, but what do you think about letting ReplicaManager own its own shutdown? Basically create a private field which serves the same purpose and then expose a method to explicitly toggle it. It is not obvious when looking at BrokerServer that altering the local field has this remote effect.
jsancio
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the changes.
|
Will merge it tomorrow to trunk and 3.3. I am waiting on getting a clean build. |
…d shutdown (#12741) When the `BrokerServer` starts its shutting down process, it transitions to `SHUTTING_DOWN` and sets `isShuttingDown` to `true`. With this state change, the follower state changes are short-cutted. This means that a broker which was serving as leader would remain acting as a leader until controlled shutdown completes. Instead, we want the leader and ISR state to be updated so that requests will return NOT_LEADER and the client can find the new leader. We missed this case while implementing #12187. This patch fixes the issue and updates an existing test to ensure that `isShuttingDown` has not effect. We should consider adding integration tests for this as well. We can do this separately. Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
…d shutdown (apache#12741) When the `BrokerServer` starts its shutting down process, it transitions to `SHUTTING_DOWN` and sets `isShuttingDown` to `true`. With this state change, the follower state changes are short-cutted. This means that a broker which was serving as leader would remain acting as a leader until controlled shutdown completes. Instead, we want the leader and ISR state to be updated so that requests will return NOT_LEADER and the client can find the new leader. We missed this case while implementing apache#12187. This patch fixes the issue and updates an existing test to ensure that `isShuttingDown` has not effect. We should consider adding integration tests for this as well. We can do this separately. Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
…d shutdown (apache#12741) When the `BrokerServer` starts its shutting down process, it transitions to `SHUTTING_DOWN` and sets `isShuttingDown` to `true`. With this state change, the follower state changes are short-cutted. This means that a broker which was serving as leader would remain acting as a leader until controlled shutdown completes. Instead, we want the leader and ISR state to be updated so that requests will return NOT_LEADER and the client can find the new leader. We missed this case while implementing apache#12187. This patch fixes the issue and updates an existing test to ensure that `isShuttingDown` has not effect. We should consider adding integration tests for this as well. We can do this separately. Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
…d shutdown (apache#12741) (#59) When the `BrokerServer` starts its shutting down process, it transitions to `SHUTTING_DOWN` and sets `isShuttingDown` to `true`. With this state change, the follower state changes are short-cutted. This means that a broker which was serving as leader would remain acting as a leader until controlled shutdown completes. Instead, we want the leader and ISR state to be updated so that requests will return NOT_LEADER and the client can find the new leader. We missed this case while implementing apache#12187. This patch fixes the issue and updates an existing test to ensure that `isShuttingDown` has not effect. We should consider adding integration tests for this as well. We can do this separately. Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io> Co-authored-by: David Jacot <djacot@confluent.io>
When the
BrokerServerstarts its shutting down process, it transitions toSHUTTING_DOWNand setsisShuttingDowntotrue. With this state change, the follower state changes are short-cutted. This means that a broker which was serving as leader would remain acting as a leader until controlled shutdown completes. Instead, we want the leader and ISR state to be updated so that requests will return NOT_LEADER and the client can find the new leader.We missed this case while implementing #12187.
This patch fixes the issue and updates an existing test to ensure that
isShuttingDownhas not effect. We should consider adding integration tests for this as well. We can do this separately.Committer Checklist (excluded from commit message)