KAFKA-13858; Kraft should not shutdown metadata listener until controller shutdown is finished #12187
} else {
  val leader = info.partition.leader
  if (newImage.cluster.broker(leader) == null) {
    if (isInControlledShutdown && !info.partition.isr.contains(config.brokerId)) {
I wonder if we should also count the case when the replica is in the ISR, but no leader is defined. I think this is what we would see if the shutting down replica is the last member of the ISR.
That makes sense. I forgot about this case.
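The condition discussed in this thread can be sketched as a small predicate. This is only an illustration under stated assumptions, not the code in the PR: `PartitionState`, `FollowerShutdownPolicy`, `shouldStopFetching`, and the `NoLeader` sentinel of -1 are hypothetical names and values chosen for the example.

```scala
// Hypothetical, simplified stand-in for the partition metadata seen by the broker.
final case class PartitionState(leader: Int, isr: Set[Int])

object FollowerShutdownPolicy {
  // Assumed sentinel meaning "no leader is defined" (taken to be -1 in this sketch).
  val NoLeader: Int = -1

  /** True when a follower replica hosted on `brokerId` should stop fetching. */
  def shouldStopFetching(
    isInControlledShutdown: Boolean,
    brokerId: Int,
    state: PartitionState
  ): Boolean = {
    // Case already handled in the diff: the replica is not in the ISR.
    val notInIsr = !state.isr.contains(brokerId)
    // Case raised in review: the replica is in the ISR but no leader is defined,
    // e.g. because the shutting-down replica is the last member of the ISR.
    val noLeader = state.leader == NoLeader
    isInControlledShutdown && (notInIsr || noLeader)
  }
}
```

Outside controlled shutdown the predicate is always false, so the follower state is still updated in every case and only the decision to stop fetching depends on the shutdown flag.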
if (isInControlledShutdown && !info.partition.isr.contains(config.brokerId)) {
  // If we are in controlled shutdown and the replica is not in the ISR,
  // we stop the replica.
// We always update the follower state.
Good change here. I do like keeping the state up-to-date even if we cannot start fetching. Hard to know when a dependence on that state will emerge.
Yeah. That seems to be the right thing to do. I have looked at the dependencies on that state and I haven't identified any issues so far.
val partitionsToMakeFollower = new mutable.HashMap[TopicPartition, Partition]
val partitionsToStart = new mutable.HashMap[TopicPartition, Partition]
val partitionsToStop = new mutable.HashMap[TopicPartition, Boolean]
val newFollowerTopicSet = new mutable.HashSet[String]
Not from this PR, but I find the naming of newLocalFollowers in the parameter list confusing given the presence of the isNew flag. I guess it's really representing new or updated followers.
That's right. How about localFollowers? newOrUpdatedLocalFollowers seems a bit long to me.
    partitionsToStart.put(tp, partition)
  }
}
changedPartitions.add(partition)
We're maintaining this collection for the start of the log dir fetchers. Similar to the replica fetchers, I guess we only want to update log dir fetchers when there is an epoch bump (leader or follower). Since we're not yet supporting JBOD anyway in KRaft, it seems tempting to get rid of this logic.
I don't have a strong opinion on this but removing unused logic seems appropriate. That will force us to think it through when we implement it. Do you mind if we do this separately? I would like to keep this PR focused on its primary goal.
Yeah, we can do it separately.
stateChangeLogger.info(s"Started fetchers as part of become-follower for ${partitionsToStart.size} partitions")

partitionsToMakeFollower.keySet.foreach(completeDelayedFetchOrProduceRequests)
partitionsToStart.keySet.foreach(completeDelayedFetchOrProduceRequests)
Borderline too verbose perhaps, but how about partitionsToStartFetching?
That seems like a reasonable ask.
@hachikuji Thanks for your review. I have updated the PR to address your comments.
…d shutdown (#12741) When the `BrokerServer` starts its shutdown process, it transitions to `SHUTTING_DOWN` and sets `isShuttingDown` to `true`. With this state change, the follower state changes are short-circuited. This means that a broker which was serving as leader would remain acting as a leader until controlled shutdown completes. Instead, we want the leader and ISR state to be updated so that requests will return NOT_LEADER and the client can find the new leader. We missed this case while implementing #12187. This patch fixes the issue and updates an existing test to ensure that `isShuttingDown` has no effect. We should consider adding integration tests for this as well. We can do this separately. Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
When the KRaft broker begins controlled shutdown, it immediately disables the metadata listener. This means that metadata changes made as part of the controlled shutdown are not delivered to the respective components. For partitions that the broker is a follower of, that is what we want: it prevents the follower from rejoining the ISR while it is still shutting down. But for partitions that the broker is leading, it means the leader remains active until controlled shutdown finishes and the socket server is stopped. That delay can be as much as 5 seconds, and possibly longer.
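The effect of disabling the listener early can be illustrated with a toy model. This is a rough sketch, not Kafka code: `ToyBroker`, `LeaderChange`, and their methods are hypothetical names invented for the example, and the real broker components are far more involved.

```scala
// Hypothetical toy model; none of these names are Kafka classes.
final case class LeaderChange(partition: String, newLeader: Int)

final class ToyBroker(val brokerId: Int, initiallyLeading: Set[String]) {
  private var leading: Set[String] = initiallyLeading
  private var listenerEnabled: Boolean = true

  // Models the old behaviour: the listener is switched off as soon as shutdown begins.
  def disableListener(): Unit = { listenerEnabled = false }

  // Applies a leadership change published by the controller, if still listening.
  def onMetadataUpdate(change: LeaderChange): Unit = {
    if (listenerEnabled && change.newLeader != brokerId) {
      leading -= change.partition
    }
  }

  def isLeaderFor(partition: String): Boolean = leading.contains(partition)
}
```

In this model, a broker that calls `disableListener()` before the controller moves leadership away still reports itself as leader afterwards, while a broker that keeps listening gives up leadership as soon as the change arrives.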
This PR revises the controlled shutdown procedure as follows:
When the broker is a replica of a partition but is not in the ISR, the controller does nothing and the leader epoch is not bumped. In this case, the follower continues to run until the replica manager shuts down. During this time, the replica could become in-sync and the leader could try to add it back to the ISR. We rely on #12181 to ensure that does not happen.