
KAFKA-13858; KRaft should not shut down metadata listener until controller shutdown is finished #12187

Merged
hachikuji merged 5 commits into apache:trunk from dajac:KAFKA-13858-2 on May 25, 2022

Conversation

dajac (Member) commented May 20, 2022

When the KRaft broker begins controlled shutdown, it immediately disables the metadata listener. This means that metadata changes made as part of the controlled shutdown do not get propagated to the respective components. For partitions that the broker follows, that is what we want: it prevents the follower from rejoining the ISR while it is still shutting down. But for partitions that the broker leads, it means the leader remains active until controlled shutdown finishes and the socket server is stopped. That delay can be as long as 5 seconds, and possibly worse.

This PR revises the controlled shutdown procedure as follows:

  • The broker signals to the replica manager that it is about to start the controlled shutdown.
  • The broker requests a controlled shutdown from the controller.
  • The controller moves leaders off the broker, removes the broker from any ISR that it is a member of, and writes those changes to the metadata log.
  • When the broker receives a partition metadata change, it checks whether it is in the ISR. If it is, it updates the partition as usual. If it is not, it stops the fetcher/replica. This effectively stops all partitions whose ISR included the broker.

When the broker is a replica of a partition but is not in the ISR, the controller does not do anything: the leader epoch is not bumped. In this particular case, the follower will continue to run until the replica manager shuts down. During this time, the replica could become in-sync and the leader could try to bring it back into the ISR. We rely on #12181 to ensure that does not happen.
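For illustration, here is a minimal self-contained sketch of the broker-side rule in the last step above; `PartitionState` and `onPartitionMetadataChange` are hypothetical stand-ins, not the actual `ReplicaManager` API:

```scala
object ControlledShutdownSketch {
  // Hypothetical stand-in for the partition state carried by a metadata delta.
  case class PartitionState(leader: Int, isr: Set[Int])

  def onPartitionMetadataChange(brokerId: Int,
                                isInControlledShutdown: Boolean,
                                state: PartitionState): String = {
    if (isInControlledShutdown && !state.isr.contains(brokerId)) {
      // The controller removed this broker from the ISR as part of its
      // controlled shutdown, so the broker stops the fetcher/replica.
      "stop-replica"
    } else {
      // Otherwise the broker applies the leader/ISR change as usual.
      "update-partition"
    }
  }

  def main(args: Array[String]): Unit = {
    val state = PartitionState(leader = 2, isr = Set(1, 2))
    // Broker 0 was dropped from the ISR while shutting down: stop the replica.
    println(onPartitionMetadataChange(0, isInControlledShutdown = true, state))
    // Broker 1 is still in the ISR: update the partition as usual.
    println(onPartitionMetadataChange(1, isInControlledShutdown = true, state))
  }
}
```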

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

} else {
val leader = info.partition.leader
if (newImage.cluster.broker(leader) == null) {
if (isInControlledShutdown && !info.partition.isr.contains(config.brokerId)) {
Contributor:

I wonder if we should also count the case when the replica is in the ISR, but no leader is defined. I think this is what we would see if the shutting down replica is the last member of the ISR.

Member Author:

That makes sense. I forgot about this case.
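A short sketch of the extended check being discussed, assuming a `NO_LEADER` sentinel of -1 (the real constant and field names may differ):

```scala
object StopReplicaCheckSketch {
  val NoLeader: Int = -1 // assumed sentinel for "no leader defined"

  // Stop the replica when the broker is in controlled shutdown and either it
  // has been removed from the ISR, or no leader is defined because the
  // shutting-down replica was the last ISR member.
  def shouldStopReplica(isInControlledShutdown: Boolean,
                        leader: Int,
                        isr: Set[Int],
                        brokerId: Int): Boolean =
    isInControlledShutdown && (leader == NoLeader || !isr.contains(brokerId))
}
```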

dajac marked this pull request as ready for review May 23, 2022 12:14
if (isInControlledShutdown && !info.partition.isr.contains(config.brokerId)) {
// If we are in controlled shutdown and the replica is not in the ISR,
// we stop the replica.
// We always update the follower state.
Contributor:

Good change here. I do like keeping the state up-to-date even if we cannot start fetching. Hard to know when a dependence on that state will emerge.

Member Author:

Yeah. That seems to be the right thing to do. I have looked at the dependencies on that state and I haven't identified any issues so far.
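A rough sketch of the pattern agreed on here, using stand-in types rather than Kafka's `Partition` and fetcher classes: follower state is always recorded, while fetchers only start for replicas that should keep fetching.

```scala
import scala.collection.mutable

object FollowerStateSketch {
  case class TopicPartition(topic: String, partition: Int)
  case class FollowerInfo(tp: TopicPartition, shouldFetch: Boolean)

  def applyFollowerDeltas(deltas: Seq[FollowerInfo]): (Set[TopicPartition], Set[TopicPartition]) = {
    val followerState = mutable.Set.empty[TopicPartition]
    val toStartFetching = mutable.Set.empty[TopicPartition]
    deltas.foreach { info =>
      // Always record the follower state, even for replicas about to be
      // stopped, so later consumers of that state see the truth.
      followerState += info.tp
      // Only start a fetcher for replicas that should keep fetching.
      if (info.shouldFetch) toStartFetching += info.tp
    }
    (followerState.toSet, toStartFetching.toSet)
  }
}
```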

val partitionsToMakeFollower = new mutable.HashMap[TopicPartition, Partition]
val partitionsToStart = new mutable.HashMap[TopicPartition, Partition]
val partitionsToStop = new mutable.HashMap[TopicPartition, Boolean]
val newFollowerTopicSet = new mutable.HashSet[String]
Contributor:

Not from this PR, but I find the naming of newLocalFollowers in the parameter list confusing given the presence of the isNew flag. I guess it's really representing new or updated followers.

Member Author:

That's right. How about localFollowers? newOrUpdatedLocalFollowers seems a bit long to me.

partitionsToStart.put(tp, partition)
}
}
changedPartitions.add(partition)
Contributor:

We're maintaining this collection for the start of the log dir fetchers. Similar to the replica fetchers, I guess we only want to update log dir fetchers when there is an epoch bump (leader or follower). Since we're not yet supporting JBOD anyway in KRaft, it seems tempting to get rid of this logic.

Member Author:

I don't have a strong opinion on this but removing unused logic seems appropriate. That will force us to think it through when we implement it. Do you mind if we do this separately? I would like to keep this PR focused on its primary goal.

Contributor:

Yeah, we can do it separately.

stateChangeLogger.info(s"Started fetchers as part of become-follower for ${partitionsToStart.size} partitions")

partitionsToMakeFollower.keySet.foreach(completeDelayedFetchOrProduceRequests)
partitionsToStart.keySet.foreach(completeDelayedFetchOrProduceRequests)
Contributor:

Borderline too verbose perhaps, but how about partitionsToStartFetching?

Member Author:

That seems like a reasonable ask.

dajac requested a review from hachikuji May 24, 2022 16:10
dajac (Member Author) commented May 24, 2022

@hachikuji Thanks for your review. I have updated the PR to address your comments.

hachikuji (Contributor) left a comment:

Thanks, LGTM.

hachikuji merged commit 76477ff into apache:trunk May 25, 2022
dajac deleted the KAFKA-13858-2 branch May 30, 2022 08:13
hachikuji pushed a commit that referenced this pull request Oct 13, 2022
…d shutdown (#12741)

When the `BrokerServer` begins its shutdown process, it transitions to `SHUTTING_DOWN` and sets `isShuttingDown` to `true`. With this state change, follower state changes are short-circuited. This means that a broker which was serving as leader would keep acting as leader until controlled shutdown completes. Instead, we want the leader and ISR state to be updated so that requests return NOT_LEADER and the client can find the new leader.

We missed this case while implementing #12187.

This patch fixes the issue and updates an existing test to ensure that `isShuttingDown` has no effect. We should consider adding integration tests for this as well; we can do that separately.

Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
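A minimal sketch of the bug shape described in this commit message; the real logic lives in `ReplicaManager` and differs in detail, and `applyChange` here is a placeholder:

```scala
object ShutdownGuardSketch {
  // Buggy shape: short-circuiting follower transitions while shutting down
  // left the broker acting as leader until controlled shutdown completed.
  def applyDeltaBuggy(isShuttingDown: Boolean, applyChange: () => Unit): Unit =
    if (!isShuttingDown) applyChange()

  // Fixed shape: apply the leader and ISR change regardless of the shutdown
  // flag, so clients get NOT_LEADER and can discover the new leader promptly.
  def applyDeltaFixed(isShuttingDown: Boolean, applyChange: () => Unit): Unit =
    applyChange() // isShuttingDown intentionally ignored
}
```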
hachikuji pushed a commit that referenced this pull request Oct 14, 2022
…d shutdown (#12741)

guozhangwang pushed a commit to guozhangwang/kafka that referenced this pull request Jan 25, 2023
…d shutdown (apache#12741)


rutvijmehta-harness pushed a commit to rutvijmehta-harness/kafka that referenced this pull request Feb 9, 2024
…d shutdown (apache#12741)

rutvijmehta-harness added a commit to rutvijmehta-harness/kafka that referenced this pull request Feb 9, 2024
…d shutdown (apache#12741) (#59)
