MINOR: Change `AlterPartition` validation order in `KafkaController` #12032

hachikuji · 2022-04-11T22:59:56Z

Currently we validate the recovery state before checking the leader epoch. It seems more intuitive to validate the leader epoch first, since the leader might be working with stale state. This patch changes the order and adds a couple of additional validations.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)


jsancio

Thanks for the changes @hachikuji. I didn't review the test changes.

jsancio · 2022-04-11T23:13:13Z

core/src/main/scala/kafka/controller/KafkaController.scala

+                partitionResponses(tp) = Left(Errors.FENCED_LEADER_EPOCH)
+                None
+              } else if (newLeaderAndIsr.leaderEpoch > currentLeaderAndIsr.leaderEpoch) {
+                partitionResponses(tp) = Left(Errors.UNKNOWN_LEADER_EPOCH)


This is a new error returned by this handling. Can we:

1. Make sure that ReplicationControlManager also returns this error
2. Handle this error in kafka.cluster.Partition

Because this error is new and not handled by Partition, it will go to the default behavior, which is to retry. I think this is okay. Should we change the code for future brokers to not retry? What do you think?

Good questions. Another option I was considering is to call this INVALID_REQUEST since the controller is expected to have the latest leader epoch. I think UNKNOWN_LEADER_EPOCH is typically used in cases where we expect the failure to be transient.

So what I ended up doing is using FENCED_LEADER_EPOCH for any case where the leader epoch does not match the current value. This is consistent with ReplicationControlManager, and it is already handled on the broker side in Partition (unlike UNKNOWN_LEADER_EPOCH).
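The ordering described in this thread can be sketched roughly as follows. This is a minimal, self-contained model and not the actual `KafkaController` code: the `State` case class, the `ApiError` objects, and the two recovery-state rules are assumptions made for illustration. Only the idea of checking the leader epoch first, and returning a fenced error on any mismatch, comes from the discussion above.

```scala
object AlterPartitionValidationSketch {
  // Hypothetical stand-ins for the Kafka error codes named in the thread.
  sealed trait ApiError
  case object FencedLeaderEpoch extends ApiError // models FENCED_LEADER_EPOCH
  case object InvalidRequest extends ApiError    // models INVALID_REQUEST

  sealed trait RecoveryState
  case object Recovered extends RecoveryState
  case object Recovering extends RecoveryState

  // Simplified partition state; the real LeaderAndIsr carries more fields.
  final case class State(leaderEpoch: Int, isr: List[Int], recoveryState: RecoveryState)

  def validate(current: State, proposed: State): Either[ApiError, State] =
    // 1. Leader epoch first: a leader with stale state is fenced before
    //    anything else in its request is inspected.
    if (proposed.leaderEpoch != current.leaderEpoch) Left(FencedLeaderEpoch)
    // 2. Recovery-state checks run only for a current leader epoch.
    //    Both rules below are assumed for the sake of the example.
    else if (proposed.recoveryState == Recovering && proposed.isr.size > 1) Left(InvalidRequest)
    else if (current.recoveryState == Recovered && proposed.recoveryState == Recovering) Left(InvalidRequest)
    else Right(proposed)
}
```

With this ordering, a request that carries both a stale leader epoch and an invalid recovery state is answered with the fenced error rather than the invalid-request error, which is the more useful signal for a leader that is simply behind.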


jsancio

LGTM. Thanks for the changes @hachikuji

core/src/test/scala/unit/kafka/controller/ControllerIntegrationTest.scala

…ition epoch (#12489)

This PR fixes an AlterPartition regression introduced in #12032.

When an AlterPartition request succeeds, the partition epoch gets bumped. In ZK controller mode, the sender also relies on the AlterPartition response to learn the new partition epoch. If the sender times out the request before a response is sent, the sender will have a stale partition epoch compared to the ZK controller state and will be fenced on subsequent AlterPartition request attempts. The sender will not receive an updated partition epoch until it receives a LeaderAndIsr request for controller-initiated ISR changes.

Reviewers: Jason Gustafson <jason@confluent.io>
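The regression described in this commit message can be illustrated with a small model. The sketch below is an assumption about the shape of the fix, inferred from the title of #12489 ("Compare AlterPartition LeaderAndIsr before fencing partition epoch"): if the requested state already matches the controller's state apart from a stale partition epoch, the request is treated as a retry and answered with the current state. All names here (`State`, `sameExceptStaleEpoch`, `validate`) are hypothetical, and the case where the request is ahead of the controller is deliberately left out.

```scala
object StalePartitionEpochSketch {
  sealed trait ApiError
  case object InvalidUpdateVersion extends ApiError // models INVALID_UPDATE_VERSION

  // Simplified partition state; the real LeaderAndIsr carries more fields.
  final case class State(leaderEpoch: Int, isr: List[Int], partitionEpoch: Int)

  // True when the request asks for what the controller already has and only
  // the partition epoch may be behind.
  def sameExceptStaleEpoch(current: State, proposed: State): Boolean =
    proposed.leaderEpoch == current.leaderEpoch &&
      proposed.isr == current.isr &&
      proposed.partitionEpoch <= current.partitionEpoch

  def validate(current: State, proposed: State): Either[ApiError, State] =
    // Compare the requested state first, so a retry of an already-applied
    // change returns the current state, including the bumped epoch.
    if (sameExceptStaleEpoch(current, proposed)) Right(current)
    // Only a genuinely different request with a stale epoch is fenced.
    else if (proposed.partitionEpoch < current.partitionEpoch) Left(InvalidUpdateVersion)
    // Epochs match: model a successful update by bumping the partition epoch.
    // (A request ahead of the controller is out of scope for this sketch.)
    else Right(proposed.copy(partitionEpoch = current.partitionEpoch + 1))
}
```

In this model, a leader that shrinks the ISR at partition epoch 10, loses the response, and resends the same request is told that the state is now at epoch 11 instead of being fenced.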

…head of controller (#12506)

It is possible for the leader to send an `AlterPartition` request to a zombie controller which includes either a partition or leader epoch which is larger than what is found in the controller context. Prior to #12032, the controller handled this in the following way:

1. If the `LeaderAndIsr` state exactly matches the current state on the controller excluding the partition epoch, then the `AlterPartition` request is considered successful and no error is returned. The risk with this handling is that this may cause the leader to incorrectly assume that the state had been successfully updated. Since the controller's state is stale, there is no way to know what the latest ISR state is.
2. Otherwise, the controller will attempt to update the state in ZooKeeper with the leader/partition epochs from the `AlterPartition` request. This operation would fail if the controller's epoch was not still current in ZooKeeper, and the result would be a `NOT_CONTROLLER` error.

Following #12032, the controller's validation is stricter. If the partition epoch is larger than expected, then the controller will return `INVALID_UPDATE_VERSION` without attempting the operation. Similarly, if the leader epoch is larger than expected, the controller will return `FENCED_LEADER_EPOCH`. The problem with this new handling is that the leader treats the errors from the controller as authoritative. For example, if it sees the `FENCED_LEADER_EPOCH` error, then it will not retry the request and will simply wait until the next leader epoch arrives. The ISR state gets stuck in a pending state, which can lead to persistent URPs until the leader epoch gets bumped.

In this patch, we want to fix the issues with this handling, but we don't want to restore the buggy idempotent check. The approach is straightforward. If the controller sees a partition/leader epoch which is larger than what it has in the controller context, then it assumes that it has become a zombie and returns `NOT_CONTROLLER` to the leader. This will cause the leader to attempt to reset the controller from its local metadata cache and retry the `AlterPartition` request.

Reviewers: David Jacot <djacot@confluent.io>, José Armando García Sancio <jsancio@users.noreply.github.com>
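The approach in this commit message can be sketched as a small decision function. This is a simplified model under stated assumptions, not the controller's real code: `Epochs`, `check`, and `leaderRetries` are hypothetical names, the order of the branches is a guess, and the retry behaviour for `INVALID_UPDATE_VERSION` is assumed from the statement that the leader treats controller errors as authoritative. The stale-partition-epoch comparison from #12489 is ignored here.

```scala
object ZombieControllerSketch {
  // Hypothetical stand-ins for the Kafka error codes named above.
  sealed trait ApiError
  case object NotController extends ApiError        // models NOT_CONTROLLER
  case object FencedLeaderEpoch extends ApiError    // models FENCED_LEADER_EPOCH
  case object InvalidUpdateVersion extends ApiError // models INVALID_UPDATE_VERSION

  final case class Epochs(leaderEpoch: Int, partitionEpoch: Int)

  // Controller side: None means the epochs are acceptable.
  def check(current: Epochs, request: Epochs): Option[ApiError] =
    // If the leader is ahead in either epoch, the controller assumes it has
    // become a zombie and does not answer authoritatively.
    if (request.leaderEpoch > current.leaderEpoch ||
        request.partitionEpoch > current.partitionEpoch) Some(NotController)
    else if (request.leaderEpoch < current.leaderEpoch) Some(FencedLeaderEpoch)
    else if (request.partitionEpoch < current.partitionEpoch) Some(InvalidUpdateVersion)
    else None

  // Leader side: whether the AlterPartition request is sent again.
  def leaderRetries(error: ApiError): Boolean = error match {
    case NotController        => true  // look up the controller again and retry
    case FencedLeaderEpoch    => false // wait for the next leader epoch
    case InvalidUpdateVersion => false // assumed: treated as authoritative
  }
}
```

The point of the change is visible in the first branch: a leader that is ahead of the controller now receives an error it will retry against a freshly discovered controller, so the pending ISR change does not stay stuck.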


MINOR: Change validation AlterPartition validation order in `KafkaController`

f163ea5

jsancio reviewed Apr 11, 2022


hachikuji added 2 commits April 22, 2022 15:43

Merge remote-tracking branch 'upstream/trunk' into minor-alter-partition-ordering

b53112e

Revise for zkVersion -> partitionEpoch renaming

a5b34fb

jsancio approved these changes Apr 22, 2022


hachikuji changed the title ~~MINOR: Change validation AlterPartition validation order in KafkaController~~ MINOR: Change AlterPartition validation order in KafkaController Apr 22, 2022

Deconstruct list directly in test case

7029380

hachikuji merged commit 25ee7f1 into apache:trunk Apr 25, 2022

splett2 mentioned this pull request Aug 6, 2022

KAFKA-14144: Compare AlterPartition LeaderAndIsr before fencing partition epoch #12489

Merged


This was referenced Aug 10, 2022

KAFKA-14154; Ensure AlterPartition not sent to stale controller #12499

Closed

KAFKA-14154; Return NOT_CONTROLLER from AlterPartition if leader is ahead of controller #12506

Merged
