Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-14144: Compare AlterPartition LeaderAndIsr before fencing partition epoch #12489

Merged
merged 2 commits into from
Aug 9, 2022

Conversation

splett2
Copy link
Contributor

@splett2 splett2 commented Aug 6, 2022

What

This PR fixes an AlterPartition regression introduced in #12032

When an AlterPartition request succeeds, the partition epoch gets bumped. In Zk controller mode the sender also relies on the AlterPartition response to be informed of the new partition epoch.
If the sender times out the request before a response is sent, the sender will have a stale partition epoch compared to the ZK controller state and will be fenced on subsequent AlterPartition request attempts. The sender will not receive an updated partition epoch until it receives a LeaderAndIsr request for controller-initiated ISR changes.

Testing

The existing AlterPartition idempotency test did not catch this regression because the test uses the current partitionEpoch for the AlterPartition request. We update the test to try the request with various partition epochs.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

// epoch), expect it to succeed while the partition epoch remains the same
sendAndVerifyAlterPartitionResponse(oldLeaderAndIsr.partitionEpoch)
sendAndVerifyAlterPartitionResponse(newPartitionEpoch)
sendAndVerifyAlterPartitionResponse(newPartitionEpoch + 1)
Copy link

@hachikuji hachikuji Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do find it a little odd that the partition epoch is ignored completely when the ISR matches the desired state. We do have the leader epoch check, so at least we can ensure that an old leader won't be mislead into thinking that its change was successfully applied. How about a case when the request is sent to an old controller? Suppose a scenario like this:

  1. Controller A has leader=1, isr=[1,2], partition epoch=10
  2. Controller B is elected
  3. Leader sends AlterPartition(epoch=10) to B to remove 2 from ISR => partition epoch = 11
  4. Leader sends AlterPartition(epoch=11) to A to add 2 back to the ISR => A accepts, but there is no bump

I think this case is ruled out because the leader has to find the new controller and then revert back. The controller epoch probably would catch that case. What if we add a restart between steps 3 and 4? Would it be possible to find the old controller after restarting? Probably not, but I think I'd sleep better if we could at least reject requests where the partition epoch is greater than what the controller has in its cache. Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fairly reasonable to me.

Copy link

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. LGTM.

@hachikuji hachikuji merged commit 62c2880 into apache:trunk Aug 9, 2022
hachikuji pushed a commit that referenced this pull request Aug 9, 2022
…ition epoch (#12489)

This PR fixes an AlterPartition regression introduced in #12032

When an AlterPartition request succeeds, the partition epoch gets bumped. In Zk controller mode the sender also relies on the AlterPartition response to be informed of the new partition epoch.
If the sender times out the request before a response is sent, the sender will have a stale partition epoch compared to the ZK controller state and will be fenced on subsequent AlterPartition request attempts. The sender will not receive an updated partition epoch until it receives a LeaderAndIsr request for controller-initiated ISR changes.

Reviewers: Jason Gustafson <jason@confluent.io>
ijuma added a commit to confluentinc/kafka that referenced this pull request Aug 10, 2022
…(10 August 2022)

Trivial conflict in gradle/dependencies.gradle due to the newer Netty
version in confluentinc/kafka.

* apache-github/trunk:
MINOR: Upgrade gradle to 7.5.1 and bump other build/test dependencies
(apache#12495)
KAFKA-14140: Ensure an offline or in-controlled-shutdown replica is
not eligible to join ISR in ZK mode (apache#12487)
  KAFKA-14114: Add Metadata Error Related Metrics
MINOR: BrokerMetadataSnapshotter must avoid exceeding batch size
(apache#12486)
  MINOR: Upgrade mockito test dependencies (apache#12460)
KAFKA-14144:; Compare AlterPartition LeaderAndIsr before fencing
partition epoch (apache#12489)
KAFKA-14134: Replace EasyMock with Mockito for WorkerConnectorTest
(apache#12472)
  MINOR: Update scala version in bin scripts to 2.13.8 (apache#12477)
KAFKA-14104; Add CRC validation when iterating over Metadata Log
Records (apache#12457)
  MINOR: add :server-common test dependency to :storage (apache#12488)
  KAFKA-14107: Upgrade Jetty version for CVE fixes (apache#12440)
  KAFKA-14124: improve quorum controller fault handling (apache#12447)
ijuma added a commit to franz1981/kafka that referenced this pull request Aug 12, 2022
* apache-github/trunk: (447 commits)
  KAFKA-13959: Controller should unfence Broker with busy metadata log (apache#12274)
  KAFKA-10199: Expose read only task from state updater (apache#12497)
  KAFKA-14154; Return NOT_CONTROLLER from AlterPartition if leader is ahead of controller (apache#12506)
  KAFKA-13986; Brokers should include node.id in fetches to metadata quorum (apache#12498)
  KAFKA-14163; Retry compilation after zinc compile cache error (apache#12507)
  Remove duplicate common.message.* from clients:test jar file (apache#12407)
  KAFKA-13060: Replace EasyMock and PowerMock with Mockito in WorkerGroupMemberTest.java (apache#12484)
  Fix the rate window size calculation for edge cases (apache#12184)
  MINOR: Upgrade gradle to 7.5.1 and bump other build/test dependencies (apache#12495)
  KAFKA-14140: Ensure an offline or in-controlled-shutdown replica is not eligible to join ISR in ZK mode (apache#12487)
  KAFKA-14114: Add Metadata Error Related Metrics
  MINOR: BrokerMetadataSnapshotter must avoid exceeding batch size (apache#12486)
  MINOR: Upgrade mockito test dependencies (apache#12460)
  KAFKA-14144:; Compare AlterPartition LeaderAndIsr before fencing partition epoch (apache#12489)
  KAFKA-14134: Replace EasyMock with Mockito for WorkerConnectorTest (apache#12472)
  MINOR: Update scala version in bin scripts to 2.13.8 (apache#12477)
  KAFKA-14104; Add CRC validation when iterating over Metadata Log Records (apache#12457)
  MINOR: add :server-common test dependency to :storage (apache#12488)
  KAFKA-14107: Upgrade Jetty version for CVE fixes (apache#12440)
  KAFKA-14124: improve quorum controller fault handling (apache#12447)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants