Skip to content

KAFKA-14530: Check state updater more often#13017

Merged
cadonna merged 2 commits intoapache:trunkfrom
lucasbru:check-often
Jan 12, 2023
Merged

KAFKA-14530: Check state updater more often#13017
cadonna merged 2 commits intoapache:trunkfrom
lucasbru:check-often

Conversation

@lucasbru
Copy link
Member

In the new state restoration code, the state updater needs to be checked regularly by the main thread to transfer ownership of tasks back to the main thread once the state of the task is restored. The more often we check this, the faster we can start processing the tasks.

Currently, we only check the state updater once in every loop iteration of the state updater. And while we couldn't observe this to be strictly not often enough, we can increase the number of checks easily by moving the check inside the inner processing loop. This means that once we have iterated over numIterations records, we can already start processing tasks that have finished restoration in the meantime.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@lucasbru
Copy link
Member Author

@cadonna could you please have a look?

Copy link
Member

@cadonna cadonna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lucasbru for the PR!

Here my feedback!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we move this if to where initializeAndRestorePhase() is called? Then we would have both checks about whether the state updater is enabled in the same method not far from each other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done

In the new state restoration code, the state updater needs to be checked regularly
by the main thread to transfer ownership of tasks back to the main thread once the
state of the task is restored. The more often we check this, the faster we can
start processing the tasks.

Currently, we only check the state updater once in every loop iteration of the state
updater. And while we couldn't observe this to be strictly not often enough, we can
increase the number of checks easily by moving the check inside the inner processing
loop. This would mean that once we have iterated over `numIterations` records, we can
already start processing tasks that have finished restoration in the meantime.
Copy link
Member

@cadonna cadonna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, @lucasbru!

LGTM!

@lucasbru
Copy link
Member Author

lucasbru commented Jan 2, 2023

Test failures are unrelated.

Build / JDK 17 and Scala 2.13 / testSeparateOffsetsTopic – org.apache.kafka.connect.integration.ExactlyOnceSourceIntegrationTest
1m 40s
Build / JDK 8 and Scala 2.12 / testOutdatedCoordinatorAssignment() – org.apache.kafka.clients.consumer.internals.EagerConsumerCoordinatorTest

@cadonna could you please merge this?

@cadonna cadonna merged commit 22606a0 into apache:trunk Jan 12, 2023
ijuma added a commit to fvaleri/kafka that referenced this pull request Jan 13, 2023
* apache-github/trunk:
  KAFKA-14601: Improve exception handling in KafkaEventQueue apache#13089
  KAFKA-14367; Add `OffsetCommit` to the new `GroupCoordinator` interface (apache#12886)
  KAFKA-14530: Check state updater more often (apache#13017)
  KAFKA-14304 Use boolean for ZK migrating brokers in RPC/record (apache#13103)
  KAFKA-14003 Kafka Streams JUnit4 to JUnit5 migration part 2 (apache#12301)
  KAFKA-14607: Move Scheduler/KafkaScheduler to server-common (apache#13092)
  KAFKA-14367; Add `OffsetFetch` to the new `GroupCoordinator` interface (apache#12870)
  KAFKA-14557; Lock metadata log dir (apache#13058)
  MINOR: Implement toString method for TopicAssignment and PartitionAssignment (apache#13101)
  KAFKA-12558: Do not prematurely mutate internal partition state in Mirror Maker 2 (apache#11818)
  KAFKA-14540: Fix DataOutputStreamWritable#writeByteBuffer (apache#13032)
  KAFKA-14600: Reduce flakiness in ProducerIdExpirationTest (apache#13087)
  KAFKA-14279: Add 3.3.x streams system tests (apache#13077)
  MINOR: bump streams quickstart pom versions and add to list in gradle.properties (apache#13064)
  MINOR: Update KRaft cluster upgrade documentation for 3.4 (apache#13063)
  KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller. (apache#12998)
  KAFKA-14570: Fix parenthesis in verifyFullFetchResponsePartitions output (apache#13072)
  MINOR: Remove public mutable fields from ProducerAppendInfo (apache#13091)
ijuma added a commit to confluentinc/kafka that referenced this pull request Jan 17, 2023
…master

* apache-github/trunk: (23 commits)
  MINOR: Include the inner exception stack trace when re-throwing an exception (apache#12229)
  MINOR: Fix docs to state that sendfile implemented in `TransferableRecords` instead of `MessageSet` (apache#13109)
  Update ProducerConfig.java (apache#13115)
  KAFKA-14618; Fix off by one error in snapshot id (apache#13108)
  KAFKA-13709 (follow-up): Avoid mention of 'exactly-once delivery' or 'delivery guarantees' in Connect (apache#13106)
  KAFKA-14367; Add `TxnOffsetCommit` to the new `GroupCoordinator` interface (apache#12901)
  KAFKA-14568: Move FetchDataInfo and related to storage module (apache#13085)
  KAFKA-14612: Make sure to write a new topics ConfigRecords to metadata log iff the topic is created (apache#13104)
  KAFKA-14601: Improve exception handling in KafkaEventQueue apache#13089
  KAFKA-14367; Add `OffsetCommit` to the new `GroupCoordinator` interface (apache#12886)
  KAFKA-14530: Check state updater more often (apache#13017)
  KAFKA-14304 Use boolean for ZK migrating brokers in RPC/record (apache#13103)
  KAFKA-14003 Kafka Streams JUnit4 to JUnit5 migration part 2 (apache#12301)
  KAFKA-14607: Move Scheduler/KafkaScheduler to server-common (apache#13092)
  KAFKA-14367; Add `OffsetFetch` to the new `GroupCoordinator` interface (apache#12870)
  KAFKA-14557; Lock metadata log dir (apache#13058)
  MINOR: Implement toString method for TopicAssignment and PartitionAssignment (apache#13101)
  KAFKA-12558: Do not prematurely mutate internal partition state in Mirror Maker 2 (apache#11818)
  KAFKA-14540: Fix DataOutputStreamWritable#writeByteBuffer (apache#13032)
  KAFKA-14600: Reduce flakiness in ProducerIdExpirationTest (apache#13087)
  ...
guozhangwang pushed a commit to guozhangwang/kafka that referenced this pull request Jan 25, 2023
In the new state restoration code, the state updater needs to be checked regularly
by the main thread to transfer ownership of tasks back to the main thread once the
state of the task is restored. The more often we check this, the faster we can
start processing the tasks.

Currently, we only check the state updater once in every loop iteration of the state
updater. And while we couldn't observe this to be strictly not often enough, we can
increase the number of checks easily by moving the check inside the inner processing
loop. This would mean that once we have iterated over `numIterations` records, we can
already start processing tasks that have finished restoration in the meantime.

Reviewer: Bruno Cadonna <cadonna@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments