KAFKA-12738: send LeaveGroup request when thread dies to optimize replacement time #11801

Merged: ableegoldman merged 3 commits into apache:trunk from ableegoldman:12738-ErrorHandling-send-LeaveGroup-when-thread-dies on Feb 25, 2022

Conversation

ableegoldman (Member) commented Feb 24, 2022

Quick followup to #11787 to optimize the impact of the task backoff by reducing the time to replace a thread. I noticed that after a thread-fatal error it took around session.timeout.ms for the new thread to come up and start processing, and realized we weren't sending a LeaveGroup request when a thread hit an exception and died. Without that request, the group coordinator only notices the dead member once its session times out.

Removing the session.timeout.ms override in the ErrorHandlingIntegrationTest.shouldBackOffTaskAndEmitDataWithinSameTopology test causes it to revert to the new default value of this config, which is 45s.

Since the current task backoff is a constant 15s, without this change the integration test would fail when using the 45s session timeout. With this fix, we can now remove the config override and verify that the backoff works with the default configuration.

This also speeds up the test greatly: it now takes under 0.5s, whereas previously it took 45s+.
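
For readers who want the idea in code, here is a minimal, self-contained sketch of the pattern described above: when a worker thread dies from an exception, it explicitly leaves the group before closing instead of waiting for the session to expire. This is not the diff from this PR. `GroupMember`, `Worker`, and `RecordingMember` are hypothetical names invented for illustration, and using `unsubscribe()` as the call that triggers the LeaveGroup request is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class LeaveGroupSketch {

    /** Hypothetical stand-in for the group-membership calls a worker thread uses. */
    interface GroupMember {
        void poll();         // may throw on a fatal processing error
        void unsubscribe();  // assumed here to be what triggers a LeaveGroup request
        void close();
    }

    /** Hypothetical worker loop; illustrative only, not the actual StreamThread code. */
    static final class Worker implements Runnable {
        private final GroupMember member;
        private volatile boolean running = true;
        private boolean leaveGroupRequested = false;

        Worker(GroupMember member) {
            this.member = member;
        }

        void shutdown() {
            running = false;
        }

        @Override
        public void run() {
            try {
                while (running) {
                    member.poll();
                }
            } catch (RuntimeException fatal) {
                // The thread is dying: request an explicit leave so the coordinator
                // can rebalance right away instead of waiting for the session timeout.
                leaveGroupRequested = true;
                throw fatal;
            } finally {
                if (leaveGroupRequested) {
                    member.unsubscribe();
                }
                member.close();
            }
        }
    }

    /** Test double that records every call and throws on the configured poll. */
    static final class RecordingMember implements GroupMember {
        final List<String> calls = new ArrayList<>();
        private final int failOnPoll;
        private int polls = 0;

        RecordingMember(int failOnPoll) {
            this.failOnPoll = failOnPoll;
        }

        @Override
        public void poll() {
            polls++;
            calls.add("poll");
            if (polls == failOnPoll) {
                throw new IllegalStateException("simulated thread-fatal error");
            }
        }

        @Override
        public void unsubscribe() {
            calls.add("unsubscribe");
        }

        @Override
        public void close() {
            calls.add("close");
        }
    }

    public static void main(String[] args) {
        RecordingMember member = new RecordingMember(2);
        try {
            new Worker(member).run();
        } catch (RuntimeException e) {
            System.out.println("worker died: " + e.getMessage());
        }
        System.out.println(member.calls);
    }
}
```

In this sketch a clean shutdown (via `shutdown()`) does not set the flag, so only the fatal-error path leaves the group eagerly; whether a clean shutdown should also leave is a separate design choice that the sketch does not take a position on.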

wcarlson5 (Contributor) left a comment

LGTM, thanks for getting to this so quickly!

vvcephei (Contributor) left a comment

Very nice! Thanks, @ableegoldman

ableegoldman merged commit c2ee141 into apache:trunk on Feb 25, 2022
yyu1993 added a commit to confluentinc/kafka that referenced this pull request Feb 25, 2022
* apache-kafka/trunk: (49 commits)
  KAFKA-12738: send LeaveGroup request when thread dies to optimize replacement time (apache#11801)
  MINOR: Skip fsync on parent directory to start Kafka on ZOS (apache#11793)
  KAFKA-12738: track processing errors and implement constant-time task backoff (apache#11787)
  MINOR: Cleanup admin creation logic in integration tests (apache#11790)
  KAFKA-10199: Add interface for state updater (apache#11499)
  KAFKA-10000: Utils methods for overriding user-supplied properties and dealing with Enum types (apache#11774)
  KAFKA-10000: Add new metrics for source task transactions (apache#11772)
  KAFKA-13676: Commit successfully processed tasks on error (apache#11791)
  KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT (apache#11575)
  MINOR: Improve Connect docs (apache#11642)
  ...