KAFKA-12738: send LeaveGroup request when thread dies to optimize replacement time #11801
Merged
ableegoldman merged 3 commits into apache:trunk on Feb 25, 2022
wcarlson5 (Contributor) approved these changes on Feb 24, 2022
LGTM, thanks for getting to this so quickly!
vvcephei (Contributor) approved these changes on Feb 24, 2022
Very nice! Thanks, @ableegoldman
yyu1993 added a commit to confluentinc/kafka that referenced this pull request on Feb 25, 2022
* apache-kafka/trunk: (49 commits)
  KAFKA-12738: send LeaveGroup request when thread dies to optimize replacement time (apache#11801)
  MINOR: Skip fsync on parent directory to start Kafka on ZOS (apache#11793)
  KAFKA-12738: track processing errors and implement constant-time task backoff (apache#11787)
  MINOR: Cleanup admin creation logic in integration tests (apache#11790)
  KAFKA-10199: Add interface for state updater (apache#11499)
  KAFKA-10000: Utils methods for overriding user-supplied properties and dealing with Enum types (apache#11774)
  KAFKA-10000: Add new metrics for source task transactions (apache#11772)
  KAFKA-13676: Commit successfully processed tasks on error (apache#11791)
  KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT (apache#11575)
  MINOR: Improve Connect docs (apache#11642)
  ...
Quick followup to #11787 to optimize the impact of the task backoff by reducing the time to replace a thread. I noticed it would take around the session.timeout for the new thread to come up and start processing after a thread-fatal error, and realized we weren't sending a LeaveGroup request when a thread hit an exception and died.
Removing the session.timeout.ms override in the ErrorHandlingIntegrationTest.shouldBackOffTaskAndEmitDataWithinSameTopology test causes it to revert to the new default value of this config, which is 45s. Since the current task backoff is a constant 15s, without this change the integration test would fail when using the 45s session timeout. With this fix, we can now remove the config override and verify that the backoff works with the default configuration.
This also speeds up the test greatly: it now takes under .5s, whereas previously it was taking 45s+.
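To make the timing argument above concrete, here is a rough sketch of the numbers involved. It is not Kafka code: the class and method names are hypothetical, and it assumes a simplified model in which a thread that sends LeaveGroup is removed from the group immediately (0 ms), while a thread that dies silently is only noticed once its session timeout expires. The 45s and 15s values are the ones quoted in the description.

```java
// Hypothetical model, not part of Kafka: estimates how long the group takes
// to notice that a dead thread's consumer is gone, under simplified assumptions.
final class ReplacementTimeSketch {

    // Values quoted in the PR description.
    static final long DEFAULT_SESSION_TIMEOUT_MS = 45_000L;
    static final long TASK_BACKOFF_MS = 15_000L;

    // Assumption: an explicit LeaveGroup lets the group react right away
    // (modeled as 0 ms); otherwise the member is only dropped after the
    // session timeout elapses.
    static long detectionDelayMs(boolean sendsLeaveGroup, long sessionTimeoutMs) {
        return sendsLeaveGroup ? 0L : sessionTimeoutMs;
    }

    // Rough comparison only: true when waiting for the dead member to be
    // noticed takes longer than the task backoff itself.
    static boolean replacementOutlastsBackoff(boolean sendsLeaveGroup,
                                              long sessionTimeoutMs,
                                              long backoffMs) {
        return detectionDelayMs(sendsLeaveGroup, sessionTimeoutMs) > backoffMs;
    }

    public static void main(String[] args) {
        for (boolean leave : new boolean[] {false, true}) {
            System.out.println("sendsLeaveGroup=" + leave
                + " detectionDelayMs="
                + detectionDelayMs(leave, DEFAULT_SESSION_TIMEOUT_MS)
                + " outlastsBackoff="
                + replacementOutlastsBackoff(leave, DEFAULT_SESSION_TIMEOUT_MS, TASK_BACKOFF_MS));
        }
    }
}
```

Under this model, the 45s session timeout dominates the 15s backoff when no LeaveGroup is sent, which is consistent with the test needing a session.timeout.ms override before this change. Real rebalances involve additional delays that this sketch ignores.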