KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer #17700
Draft: kirktrue wants to merge 36 commits into apache:trunk from kirktrue:KAFKA-17182-reduce-fetch-session-eviction
… the new consumer Updated the FetchRequestManager to only create and enqueue fetch requests when signaled to do so by a FetchEvent.
…om prepareFetchRequests()
kirktrue changed the title from "KAFKA-17439: Make polling for new records an explicit action/event in the new consumer" to "KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer" on Nov 5, 2024
Fixed typo
…sumer/internals/AbstractFetch.java
Labels: ci-approved, clients, consumer, ctr (Consumer Threading Refactor (KIP-848)), KIP-848 (The Next Generation of the Consumer Rebalance Protocol)
This change reduces fetch session cache evictions on the broker for `AsyncKafkaConsumer` by altering its logic to determine which partitions it includes in fetch requests.

`Consumer` implementations fetch data from the cluster and temporarily buffer it in memory until the user next calls `Consumer.poll()`. When a fetch request is being generated, partitions that already have buffered data are not included in the fetch request.

The `ClassicKafkaConsumer` performs much of its fetch logic and network I/O in the application thread. On `poll()`, if there is any locally-buffered data, the `ClassicKafkaConsumer` does not fetch any new data and simply returns the buffered data to the user from `poll()`.

On the other hand, the `AsyncKafkaConsumer` splits its logic and network I/O between two threads, which results in a potential race condition during fetch. The `AsyncKafkaConsumer` also checks for buffered data on its application thread. If it finds there is none, it signals the background thread to create a fetch request. However, it's possible for the background thread to receive data from a previous fetch and buffer it before the fetch request logic starts. When that occurs, as the background thread creates a new fetch request, it skips any partitions with buffered data, which has the unintended result that those partitions get added to the fetch request's "to remove" set. This signals to the broker to remove those partitions from its internal cache.

This issue is technically possible in the `ClassicKafkaConsumer` too, since the heartbeat thread performs network I/O in addition to the application thread. However, because of the frequency at which the `AsyncKafkaConsumer`'s background thread runs, it is ~100x more likely to happen.

The core decision is: what should the background thread do if it is asked to create a fetch request and it discovers there's buffered data? There were multiple proposals to address this issue in the `AsyncKafkaConsumer`. Among them are:

Option 3 won out. The change in `AsyncKafkaConsumer` is to include in the fetch request any partition with buffered data. By using a "max bytes" size of 1, this should cause the fetch response to return as little data as possible. In that way, the consumer doesn't buffer too much data on the client before it can be returned from `poll()`.
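To make the difference concrete, here is a minimal, self-contained sketch of the partition selection described above. It is not the code in this PR: `FetchPlanSketch`, `Plan`, `plan()`, the `keepBuffered` flag, and the `DEFAULT_MAX_BYTES` value are all hypothetical names and values chosen for illustration, and the real fetch session bookkeeping is more involved. It only models the two behaviors: dropping buffered partitions into the "to remove" set versus keeping them in the request with a "max bytes" of 1.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Kafka code base. */
final class FetchPlanSketch {

    /** Assumed "normal" per-partition fetch size, for illustration only. */
    static final int DEFAULT_MAX_BYTES = 1_048_576;

    /** Per the description: buffered partitions stay in the request with a "max bytes" of 1. */
    static final int BUFFERED_MAX_BYTES = 1;

    /** Partitions to fetch (mapped to their max bytes) plus partitions dropped from the session. */
    static final class Plan {
        final Map<String, Integer> toFetch = new LinkedHashMap<>();
        final Set<String> toRemove = new LinkedHashSet<>();
    }

    /**
     * Builds a plan for the next fetch request.
     *
     * @param sessionPartitions partitions currently tracked in the fetch session
     * @param buffered          partitions that already have data buffered on the client
     * @param keepBuffered      false models the old behavior, true models the fix
     */
    static Plan plan(Set<String> sessionPartitions, Set<String> buffered, boolean keepBuffered) {
        Plan result = new Plan();
        for (String partition : sessionPartitions) {
            if (!buffered.contains(partition)) {
                // No buffered data: fetch normally.
                result.toFetch.put(partition, DEFAULT_MAX_BYTES);
            } else if (keepBuffered) {
                // Fix: keep the partition in the session but ask for as little data as possible.
                result.toFetch.put(partition, BUFFERED_MAX_BYTES);
            } else {
                // Old behavior: skipping the partition lands it in the "to remove" set.
                result.toRemove.add(partition);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> session = new LinkedHashSet<>(Arrays.asList("orders-0", "orders-1", "orders-2"));
        // Simulates the race: data for orders-1 was buffered just before the request is built.
        Set<String> buffered = new LinkedHashSet<>(Arrays.asList("orders-1"));

        Plan before = plan(session, buffered, false);
        Plan after = plan(session, buffered, true);
        System.out.println("without fix: fetch=" + before.toFetch.keySet() + " remove=" + before.toRemove);
        System.out.println("with fix:    fetch=" + after.toFetch + " remove=" + after.toRemove);
    }
}
```

In this sketch the "without fix" plan removes `orders-1` from the session, while the "with fix" plan keeps all three partitions and leaves the "to remove" set empty.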
Here are the results of our internal stress testing:

- `ClassicKafkaConsumer`: after the initial spike during test start up, the average rate settles down to ~0.14 evictions/second
- `AsyncKafkaConsumer` (w/o fix): after startup, the evictions still settle down, but at ~1.48 evictions/second they are about 10x higher than the `ClassicKafkaConsumer`
- `AsyncKafkaConsumer` (w/ fix): the eviction rate is now closer to the `ClassicKafkaConsumer` at ~0.22 evictions/second
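The relative rates implied by the three figures above can be checked with simple arithmetic. The snippet below is only a convenience sketch; `EvictionRateRatios` and `ratio()` are hypothetical names, and the inputs are the approximate eviction rates quoted in the results.

```java
/** Hypothetical helper for comparing the reported eviction rates. */
final class EvictionRateRatios {

    /** Returns how many times larger {@code rate} is than {@code baseline}. */
    static double ratio(double rate, double baseline) {
        return rate / baseline;
    }

    public static void main(String[] args) {
        // Approximate evictions/second as quoted in the stress test results.
        double classic = 0.14;
        double asyncWithoutFix = 1.48;
        double asyncWithFix = 0.22;

        System.out.printf("async w/o fix vs. classic: %.1fx%n", ratio(asyncWithoutFix, classic));
        System.out.printf("async w/ fix vs. classic:  %.1fx%n", ratio(asyncWithFix, classic));
        System.out.printf("reduction from the fix:    %.0f%%%n",
                100 * (1 - asyncWithFix / asyncWithoutFix));
    }
}
```

By these figures, the unfixed `AsyncKafkaConsumer` evicts at roughly 10.6 times the `ClassicKafkaConsumer` rate, and the fix brings that down to roughly 1.6 times.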