KafkaConsumer.subscribe(pattern='x') sometimes picks up topic but not partitions #1237
Comments
I did see something similar. Tracked down the problem to the parts of the Coordinator I was dealing with at that time; seems like KAFKA-3949.
Thanks @tvoinarovskyi, I think you're on to something: KAFKA-3949 matches what I'm seeing. I'm still double-checking, which is slow because I'm a little hazy on which log lines are executed in which thread context, and because it only consistently appears within this test case that runs under gevent, I can't just print the thread id. Here are the logs; note that I added additional logging of the various flags to see when state was changing.

Success case, when I first run the script:

Failure case, which happens when I immediately re-run the script:
If the group leader somehow gets into a state where it has an empty partition assignment, then `self._assignment_snapshot` will be `{}`, which evaluates to `False`. So `self._subscription.mark_for_reassignment()` will never be triggered, even if `self._assignment_snapshot != self._metadata_snapshot`. Fixes the symptoms of #1237, although I suspect there's an additional bug in that case that triggers the condition of the group leader getting an empty partition assignment.
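To illustrate the guard behavior, here is a simplified sketch of the pitfall; this is not the actual kafka-python source, and the function and variable names are illustrative only:

```python
# Simplified sketch of the falsy-empty-dict pitfall described above.
# Not the actual kafka-python source; names are illustrative.

def needs_reassignment_buggy(assignment_snapshot, metadata_snapshot):
    # If the leader ended up with an empty assignment, assignment_snapshot is {}.
    # {} is falsy, so the comparison on the right is never evaluated and the
    # reassignment is never marked, even though the snapshots differ.
    return bool(assignment_snapshot and assignment_snapshot != metadata_snapshot)

def needs_reassignment_fixed(assignment_snapshot, metadata_snapshot):
    # Only skip when no snapshot has been taken at all (None); an empty dict is
    # a legitimate (empty) assignment and still needs to be compared.
    return (assignment_snapshot is not None
            and assignment_snapshot != metadata_snapshot)

metadata_snapshot = {'my-topic': 4}  # hypothetical: topic -> partition count
print(needs_reassignment_buggy({}, metadata_snapshot))  # False: rebalance never triggered
print(needs_reassignment_fixed({}, metadata_snapshot))  # True: reassignment gets marked
```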
After implementing the fix in #1240, I cannot reproduce this issue. This fix does not incorporate the changes from KAFKA-3949 at all, so I'm a bit confused about whether this fixes the symptom of the KAFKA-3949 race condition or a different issue altogether. If the change in #1240 is all that is needed to (effectively) fix KAFKA-3949, why did the Java crew do a much more extensive refactor? I suspect I'm just overlooking something obvious here...
There appears to be a race condition bug of some kind in `KafkaConsumer.subscribe(pattern='some pattern')`.

Normally the call works fine: the consumer picks up matching topics, assigns partitions to group members, etc.

However, once in a blue moon I've observed that the consumer finds the matching topic but never successfully assigns the topic partitions to the group members. Once it's in this state, it will call `poll()` for hours without returning messages, because the consumer thinks it has no assigned partitions, and because the consumer's subscription already contains the topic, there's never a change that triggers a rebalance.

I'm embarrassed to say that I've spent 40+ hours over the past two weeks trying to figure this out as we hit it in production, but all I've managed to do is isolate a semi-consistently reproducible example. Unfortunately, that requires running a service that has a `KafkaConsumer` instance and a bunch of associated docker containers, so I can't make this setup public. The wrapper service does use `gevent`, which I'm not very familiar with, but I disabled all the service's other greenlets, so I don't think that should affect this at all. Every time I try to isolate it down to a simple `kafka-python` script, I cannot reproduce it. But after spending hours stepping through the code, I'm reasonably certain it's a race condition in kafka-python and not the wrapper service.
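The kind of minimal script I mean is roughly the following; the broker address, group id, and topic pattern are placeholders, not my real configuration:

```python
# Minimal pattern-subscription consumer, roughly the shape of my isolation attempts.
# Broker address, group id, and pattern are placeholders for the real setup.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    bootstrap_servers='localhost:9092',
    group_id='pattern-test-group',
)
consumer.subscribe(pattern='^my-topic-.*')

while True:
    # In the failure case this loops indefinitely without returning records,
    # because the consumer never gets any partitions assigned.
    records = consumer.poll(timeout_ms=1000)
    for tp, messages in records.items():
        for message in messages:
            print(tp, message.offset, message.value)
```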
Here's what I know:

- The issue doesn't show up the first time I run the service. If I kill the service (without calling `KafkaConsumer.close()`) and then restart it before the group coordinator evicts the consumer from the group, I trigger the issue. If I then kill it, wait until I know the group coordinator has evicted all consumers, and then re-run it, it works fine. Unfortunately, I have no idea whether this behavior is related to the root cause, or is just a trigger that makes the docker kafka container busy enough that it slows down its response times.
- In the failure case, calling `KafkaConsumer.subscription()` returns the expected topic name, but calling `KafkaConsumer.assignment()` returns an empty set (see the sketch after this list).
- In the failure case, I can see that the cluster metadata object has both the topic and the list of partitions, so the cluster metadata is getting correctly updated; it's just not making it into the group assignments.
- `SubscriptionState.change_subscription()` has a check that short-circuits the group rebalance if the previous/current topic subscriptions are equal. If I comment out the `return` in that short-circuit check, the group rebalances properly and the problem disappears.
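The check I use to confirm the stuck state is roughly this, using the same hypothetical consumer as in the sketch above:

```python
# After the consumer has been polling for a while in the failure case:
# the pattern has matched the topic, but no partitions were ever assigned.
print(consumer.subscription())  # e.g. {'my-topic-1'}: the matched topic is there
print(consumer.assignment())    # set(): empty, so poll() can never return records
```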
Tracing the TCP calls in Wireshark, I see the following:

Success case:
(note this is a second generation of the group)
Failure case:
1. Metadata v1 Request
2. Metadata v2 Response
3. GroupCoordinator v0 Request
4. GroupCoordinator v0 Response
5. JoinGroup v0 Request - protocol member metadata is all 0's
6. JoinGroup v0 Response - protocol member metadata is all 0's
7. SyncGroup v0 Request - member assignment is all 0's
(Here is the problem: we never trigger a second JoinGroup v0 Request that contains the partition data.)
8. From here on there are no requests other than the Metadata Request/Response when the metadata refresh timeout kicks in
Setup:
After spending a lot of time poking through this code, I understand why the consumer is stuck once this happens, but I don't understand how it gets into this state in the first place.