KafkaConsumer.subscribe(pattern='x') sometimes picks up topic but not partitions #1237
Comments
I did see something similar. Tracked down the problem to the parts of the Coordinator I was dealing with at that time; seems like KAFKA-3949.
Thanks @tvoinarovskyi, I think you're on to something: KAFKA-3949 matches what I'm seeing. I'm still double-checking, which is slow because I'm a little hazy on which log lines are executed in which thread context, and because it only consistently appears within this test case that runs under gevent, I can't just print the thread id. Here are the logs; note that I added additional logging of the various flags to see when state was changing.

Success case, when I first run the script:

Failure case, which happens when I immediately re-run the script:
If the group leader somehow gets into a state where it has an empty partition assignment, then `self._assignment_snapshot` will be `{}`, which evaluates to `False`. So `self._subscription.mark_for_reassignment()` will never be triggered, even if `self._assignment_snapshot != self._metadata_snapshot`. Fixes the symptoms of #1237, although I suspect there's an additional bug in that case that triggers the condition of the group leader getting an empty partition assignment.
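To illustrate the guard behavior, here is a simplified sketch of the pitfall; this is not the actual kafka-python source, and the function and variable names are illustrative only:

```python
# Simplified sketch of the falsy-empty-dict pitfall described above.
# Not the actual kafka-python source; names are illustrative.

def needs_reassignment_buggy(assignment_snapshot, metadata_snapshot):
    # If the leader ended up with an empty assignment, assignment_snapshot is {}.
    # {} is falsy, so the comparison on the right is never evaluated and the
    # reassignment is never marked, even though the snapshots differ.
    return bool(assignment_snapshot and assignment_snapshot != metadata_snapshot)

def needs_reassignment_fixed(assignment_snapshot, metadata_snapshot):
    # Only skip when no snapshot has been taken at all (None); an empty dict is
    # a legitimate (empty) assignment and still needs to be compared.
    return (assignment_snapshot is not None
            and assignment_snapshot != metadata_snapshot)

metadata_snapshot = {'my-topic': 4}  # hypothetical: topic -> partition count
print(needs_reassignment_buggy({}, metadata_snapshot))  # False: rebalance never triggered
print(needs_reassignment_fixed({}, metadata_snapshot))  # True: reassignment gets marked
```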
After implementing the fix in #1240, I cannot reproduce this issue. This fix does not incorporate the changes from KAFKA-3949 at all, so I'm a bit confused about whether this fixes the symptom of the KAFKA-3949 race condition or a different issue altogether. If the change in #1240 is all that is needed to (effectively) fix KAFKA-3949, why did the Java crew do a much more extensive refactor? I suspect I'm just overlooking something obvious here...
There appears to be a race condition bug of some kind in `KafkaConsumer.subscribe(pattern='some pattern')`.

Normally the call works fine: the consumer picks up matching topics, assigns partitions to group members, etc.

However, once in a blue moon I've observed that the consumer finds the matching topic but never successfully assigns the topic partitions to the group members. Once it's in this state, it will call `poll()` for hours without returning messages, because the consumer thinks it has no assigned partitions, and because the consumer's subscription already contains the topic, there's never a change that triggers a rebalance.

I'm embarrassed to say that I've spent 40+ hours over the past two weeks trying to figure this out as we hit it in production, but all I've managed to do is isolate a semi-consistently reproducible example. Unfortunately, that requires running a service that has a `KafkaConsumer` instance and a bunch of associated docker containers, so I can't make this setup public. The wrapper service does use `gevent`, which I'm not very familiar with, but I disabled all the service's other greenlets, so I don't think that should affect this at all. Every time I try to isolate it down to a simple `kafka-python` script, I cannot reproduce it. But after spending hours stepping through the code, I'm reasonably certain it's a race condition in kafka-python and not the wrapper service.
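The kind of minimal script I mean is roughly the following; the broker address, group id, and topic pattern are placeholders, not my real configuration:

```python
# Minimal pattern-subscription consumer, roughly the shape of my isolation attempts.
# Broker address, group id, and pattern are placeholders for the real setup.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    bootstrap_servers='localhost:9092',
    group_id='pattern-test-group',
)
consumer.subscribe(pattern='^my-topic-.*')

while True:
    # In the failure case this loops indefinitely without returning records,
    # because the consumer never gets any partitions assigned.
    records = consumer.poll(timeout_ms=1000)
    for tp, messages in records.items():
        for message in messages:
            print(tp, message.offset, message.value)
```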
Here's what I know:

- The issue doesn't show up the first time I run the service. If I kill the service (without calling `KafkaConsumer.close()`) and then restart it before the group coordinator evicts the consumer from the group, I trigger the issue. If I then kill it, wait until I know the group coordinator has evicted all consumers, and then re-run it, it works fine. Unfortunately, I have no idea whether this behavior is related to the root cause, or is just a trigger that makes the docker kafka container busy enough that it slows down its response times.
- In the failure case, calling `KafkaConsumer.subscription()` returns the expected topic name, but calling `KafkaConsumer.assignment()` returns an empty set (see the sketch after this list).
- In the failure case, I can see that the cluster metadata object has both the topic and the list of partitions, so the cluster metadata is getting correctly updated; it's just not making it into the group assignments.
- `SubscriptionState.change_subscription()` has a check that short-circuits the group rebalance if the previous/current topic subscriptions are equal. If I comment out the `return` in that short-circuit check, the group rebalances properly and the problem disappears.
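The check I use to confirm the stuck state is roughly this, using the same hypothetical consumer as in the sketch above:

```python
# After the consumer has been polling for a while in the failure case:
# the pattern has matched the topic, but no partitions were ever assigned.
print(consumer.subscription())  # e.g. {'my-topic-1'}: the matched topic is there
print(consumer.assignment())    # set(): empty, so poll() can never return records
```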
Tracing the TCP calls in Wireshark, I see the following:

Success case:
(note this is a second generation of the group)
Failure case:
1. Metadata v1 Request
2. Metadata v2 Response
3. GroupCoordinator v0 Request
4. GroupCoordinator v0 Response
5. JoinGroup v0 Request - protocol member metadata is all 0's
6. JoinGroup v0 Response - protocol member metadata is all 0's
7. SyncGroup v0 Request - member assignment is all 0's
(Here is the problem: we never trigger a second JoinGroup v0 Request that contains the partition data.)
8. From here on there are no requests other than the Metadata Request/Response when the metadata refresh timeout kicks in
Setup:
After spending a lot of time poking through this code, I understand why the consumer is stuck once this happens, but I don't understand how it gets into this state in the first place.