what's the problem with consumer group #199
Comments
Consumer groups are more about offset management than about preventing double consumption between consumers. There is a concept called coordinated consumer groups, but that is not available in non-JVM clients. The way I handle this is to spin up one Python consumer per partition instead of having every consumer read every partition.
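A minimal sketch of that "one consumer per partition" idea, generalized so a fixed number of worker processes split the partitions between them. The helper name `partitions_for_worker` and the example counts are hypothetical, and the sketch assumes you already know the topic's partition count and give each worker a stable index. It only computes which partitions a worker should read; it does not talk to Kafka.

```python
def partitions_for_worker(worker_index, worker_count, partition_count):
    """Return the partition ids that one worker should read.

    Hypothetical helper: partitions are dealt out round-robin, so every
    partition is owned by exactly one worker as long as each worker is
    started with a distinct index in range(worker_count).
    """
    if worker_count <= 0:
        raise ValueError("worker_count must be positive")
    if not 0 <= worker_index < worker_count:
        raise ValueError("worker_index must be in range(worker_count)")
    return [p for p in range(partition_count) if p % worker_count == worker_index]


if __name__ == "__main__":
    # Example: 3 worker processes sharing a topic with 8 partitions.
    for worker in range(3):
        print(worker, partitions_for_worker(worker, 3, 8))
```

Each worker would then restrict its consumer to the returned list (I believe `SimpleConsumer` takes a `partitions` argument for this, but check that against the version you run). Note that the assignment is static: if a worker dies, nothing takes over its partitions.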
@morndust I started using kafka-python just recently as well and found out that this is a feature/limitation as per #173. I ended up using kafka-python for my producers and https://github.com/bpot/poseidon, a Kafka Ruby client, to accomplish what you are asking for on the consumer side. Basically, my use case is that I want to scale the consumption of messages from a topic horizontally by spinning up more consumer processes on different VMs, not just on the current one, which is what MultiProcessConsumer accomplishes. kafka-python does not support this: once you start a new process, it reads from the beginning instead of from the real offset, and the same messages are consumed by all the consumers of that topic. Regardless, kafka-python works really well for everything else. Thanks to everyone involved in this project; it is a real relief not to have to deal with the JVM.
To be clear: kafka-python supports offset management and resumption. It does not support having C consumers and P partitions and automatically distributing the load so that no message has duplicate readers. If you need help getting resumption from an offset working, we'd be glad to help you out.
@morndust the group param in SimpleConsumer (my-foo-group) is only used for offset storing and retrieval, not coordinated consumption. Coordinated consumers (aka high-level consumers or "balanced" consumers) are only available to JVM clients and a few non-JVM clients, but not kafka-python. @wizzat maybe we should put a note in the README to make this clear?
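To illustrate why a shared group name alone does not prevent duplicate reads, here is a toy simulation in plain Python. Nothing in it is kafka-python API: `OffsetStore`, the in-memory `log` list, and the single partition `0` are all assumptions made up for the example. The group is modeled as nothing more than a key under which an offset is stored.

```python
class OffsetStore:
    """Toy stand-in for offset storage, keyed by (group, partition)."""

    def __init__(self):
        self._offsets = {}

    def fetch(self, group, partition):
        # A group with no committed offset starts at 0.
        return self._offsets.get((group, partition), 0)

    def commit(self, group, partition, offset):
        self._offsets[(group, partition)] = offset


log = ["m0", "m1", "m2"]  # messages in partition 0
store = OffsetStore()
group = "my-foo-group"

# Two consumers in the same group start together and fetch the same offset,
# so both read every message. Nothing coordinates who owns the partition.
start_a = store.fetch(group, 0)
start_b = store.fetch(group, 0)
seen_a = log[start_a:]
seen_b = log[start_b:]
store.commit(group, 0, start_a + len(seen_a))
store.commit(group, 0, start_b + len(seen_b))

# What the group does give you: a consumer started later resumes
# from the committed offset instead of from the beginning.
log.append("m3")
seen_after_restart = log[store.fetch(group, 0):]

print(seen_a, seen_b, seen_after_restart)
```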
One more question...
Currently, the "high-level" JVM consumers use ZK to coordinate which partitions are read by which threads. Each consuming thread in the JVM consumer will be reading from at least one partition, and these consumer threads can exist across multiple JVMs. This means you can create one logical "consumer group" that consists of several threads across several JVMs, e.g. a topic with 32 partitions could be read by 4 JVMs with 8 threads each and the data would be evenly distributed among the consumers. The reason we haven't added this feature is that there is a complex algorithm involving ZooKeeper to make sure a thread is consuming the correct partition at the correct offset. There are plans to redesign this "coordinated consumption" in Kafka so that it does not depend on ZooKeeper. This will make it easier for clients like kafka-python to do this kind of thing. So, in other words, we'll have it eventually. HTH
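As a rough picture of what the coordination has to compute, the sketch below splits 32 partitions across 4 JVMs with 8 threads each, then recomputes the split after one JVM disappears. This is a simplified illustration, not Kafka's actual algorithm: the `assign` function and the member names are hypothetical, and it ignores ZooKeeper, offsets, and the problem of getting every member to agree on the membership list, which is the hard part.

```python
def assign(partition_count, members):
    """Split partitions into contiguous ranges over the sorted member ids.

    Hypothetical helper: every member computes the same result as long as
    all of them see the same membership list.
    """
    members = sorted(members)
    if not members:
        return {}
    base, extra = divmod(partition_count, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        size = base + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[member] = list(range(start, start + size))
        start += size
    return assignment


# 4 JVMs with 8 consumer threads each, reading a 32-partition topic.
members = ["jvm%d-thread%d" % (j, t) for j in range(4) for t in range(8)]
before = assign(32, members)

# One JVM goes away; the remaining 24 threads must pick up its partitions.
survivors = [m for m in members if not m.startswith("jvm3-")]
after = assign(32, survivors)

print(before["jvm0-thread0"], after["jvm0-thread0"])
```

A static one-consumer-per-partition setup has no equivalent of the second `assign` call, which is why a dead consumer's partitions simply go unread.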
Thank you so much, @mumrah!
I understand that kafka-python does not do consumer rebalancing (consumer failover) when consumers come and go. That has been made clear by this discussion thread. How does kafka-python handle broker rebalancing? Does it handle broker rebalancing? From https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design we see that the high-level Java Consumers do the following:
We know that kafka-python does not do #4, the consumer failover. Which of the others does kafka-python handle? Or conversely, which of the others does kafka-python not handle? It appears to be handling #1 but none of the others. Can we get confirmation on that? Thanks.
I am using the group the way I believe is correct, but it does not behave the way I want.
Here is my code:
What I want is this: if I send a message to "my-topic", only one consumer in the group ("my-foo-group") should get it.
However, what I found is that no matter how many consumer processes I start, all of them end up getting the message.
Am I doing something wrong, or is this a problem with the Kafka Python client?