Documentation on PyKafka vs kafka-python #334
@microamp Thanks, this is a great idea. There's currently no documentation on this, but to my knowledge the main differences are the specifics of the Python API and PyKafka's implementation of the
Some more research: there are differences in the versions of Python supported by each library. PyKafka supports 2.7, 3.4, 3.5, and PyPy, while kafka-python adds 2.6 and removes 3.5 support. kafka-python also requires a ZooKeeper connection for offset management, which PyKafka does not. kafka-python supports versions of Kafka from 0.8.0 to 0.8.2, whereas PyKafka only supports 0.8.2.
@emmett9001 Thanks a lot for the reply. I find the information very helpful. It's good to know that PyKafka supports Python 3.4+. It was still a work in progress the last time I checked a few months back. Good work, guys.
A difference between kafka-python and pykafka is the producer interface. kafka-python does not require that you know the topic when instantiating the producer. This is convenient if you need to produce to topics dynamically based on input (which I do!) :)
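For anyone with the same need, here is a minimal sketch of one possible workaround when the producer is bound to a topic at creation time: keep one producer per topic behind a small router and create them lazily. `TopicRouter`, `make_producer`, and `FakeProducer` are hypothetical names used only for illustration and are not part of either library. The stand-in producer just mimics a `produce(message)` method so the example runs without a broker.

```python
from typing import Callable, Dict, List


class TopicRouter:
    """Lazily creates and caches one topic-bound producer per topic."""

    def __init__(self, make_producer: Callable[[bytes], object]) -> None:
        # make_producer is a hypothetical factory supplied by the caller.
        # Assumption: with PyKafka it could look like
        #   lambda t: client.topics[t].get_producer()
        self._make_producer = make_producer
        self._producers: Dict[bytes, object] = {}

    def produce(self, topic: bytes, message: bytes) -> None:
        producer = self._producers.get(topic)
        if producer is None:
            # First message for this topic: build its producer once.
            producer = self._make_producer(topic)
            self._producers[topic] = producer
        producer.produce(message)


class FakeProducer:
    """Stand-in for a real producer; records messages instead of sending them."""

    def __init__(self, topic: bytes) -> None:
        self.topic = topic
        self.sent: List[bytes] = []

    def produce(self, message: bytes) -> None:
        self.sent.append(message)


created: List[FakeProducer] = []


def factory(topic: bytes) -> FakeProducer:
    producer = FakeProducer(topic)
    created.append(producer)
    return producer


router = TopicRouter(factory)
router.produce(b"clicks", b"event-1")
router.produce(b"clicks", b"event-2")
router.produce(b"views", b"event-3")
```

The topic is chosen per message, but each underlying producer is only built once, so the cost of setting up a producer is paid per topic rather than per call.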
@ottomata That seems like an interesting request for us to look at. Want to open a separate issue about that?
Sure!
@emmett9001 @ottomata Just got pointed at this thread and thought I'd make a late contribution. We compared pykafka and kafka-python about 2 months ago while trying to decide which one to use. In the end, the deciding factor for us was that balanced consumers were much easier to manage in pykafka. Also, we discovered later, a pykafka producer doesn't die on Kafka broker restart, while our kafka-python producers did. Below are performance figures from a 3-node Kafka cluster running in EC2, using a single producer or consumer. The three numbers for each test are the quartiles measured for the test.
So, for clarification, the median performance of a pykafka producer was 46500 messages per second, with a quartile range of 41400 (25th percentile) to 50200 (75th percentile). Hope that makes sense.
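To illustrate how a three-number summary like this can be derived, here is a minimal sketch that computes quartiles from a set of per-run throughput samples. The sample values are made up for the example and are not the measurements from this comparison. `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics

# Hypothetical throughput samples in messages/second, one per benchmark run.
# These are illustrative values only, not measured data.
samples = [40000, 41000, 42000, 44000, 46000, 48000, 50000, 52000, 54000]

# n=4 returns the three cut points: 25th percentile, median, 75th percentile.
# method="inclusive" treats the samples as the full population of runs.
q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")

summary = f"{median:.0f} msg/s (quartile range {q1:.0f} to {q3:.0f})"
print(summary)
```

Reporting the median with the quartile range, rather than a mean, keeps a single unusually slow or fast run from skewing the summary.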
This is awesome, thanks for the performance numbers @cscheffler. Do you have anything to share on the methodology you used to find them?
Cool! For the producer bench, did you just use the default parameters? I assume async with req_acks=1?
@cscheffler Can you please share the links to the test scripts, if they are open-sourced? I see https://github.com/cscheffler/kafka-demo which uses pykafka. It would be a great help if you could share the test scripts for kafka-python that were used in your comparison. Thanks!
This writeup by @jofusa is the most thorough comparative benchmark of the Python Kafka clients I've seen.
Leaving a URL of another benchmark done recently between pykafka 2.3.1, kafka-python 1.1.1, and confluent-kafka 0.9.1
Original author here. Just an FYI: those are one and the same.
It's July 2017. Is there any update or a more recent comparison?
It's July 2019! Any updates on the comparison? :)
It's April 2020! Newbie here. What I want to find out is which one is friendlier for us?
It’s Sept 2020! Any updates?
Hello. I'd like to play around with Kafka, but I don't know which client to start with. I know there is at least one other Python client called kafka-python. I wonder if there is any documentation comparing the two. I'll start with PyKafka in the meantime. :)