
SerializingProducer is much slower than Producer in Python #1440

Open

zacharydestefano89 opened this issue Oct 6, 2022 · 6 comments
zacharydestefano89 commented Oct 6, 2022

Description

I was working on code to produce messages to a Kafka topic. The messages are protobuf bytes, and I used SerializingProducer to pass the schema information. I also tried a separate method where I imitated what was done here

It was able to produce and flush messages at a rate of about 12 messages per second. For my use case, this is way too slow.

When I just used Producer and took out any schema information, the rate suddenly jumped to hundreds of messages per second.
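To make the comparison concrete, here is a minimal timing harness in the spirit of what the reporter describes. The `produce_fn`/`flush_fn` wiring is an assumption about their setup, not their actual code; with confluent-kafka-python you would pass in wrappers around `Producer.produce` (or `SerializingProducer.produce`) and `Producer.flush`:

```python
import time

def measure_produce_rate(produce_fn, messages, flush_fn):
    """Time a batch of produce calls plus one final flush.

    produce_fn / flush_fn are assumed to wrap a real producer, e.g.:
        produce_fn = lambda m: producer.produce("my-topic", value=m)
        flush_fn = producer.flush
    (producer and topic names here are hypothetical).
    """
    start = time.perf_counter()
    for m in messages:
        produce_fn(m)
    flush_fn()
    elapsed = time.perf_counter() - start
    return len(messages) / elapsed

# Demonstrate the call shape with no-op stand-ins for the producer:
rate = measure_produce_rate(lambda m: None, [b"x"] * 1000, lambda: None)
```

Running the same harness once with `Producer` and once with `SerializingProducer` isolates the serializer overhead from everything else in the job.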

How to reproduce

  1. Write a job to put thousands of messages onto a Kafka topic
  2. Have the job put schema information into each message and time it
  3. Compare it to the same job that does not put in schema information

Checklist

Please provide the following information:

  • confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):

From requirements.txt with the Python library:
confluent-kafka==1.7.0

From console:

>>> import confluent_kafka
>>> confluent_kafka.libversion()
('1.7.0', 17236223)
>>> confluent_kafka.version()
('1.7.0', 17235968)
  • Apache Kafka broker version:
    Confluent Cloud

  • Client configuration: {...}

Producer config:

{'bootstrap.servers': '...',
 'error_cb': <function error_cb at 0x7fd2dc01f820>,
 'sasl.mechanism': 'PLAIN',
 'sasl.password': '***************************',
 'sasl.username': '***************',
 'security.protocol': 'SASL_SSL'}
  • Operating system:

Run from docker container derived from Python 3.8.8 base

First line of Dockerfile:
FROM python:3.8.8

  • Provide client logs (with 'debug': '..' as necessary)

Using SerializingProducer:

INFO:root:Now adding 221 messages to Kafka topic. INFO mode will display the first and last 3 messages, DEBUG mode will display all of them
[2022-10-06, 20:05:42 UTC] {docker.py:310} INFO - INFO:root:2022-10-06T20:05:42.031972+00:00 : Adding message starting `user_i_d: "******` onto Kafka buffer under topic `***`
...
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now flushing Kafka producer
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Time to produce and flush for chunk of 221 messages: 34.54440498352051 seconds

Using Producer:

[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now adding 54 messages to Kafka topic. INFO mode will display the first and last 3 messages, DEBUG mode will display all of them
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:2022-10-06T20:06:16.675951+00:00 : Adding message starting `b'\n\****\x1` onto Kafka buffer under topic `****`
...
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Now flushing Kafka producer
[2022-10-06, 20:06:16 UTC] {docker.py:310} INFO - INFO:root:Time to produce and flush for chunk of 54 messages: 0.18948936462402344 seconds
  • Critical issue: Not critical, have a workaround
@mhowlett
Contributor

> It was able to produce and flush messages at a rate of about 12 messages per second

are you flushing after every produce? (this will be slow)
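mhowlett's point is that `flush()` blocks until every queued message has been delivered, so calling it per message turns the pipeline into one round trip per record. A self-contained sketch of the two patterns, using a `FakeProducer` stand-in so it runs without a broker (the method names `produce`/`poll`/`flush` match the real confluent-kafka-python API):

```python
class FakeProducer:
    """Stand-in for confluent_kafka.Producer, for illustration only."""
    def __init__(self):
        self.queued = 0
        self.flush_calls = 0
    def produce(self, topic, value):
        self.queued += 1          # real client appends to an internal queue
    def poll(self, timeout):
        pass                      # real client serves delivery callbacks
    def flush(self):
        self.flush_calls += 1     # real client BLOCKS until queue drains
        self.queued = 0

messages = [b"payload"] * 100

# Slow pattern: flush after every produce -> one delivery wait per message.
slow = FakeProducer()
for m in messages:
    slow.produce("my-topic", m)
    slow.flush()

# Fast pattern: produce everything, poll occasionally, flush once at the end.
fast = FakeProducer()
for m in messages:
    fast.produce("my-topic", m)
    fast.poll(0)  # non-blocking; keeps callbacks flowing
fast.flush()
```

Here `slow.flush_calls` is 100 while `fast.flush_calls` is 1; with a real broker each extra flush is a full delivery round trip.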

@zacharydestefano89
Author

> It was able to produce and flush messages at a rate of about 12 messages per second
>
> are you flushing after every produce? (this will be slow)

I tried both flushing after every produce and flushing after producing many messages. In both cases, messages were put on the topic at that aforementioned rate, 12 per second.

@mhowlett
Contributor

> ~100s messages per second.

you should be able to get tens of thousands of messages per second without the protobuf serdes. I don't have a good feel for how performant the protobuf serdes are (and you don't say anything about the size of your messages), but 12 per second seems very low.

It doesn't seem like we have a benchmark application for Python, we should write one (marking as enhancement).

@edenhill
Contributor

I get the feeling it is doing a schema-registry lookup for each message, which would explain the low throughput.
Maybe worth checking, somehow?
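edenhill's hypothesis is easy to reason about: a registry lookup is an HTTP round trip, so doing one per message caps throughput near 1/latency. The sketch below shows why caching the (subject, schema) → id mapping collapses a whole batch into a single lookup; the function names are illustrative, not the library's internals:

```python
import functools

lookup_count = 0

def registry_lookup(subject, schema_str):
    """Stands in for an HTTP round trip to Schema Registry."""
    global lookup_count
    lookup_count += 1
    return 1  # fake schema id

@functools.lru_cache(maxsize=None)
def cached_lookup(subject, schema_str):
    # Identical (subject, schema) pairs hit the cache, not the network.
    return registry_lookup(subject, schema_str)

for _ in range(221):  # the issue's 221-message batch
    cached_lookup("my-topic-value", "<protobuf schema>")
```

After the loop, `lookup_count` is 1: only the first message pays the round trip. At ~80 ms per uncached lookup, 221 lookups is roughly the 34-second figure in the logs above, which fits the reported symptom.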

@CTCC1

CTCC1 commented Oct 25, 2022

I reported the unnecessary lookup in 2020 in #935.
It was fixed by #1133, so 1.8.2+.
So I think upgrading to 1.8.2+ should fix this.
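Per CTCC1's suggestion, a quick sanity check that the installed client is at or past the fixed version. `confluent_kafka.version()` returns a tuple like `('1.7.0', 17235968)` (shown in the reporter's console output above); parsing the string avoids depending on the packed integer:

```python
def at_least(version_str, required=(1, 8, 2)):
    """True if a dotted version string is >= the required (major, minor, patch)."""
    parts = tuple(int(p) for p in version_str.split(".")[:3])
    return parts >= required

# With confluent-kafka installed you would pass confluent_kafka.version()[0]:
needs_upgrade = not at_least("1.7.0")  # the reporter's version predates the fix
```

So the pinned `confluent-kafka==1.7.0` in the reporter's requirements.txt predates the fix, and bumping the pin to 1.8.2 or later is the suggested remedy.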

@pranavrth
Member

Can you please confirm if it was fixed with the version upgrade?
