Pass batch options to librdkafka #3483

Closed
vpedosyuk opened this issue Aug 17, 2020 · 2 comments · Fixed by #5010
Labels
sink: kafka Anything `kafka` sink related type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@vpedosyuk
vpedosyuk commented Aug 17, 2020

Current Vector Version

0.10.0

Because the kafka sink relies on librdkafka's own batching, we don't expose any batching config on the Vector side, as that would be redundant. This is a usability problem, and we should allow passing batch configuration options through to librdkafka.


Old issue

Use-cases

Currently, the Vector docs state that batching in the kafka sink is unsupported. However, batching would be very useful for achieving the highest throughput when dealing with a large volume of data:

  1. the larger the batch, the higher the likelihood of a better compression ratio
  2. batching amortizes the messaging overhead and reduces the adverse effect of the round-trip time

This is explained well here:
https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance

For example, consider the Kafka-to-Kafka use case:
https://vector.dev/guides/integrate/sources/kafka/kafka/
If Vector supported Kafka batching, it would be a great alternative to Kafka MirrorMaker, Replicator, etc. for this use case.

@vpedosyuk vpedosyuk added the type: enhancement A value-adding code change that enhances its existing functionality. label Aug 17, 2020
@jamtur01 jamtur01 added the sink: kafka Anything `kafka` sink related label Sep 29, 2020
@awangc
awangc commented Nov 5, 2020

So with support for librdkafka options (#1821), and given the default values for librdkafka (https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md), will a kafka sink still not batch data?

@lukesteensen
Member

Sorry for the confusion here. The kafka sink does utilize all of the standard librdkafka batching functionality, for all of the reasons you described. The docs are worded imprecisely and we will fix that.

The intended message is that the kafka sink does not expose the standard batch.* configuration options because we do not do our own independent batching ahead of librdkafka, which would be redundant. This is a little bit of a usability wart and I think it could be a good idea for us to translate those options into their librdkafka equivalents and pass them down. But there are currently no functional limits on your ability to use batching with the kafka sink.
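
As a rough illustration of the translation described above, here is a minimal sketch of mapping batch-style options onto librdkafka producer properties. The `BatchConfig` struct, its field names, and the `to_librdkafka_options` helper are hypothetical and are not taken from Vector's code. The property names `batch.num.messages`, `batch.size`, and `queue.buffering.max.ms` are documented librdkafka settings, but the choice of which option maps to which property is an assumption and may differ from what the eventual fix does.

```rust
use std::collections::HashMap;

/// Hypothetical batch settings, loosely modeled on a `batch.*` config table.
/// The field names are assumptions made for this sketch.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct BatchConfig {
    pub max_events: Option<usize>,
    pub max_bytes: Option<usize>,
    pub timeout_secs: Option<u64>,
}

/// Translate the batch settings into librdkafka producer properties.
/// Options left unset are skipped, so librdkafka's own defaults apply.
pub fn to_librdkafka_options(batch: &BatchConfig) -> HashMap<String, String> {
    let mut opts = HashMap::new();

    if let Some(events) = batch.max_events {
        // Upper bound on the number of messages in one batch.
        opts.insert("batch.num.messages".to_string(), events.to_string());
    }
    if let Some(bytes) = batch.max_bytes {
        // Upper bound on the size of one batch, in bytes.
        opts.insert("batch.size".to_string(), bytes.to_string());
    }
    if let Some(secs) = batch.timeout_secs {
        // librdkafka expresses the linger time in milliseconds.
        let millis = secs.saturating_mul(1000);
        opts.insert("queue.buffering.max.ms".to_string(), millis.to_string());
    }

    opts
}
```

The resulting map could then be merged with any user-supplied librdkafka options before the producer is created. How a conflict between the two should be resolved (error out, or let one side win) is a design decision this sketch leaves open.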

@jamtur01 jamtur01 changed the title Support batching in the kafka sink Pass in batch options to librdkafka Nov 5, 2020
@jamtur01 jamtur01 changed the title Pass in batch options to librdkafka Pass batch options to librdkafka Nov 5, 2020
6 participants