Allow passing advanced librdkafka options to the kafka source and sink #1821

Closed
ghost opened this issue Feb 17, 2020 · 4 comments · Fixed by #1829 or #1830
Labels
sink: kafka (Anything `kafka` sink related) · source: kafka (Anything `kafka` source related) · type: enhancement (A value-adding code change that enhances its existing functionality.)

Comments


ghost commented Feb 17, 2020

librdkafka, which underlies the kafka source and sink, supports many advanced configuration options.

Some users might have a reason to tune some of them. I propose adding a new advanced configuration option called librdkafka_options to the configuration of both.

Its usage could look like this:

[sinks.my_sink_id]
  # REQUIRED - General
  type = "kafka" # must be: "kafka"
  inputs = ["my-source-id"] # example
  bootstrap_servers = "10.14.22.123:9092,10.14.23.332:9092" # example
  key_field = "user_id" # example
  topic = "topic-1234" # example
  # REQUIRED - requests
  encoding = "json" # example, enum
  # OPTIONAL - General
  healthcheck = true # default
  [sinks.my_sink_id.librdkafka_options]
    "max.in.flight.requests.per.connection" = "1000000"
    "max.poll.interval.ms" = "300000"
    "socket.timeout.ms" = "60000"

For example, ClickHouse, which also uses librdkafka as the base of its Kafka engine, allows passing arbitrary options to it.
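To make the proposal concrete, here is a minimal sketch of how such a pass-through table could be flattened into the key/value pairs a librdkafka client expects. The function name `build_client_config` and the plain-dict representation of the parsed sink table are assumptions made for illustration; this is not Vector's actual code. It relies only on the fact that librdkafka takes every property value as a string.

```python
def build_client_config(sink: dict) -> dict:
    """Flatten a parsed sink table into librdkafka-style key/value pairs.

    Hypothetical helper: `sink` stands in for the parsed [sinks.my_sink_id] table.
    """
    # An option the sink already maps itself (only one shown here).
    config = {"bootstrap.servers": sink["bootstrap_servers"]}

    # Pass-through table: keys are forwarded untouched. Values must be
    # strings, because librdkafka accepts every property value as a string.
    for key, value in sink.get("librdkafka_options", {}).items():
        if not isinstance(value, str):
            raise TypeError(
                f"librdkafka option {key!r} must be a string, "
                f"got {type(value).__name__}"
            )
        config[key] = value
    return config


sink = {
    "type": "kafka",
    "bootstrap_servers": "10.14.22.123:9092",
    "topic": "topic-1234",
    "librdkafka_options": {
        "socket.timeout.ms": "60000",
        "max.in.flight.requests.per.connection": "1000000",
    },
}
client_config = build_client_config(sink)
```

The resulting `client_config` dict could then be handed to whatever librdkafka binding is in use. Note that nothing here checks whether a key is a valid librdkafka property; that check would be left to librdkafka itself.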

Related to #1818.

@ghost added the sink: kafka, type: enhancement, and source: kafka labels Feb 17, 2020
@ghost changed the title from "Allow passing advanced librdkafka options in the kafka source and sink" to "Allow passing advanced librdkafka options to the kafka source and sink" Feb 17, 2020
@binarylogic
Contributor

I don't dislike this, but I think it's worth discussing mapping these options ourselves. For obvious reasons:

  1. By exposing this to our users we will be even more locked into librdkafka. I'm not that worried about this, because it's unlikely we'll move away.
  2. I'd still like to document the available options so they are searchable, etc. And if we're going to do that, it probably makes sense to explicitly support them in the code too.

The other question: how often are these options changing?

@ghost
Author

ghost commented Feb 17, 2020

> I don't dislike this, but I think it's worth discussing mapping these options ourselves

I'm fine with mapping selected options ourselves instead. It has the benefit of supporting type checks for the values, so integer options would be required to be integers and enum options would be required to be enums.
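As a rough sketch of the type checking that explicit mapping would allow, the snippet below validates integer and enum options before rendering them as librdkafka strings. The field names (`socket_timeout_ms`, `message_timeout_ms`, `compression`), the `SCHEMA` table, and the integer bounds are hypothetical and chosen only for illustration; the bounds are not taken from librdkafka's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntOption:
    key: str       # librdkafka property name
    minimum: int   # illustrative bound, not authoritative
    maximum: int   # illustrative bound, not authoritative

    def render(self, value):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{self.key} expects an integer, got {value!r}")
        if not self.minimum <= value <= self.maximum:
            raise ValueError(
                f"{self.key} must be within {self.minimum}..{self.maximum}"
            )
        return str(value)


@dataclass(frozen=True)
class EnumOption:
    key: str
    choices: tuple

    def render(self, value):
        if value not in self.choices:
            raise ValueError(f"{self.key} must be one of {self.choices}")
        return value


# Hypothetical mapping from user-facing fields to librdkafka properties.
SCHEMA = {
    "socket_timeout_ms": IntOption("socket.timeout.ms", 1, 300_000),
    "message_timeout_ms": IntOption("message.timeout.ms", 0, 2_147_483_647),
    "compression": EnumOption(
        "compression.codec", ("none", "gzip", "snappy", "lz4", "zstd")
    ),
}


def validate(user_options: dict) -> dict:
    """Check each mapped option and return librdkafka key/value strings."""
    config = {}
    for field, value in user_options.items():
        if field not in SCHEMA:
            raise KeyError(f"unknown option {field!r}")
        option = SCHEMA[field]
        config[option.key] = option.render(value)
    return config
```

With this shape, `validate({"socket_timeout_ms": "60000"})` fails early with a `TypeError` instead of passing a mistyped value through to the client.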

The main question is which of the more than a hundred supported options we should map, but we can map only those we see demand for. For example, we can start by exposing socket.timeout.ms and message.timeout.ms, which are related to #1818.

Actually, we already have some advanced options mapped, such as commit.interval.ms and session.timeout.ms for the kafka source.

> The other question: how often are these options changing?

From the history, it looks like new options are mostly just added, while old ones are not removed.

@binarylogic
Contributor

> The main question is which of the more than a hundred supported options we should map

I'm leaning towards a hybrid approach. Let's map the common options that help to deliver a good UX for the common case, then we could provide a librdkafka_options table as an escape hatch. This at least communicates to the user that these are "more advanced" options.
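One way the hybrid approach could be wired together is sketched below: explicitly mapped options are rendered first, then the librdkafka_options table is layered on as the escape hatch. The names `MAPPED` and `merge_options` are hypothetical, and rejecting a key that is set in both places is a design assumption of this sketch, not something decided in this thread.

```python
# Hypothetical mapping from user-facing fields to librdkafka properties.
MAPPED = {
    "socket_timeout_ms": "socket.timeout.ms",
    "message_timeout_ms": "message.timeout.ms",
}


def merge_options(mapped_values: dict, librdkafka_options: dict) -> dict:
    """Combine mapped options with the raw escape-hatch table."""
    config = {}

    # Common case: explicitly supported, documented options.
    for field, value in mapped_values.items():
        config[MAPPED[field]] = str(value)

    # Escape hatch: raw librdkafka properties, forwarded as-is.
    for key, value in librdkafka_options.items():
        if key in config:
            # Assumption: an ambiguous double definition is an error
            # rather than a silent override.
            raise ValueError(
                f"{key} is set both as a mapped option and in librdkafka_options"
            )
        config[key] = value
    return config


merged = merge_options(
    {"socket_timeout_ms": 60000},
    {"max.in.flight.requests.per.connection": "1000000"},
)
```

Failing on conflicts keeps the mapped option as the single documented way to set a property, while the table remains available for everything that is not mapped.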

@binarylogic
Contributor

And I don't know enough about Kafka to say which ones are common. The 2 you listed should probably be mapped, and I'd use your best judgment for any others.
