Support PubSub for Kafka #271

Open
c-thiel opened this issue Aug 15, 2024 · 11 comments
Labels
good first issue Good for newcomers

Comments

@c-thiel
Contributor

c-thiel commented Aug 15, 2024

No description provided.

@c-thiel c-thiel added the good first issue Good for newcomers label Aug 15, 2024
@Grongrilla
Contributor

I would like to give this one a shot

@Grongrilla
Contributor

Grongrilla commented Aug 16, 2024

Turns out there is something to discuss right from the get-go.

I kind of expected that there would be one clear choice for a Kafka Rust lib, but at least at first glance, there is not.

There seem to be two more or less mature implementations available:

kafka-rust

Seems to be a pure Rust implementation. A first look at the examples shows that it seems to be pretty easy to use. There are a few things to mention, though:

  • Last release is 0.10.0, which is 11 months old. It has no tag in the GitHub repo (last tag is 0.9.0 from May 2022). Repo seems to be active, though
  • They do not regard their lib as production ready: "Use it in production at your own risk."
  • There seems to be no async client/producer implemented

rust-rdkafka

... is actually "just" a safe interface to librdkafka

  • Last release is 0.36.2, which is 7 months old. librdkafka of course is actively maintained, the last release is 2 months old.
  • async producer available
  • They do not mention if the lib is production ready, however since it is based on librdkafka, one might assume that it should be
  • rust-rdkafka can be enabled for Cloud Events SDK via a feature flag. I am not sure if this is relevant, as the same is true for nats, but that feature flag is not used in TIP
  • Reliance on a C lib of course has caveats.
    • librdkafka can be linked statically when the lib is built, using cmake (which is then required on the build system); if SSL is required, libssl-dev is required as well, and so on
    • librdkafka could be linked dynamically, introducing a dependency on the system that is running TIP

what next?

I am not sure what the best choice is here. If introducing a C lib is not an option, kafka-rust seems to be the only choice. If it is ok to schlep around a C lib, rust-rdkafka is also async and seems to be "endorsed" by Cloud Events SDK.

Or maybe I am overlooking "that other kafka rust lib" that has fewer downsides than kafka-rust or rust-rdkafka 😄

@twuebi
Contributor

twuebi commented Aug 20, 2024

@twuebi
Contributor

twuebi commented Aug 20, 2024

In terms of maturity & user-base it probably makes sense to stick to rdkafka for now. Eventually we should switch over to a rust-native implementation to get rid of the C dependency.

@Grongrilla
Contributor

Grongrilla commented Aug 20, 2024

@twuebi

samsa indeed looks promising, but from your second comment I gather: rdkafka it is, for now.

Three questions:

  • Use the feature flag on cloud events sdk to pull in rdkafka, or add the dependency ourselves?
  • Build librdkafka or depend on an existing version?
  • Which "extra features" (ssl, ...) are required?

I'd probably vote

  • Bring in the dependency ourselves, which is consistent with how nats is used
  • Build it ourselves
  • I guess SSL is a must, no idea about the rest

@twuebi
Contributor

twuebi commented Aug 22, 2024

I'd say let's give rdkafka a try then. We should probably depend on cloudevents sdk's packaged rdkafka: from a cursory read, serialization of cloudevents to kafka seems to be a bit more involved than what we do for nats. Compare cloudevents-sdk-0.7.0/src/binding/nats/serializer.rs:19 with cloudevents-sdk-0.7.0/src/binding/rdkafka/kafka_producer_record.rs:24.

We've gone for depending on async-nats directly since cloudevents didn't package async-nats IIRC.
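
To give a rough idea of where the extra work comes from: in the binary content mode of the CloudEvents Kafka protocol binding, the context attributes travel as `ce_`-prefixed record headers, `datacontenttype` becomes the `content-type` header, and only the event data ends up in the record value. The following is a std-only sketch of that mapping, not the cloudevents-sdk code. `Event`, `Record` and `to_binary_record` are hypothetical stand-ins made up for illustration, and header values are plain strings here although Kafka headers are byte arrays.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal stand-in for a CloudEvent (NOT the cloudevents-sdk type).
struct Event {
    id: String,
    source: String,
    ty: String,
    datacontenttype: Option<String>,
    data: Vec<u8>,
}

/// Hypothetical stand-in for a Kafka record: headers plus value.
struct Record {
    headers: BTreeMap<String, String>,
    payload: Vec<u8>,
}

/// Binary content mode: attributes become `ce_`-prefixed headers,
/// `datacontenttype` becomes `content-type`, the data becomes the record value.
fn to_binary_record(event: &Event) -> Record {
    let mut headers = BTreeMap::new();
    headers.insert("ce_specversion".to_string(), "1.0".to_string());
    headers.insert("ce_id".to_string(), event.id.clone());
    headers.insert("ce_source".to_string(), event.source.clone());
    headers.insert("ce_type".to_string(), event.ty.clone());
    if let Some(content_type) = &event.datacontenttype {
        headers.insert("content-type".to_string(), content_type.clone());
    }
    Record {
        headers,
        payload: event.data.clone(),
    }
}

fn main() {
    // All values below are made up for illustration.
    let event = Event {
        id: "42".to_string(),
        source: "urn:example:catalog".to_string(),
        ty: "example.table.created".to_string(),
        datacontenttype: Some("application/json".to_string()),
        data: br#"{"table":"t1"}"#.to_vec(),
    };
    let record = to_binary_record(&event);
    for (key, value) in &record.headers {
        println!("{key}: {value}");
    }
    println!("payload bytes: {}", record.payload.len());
}
```

For nats, by contrast, the whole event can simply be serialized as one JSON payload, which is why that side is so much shorter.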

@twuebi
Contributor

twuebi commented Aug 22, 2024

Existing publishers can be found in crates/iceberg-catalog/src/service/event_publisher.rs:166..

@Grongrilla
Contributor

Grongrilla commented Aug 22, 2024

@c-thiel @twuebi

I just realized that the latest release of cloudevents sdk depends on rdkafka ^0.29. The current release is 0.36.2.

0.29 is almost 2 years old. It depends on librdkafka 1.9, which is also almost 2 years old. Current version of librdkafka is 2.5.

The main branch of cloudevents sdk is already on ^0.36

Tbh, I am not sure what would be a good way to solve this 🙈

  • If we do not bring in the feature, we lose the serialization for cloudevents to kafka and have to implement it ourselves.
  • ... or we could get cloudevents sdk from github, depending on the main branch.
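
Why ^0.29 and 0.36.2 cannot coexist as one dependency: for a 0.x requirement, Cargo's caret rule only accepts versions with the same minor number, so ^0.29 means >=0.29.0, <0.30.0. A small std-only sketch of that rule follows. `parse` and `caret_matches` are helper names made up here, and the check is simplified (no pre-release tags, no partial requirements such as ^0.0); the `semver` crate is the real thing.

```rust
/// Parse "major[.minor[.patch]]" into a tuple; missing parts default to 0.
fn parse(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>());
    let major = parts.next()?.ok()?;
    let minor = match parts.next() {
        Some(p) => p.ok()?,
        None => 0,
    };
    let patch = match parts.next() {
        Some(p) => p.ok()?,
        None => 0,
    };
    Some((major, minor, patch))
}

/// Simplified caret check (an illustration, not Cargo's implementation):
/// the version must be >= the requirement and must share its leftmost
/// non-zero component.
fn caret_matches(req: &str, version: &str) -> Option<bool> {
    let r = parse(req.trim_start_matches('^'))?;
    let v = parse(version)?;
    if v < r {
        return Some(false);
    }
    let compatible = if r.0 > 0 {
        v.0 == r.0
    } else if r.1 > 0 {
        v.0 == 0 && v.1 == r.1
    } else {
        v == r
    };
    Some(compatible)
}

fn main() {
    // Requirement of the released cloudevents sdk vs. the current rdkafka release.
    println!("{:?}", caret_matches("^0.29", "0.36.2"));
    // Requirement on the cloudevents sdk main branch.
    println!("{:?}", caret_matches("^0.36", "0.36.2"));
}
```

So enabling the feature on the released cloudevents sdk pins us to the 0.29 line of rdkafka, no matter what we request ourselves.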

@twuebi
Contributor

twuebi commented Aug 22, 2024

Hm, unfortunate, I'd say either ask CloudEvents-sdk for a release or vendor their serialization code for rdkafka

@Grongrilla
Contributor

If vendoring is an option, I will do that. That way I can continue (well, start...) working, and it should also make things easier if or when cloud events sdk releases a new version.

Regarding asking cloud events sdk for a release: maybe something you could or should do @twuebi? I'd maybe feel a bit uncomfortable since this is not my codebase 😅

@twuebi
Contributor

twuebi commented Aug 26, 2024

then let's start with vendoring


3 participants