
Support synchronous sending for batches of messages #674

Closed
JasonRosenberg opened this issue Jun 9, 2016 · 9 comments
Comments

@JasonRosenberg commented Jun 9, 2016

Currently, the sync producer doesn't support sending batches of messages, even though the underlying protocol allows this (and the Java client library supports it too).

We need to send a batch of messages reliably, with acknowledgment, but don't want to incur the multiple round trips required to send each message individually.
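For illustration, the per-message pattern being avoided looks like this. The `syncProducer` below is a self-contained stub standing in for Sarama's `SyncProducer` (whose `SendMessage` does return the partition and offset); the round-trip counter is purely illustrative:

```go
package main

import "fmt"

// ProducerMessage is a pared-down stand-in for sarama.ProducerMessage.
type ProducerMessage struct {
	Topic string
	Value string
}

// syncProducer is a stub modelling SyncProducer's per-message API.
type syncProducer struct{ roundTrips int }

// SendMessage blocks for one broker round trip per call.
func (p *syncProducer) SendMessage(msg *ProducerMessage) (partition int32, offset int64, err error) {
	p.roundTrips++ // each call is a separate request/response cycle
	return 0, int64(p.roundTrips - 1), nil
}

// sendAll is the loop a caller is forced into today: n messages, n round trips.
func sendAll(p *syncProducer, msgs []*ProducerMessage) error {
	for _, m := range msgs {
		if _, _, err := p.SendMessage(m); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	p := &syncProducer{}
	msgs := []*ProducerMessage{{Topic: "t", Value: "a"}, {Topic: "t", Value: "b"}, {Topic: "t", Value: "c"}}
	if err := sendAll(p, msgs); err != nil {
		panic(err)
	}
	fmt.Println("round trips:", p.roundTrips)
}
```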

@eapache (Contributor) commented Jun 9, 2016

There is a minor ambiguity here: do you need to ensure that the batch succeeds or fails atomically as a single group, or do you just need to know when they've all been delivered?

Atomic batch-level success/failure is complicated, and I'm not likely to get to it in the near future (pull requests welcome, of course). If you don't need that, then the question becomes: what should the hypothetical SendMessages([]*ProducerMessage) return when some of the messages succeed but others fail?

@tadbook commented Jun 9, 2016

I would assume you would return a []*ProducerResponse, where a ProducerResponse includes an error that might be nil. Something like the ProducerError returned by the async producer.
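A sketch of that shape: ProducerError mirrors the failed-message-plus-error pairing of Sarama's async producer, but the sendMessages function and its failure condition here are purely illustrative stand-ins, not Sarama's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// ProducerMessage is a pared-down stand-in for sarama.ProducerMessage.
type ProducerMessage struct {
	Topic string
	Value string
}

// ProducerError pairs a failed message with the reason it failed,
// mirroring the shape of the async producer's error reporting.
type ProducerError struct {
	Msg *ProducerMessage
	Err error
}

// sendMessages is a hypothetical batch call returning one entry per
// input message; a nil Err means that message was delivered.
func sendMessages(msgs []*ProducerMessage) []*ProducerError {
	results := make([]*ProducerError, len(msgs))
	for i, m := range msgs {
		var err error
		if m.Value == "" {
			err = errors.New("empty value") // stand-in failure mode
		}
		results[i] = &ProducerError{Msg: m, Err: err}
	}
	return results
}

func main() {
	res := sendMessages([]*ProducerMessage{{Topic: "t", Value: "ok"}, {Topic: "t"}})
	for _, r := range res {
		fmt.Println(r.Msg.Value, r.Err)
	}
}
```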

@JasonRosenberg (Author) commented

I think it should behave the same as the Java client, which only indicates whether all messages succeeded or not. If there was a failure, don't bother returning details about which messages succeeded and which failed. Producer clients can be expected to resend the whole batch on any failure, which is fine: Kafka only guarantees at-least-once delivery, so consumers have to tolerate duplicates and process messages idempotently anyway.

@JasonRosenberg (Author) commented

FWIW, this is really how async sending should work too: internally, you buffer up messages and send out batches, then only report success or failure at the batch level.

@eapache (Contributor) commented Jun 9, 2016

> internally, you buffer up messages and send out batches

We already do that.

> then only report success or failure at the batch level

We decided that callers shouldn't have to care about our internal choice of batches (which is good, since the batching algorithm has already changed a few times), so we propagate batch-level errors back to the individual messages. This results in a simpler API.

@JasonRosenberg (Author) commented

Agreed, sending the batch-level result back to individual messages makes sense.

@tadbook commented Jun 10, 2016

So I dug into this a little more. The sync producer is actually built on top of the async producer: it sends a message to the async producer and blocks on a channel that it puts into the message's metadata. When the async producer responds, the sync producer writes to that channel, unblocking the original call. So, since the async producer batches, sending a bunch of individual messages to the sync producer should batch them up, as long as they are sent from separate goroutines so that you aren't waiting on one response before sending the next message. Not an optimal interface, though.
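The workaround described above can be sketched as follows. The send function is a stub standing in for SyncProducer.SendMessage; the point is only the concurrency shape, with one goroutine per in-flight message so the async layer underneath is free to batch:

```go
package main

import (
	"fmt"
	"sync"
)

// ProducerMessage is a pared-down stand-in for sarama.ProducerMessage.
type ProducerMessage struct{ Value string }

// sendConcurrently fires one goroutine per message: no caller waits for
// the previous response before sending, so the async producer underneath
// can group the in-flight messages into batches.
func sendConcurrently(send func(*ProducerMessage) error, msgs []*ProducerMessage) []error {
	errs := make([]error, len(msgs))
	var wg sync.WaitGroup
	for i, m := range msgs {
		wg.Add(1)
		go func(i int, m *ProducerMessage) {
			defer wg.Done()
			errs[i] = send(m) // blocks until this message is acked
		}(i, m)
	}
	wg.Wait() // all messages acknowledged, in no particular order
	return errs
}

func main() {
	send := func(m *ProducerMessage) error { return nil } // stub for SendMessage
	errs := sendConcurrently(send, []*ProducerMessage{{"a"}, {"b"}, {"c"}})
	fmt.Println("results:", len(errs))
}
```

Note the drawback eapache points out below the fold: because the goroutines race to enqueue, this gives no ordering guarantee within the batch.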

@eapache (Contributor) commented Jun 10, 2016

@tadbook, your analysis is correct (and yes, the async producer does use batching); however, that method would not guarantee ordering within a given batch.

#677 implements a proper SendMessages(msgs []*ProducerMessage) method, still built on top of the async producer but in a way that guarantees order and only requires one goroutine instead of n.
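A rough, self-contained sketch of that approach: the calling goroutine enqueues every message in order onto the async producer's input channel, then blocks once for all the acknowledgements. The expect channel here stands in for the response channel the sync producer tucks into each message's metadata; details of the real implementation in #677 may differ:

```go
package main

import "fmt"

// ProducerMessage is a pared-down stand-in for sarama.ProducerMessage;
// expect stands in for the metadata response channel.
type ProducerMessage struct {
	Value  string
	expect chan error
}

// sendMessages enqueues all messages in order from the calling goroutine
// (preserving intra-batch order), then waits for every acknowledgement,
// returning the first error seen. One goroutine total, not one per message.
func sendMessages(input chan<- *ProducerMessage, msgs []*ProducerMessage) error {
	expect := make(chan error, len(msgs))
	for _, m := range msgs {
		m.expect = expect
		input <- m // in-order enqueue keeps the batch ordered
	}
	var firstErr error
	for range msgs {
		if err := <-expect; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	input := make(chan *ProducerMessage, 8)
	done := make(chan struct{})
	// Stub async producer: acknowledge every message with success.
	go func() {
		for m := range input {
			m.expect <- nil
		}
		close(done)
	}()
	msgs := []*ProducerMessage{{Value: "a"}, {Value: "b"}, {Value: "c"}}
	err := sendMessages(input, msgs)
	close(input)
	<-done
	fmt.Println("batch error:", err)
}
```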

@tadbook commented Jun 10, 2016

Awesome! Thanks for doing that, and thanks for the quick turnaround.
