Currently, the broker appends incoming messages straight to the leader (itself in standalone mode), but maybe this is not optimal. We should support tuning select topics so that the server buffers up to n messages and only flushes them to the segment once it has accumulated at least that many, some time has elapsed since it began buffering, or a size threshold has been exceeded (a sketch follows the questions below).
This is particularly useful when you have many producers, each producing 1-2 messages at a time.
Kafka is likely doing something similar, but I am not sure what the optimal semantics are. We will need to consider Kafka's design, but here are a few questions that need to be answered:
How should we deal with an incoming bundle that carries a compressed message set? Flush whatever is already buffered and restart buffering, or decompress it and buffer the individual messages it contains?
When should the broker respond (generate an ack) to a client? Immediately, or only once the messages that client produced -- now buffered along with other clients' messages -- have all been flushed?
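A minimal sketch of what the per-topic flush decision could look like. All names here (BufferPolicy, TopicBuffer, the threshold defaults) are hypothetical and not part of Tank today; the idea is just that a flush happens whenever any of the three conditions (message count, buffered bytes, elapsed time) is met:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-topic buffering policy: flush once any threshold is hit.
struct BufferPolicy {
        std::size_t max_messages{128};           // flush at >= this many buffered messages
        std::size_t max_bytes{64 * 1024};        // ... or this many buffered bytes
        std::chrono::milliseconds max_delay{5};  // ... or this long since the first buffered message
};

struct PendingAppend {
        std::vector<uint8_t> serialized;         // encoded message(s) waiting for the segment
};

struct TopicBuffer {
        BufferPolicy policy;
        std::vector<PendingAppend> pending;
        std::size_t pending_bytes{0};
        std::chrono::steady_clock::time_point first_buffered_at;

        void append(PendingAppend &&a) {
                if (pending.empty())
                        first_buffered_at = std::chrono::steady_clock::now();
                pending_bytes += a.serialized.size();
                pending.push_back(std::move(a));
        }

        bool should_flush(std::chrono::steady_clock::time_point now) const {
                if (pending.empty())
                        return false;
                return pending.size() >= policy.max_messages
                    || pending_bytes >= policy.max_bytes
                    || (now - first_buffered_at) >= policy.max_delay;
        }
};
```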
Some of the benefits:
Opportunities for compression: when the buffered messages are flushed, the broker can consider compressing them. This may not be what we want though, because we wouldn't want compression to block the thread -- so maybe we can hand it off to another thread (see the sketch after this list); a few us worth of hand-off delay shouldn't mean much
Far fewer system write()/writev() calls
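A rough sketch of handing compression off to another thread so the poll loop never blocks on it. zlib is used only to make the example concrete (the actual codec and bundle format are open questions), and the function names are hypothetical:

```cpp
#include <cstdint>
#include <future>
#include <stdexcept>
#include <utility>
#include <vector>
#include <zlib.h>

// Compress a flushed batch; this runs on a worker thread, never on the I/O thread.
static std::vector<uint8_t> compress_batch(const std::vector<uint8_t> &raw) {
        uLongf out_len = compressBound(raw.size());
        std::vector<uint8_t> out(out_len);

        if (compress2(reinterpret_cast<Bytef *>(out.data()), &out_len,
                      reinterpret_cast<const Bytef *>(raw.data()), raw.size(),
                      Z_BEST_SPEED) != Z_OK)
                throw std::runtime_error("compression failed");
        out.resize(out_len);
        return out;
}

// Hand the work to another thread; the future is consumed when the segment
// write for this batch is scheduled.
static std::future<std::vector<uint8_t>> compress_batch_async(std::vector<uint8_t> raw) {
        return std::async(std::launch::async,
                          [raw = std::move(raw)] { return compress_batch(raw); });
}
```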
Imagine having 1,000 different clients all consistently producing 1 message at a time (say, 500 or so messages/second each) to the same Tank broker -- the broker would need to execute over 500k write() calls per second (excluding any writes for updating indices) in order to write those messages to the current segment.
If instead we collected all writes destined for each distinct segment that were received in the same poll events loop, combined and applied them outside the loop, and then notified the clients, that would perhaps work great, and it shouldn't be that hard to implement.
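Something along these lines, assuming a hypothetical batching structure keyed by segment fd: every append received during one poll() iteration is grouped, then written with a single writev() per segment after the loop (error handling and IOV_MAX chunking omitted for brevity):

```cpp
#include <cstdint>
#include <sys/uio.h>
#include <unistd.h>
#include <unordered_map>
#include <vector>

// Appends collected for one segment during the current poll() iteration.
struct SegmentBatch {
        std::vector<std::vector<uint8_t>> payloads;  // keeps the buffers alive for writev()
};

// Called once per poll() iteration, after all ready connections have been drained.
void flush_batches(std::unordered_map<int, SegmentBatch> &batches) {
        for (auto &[fd, batch] : batches) {
                std::vector<iovec> iov;
                iov.reserve(batch.payloads.size());
                for (auto &p : batch.payloads)
                        iov.push_back(iovec{p.data(), p.size()});

                // One syscall per segment instead of one write() per produced message.
                ::writev(fd, iov.data(), static_cast<int>(iov.size()));
        }
        batches.clear();
}
```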
We should eventually support buffering published messages on the client and flushing them all as a single bundle when the buffered message count exceeds a threshold, or when enough time has passed since the first message was buffered.
This should be configurable on a per-partition basis. It obviously means that if the application doesn't get to flush the data in time, the data will be lost. It will still be useful if you can tolerate that possibility and/or keep the thresholds low.
Alternatively, you could e.g. buffer the messages into a file and periodically flush that file to Tank.
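A minimal sketch of such a client-side buffer. None of these names are part of the Tank client API; `ship` stands in for whatever call actually sends one bundle to the broker, and anything still buffered when the process dies is lost, which is the trade-off noted above:

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class BufferedPublisher {
        std::function<void(const std::vector<std::string> &)> ship;  // sends one bundle to the broker
        std::vector<std::string> buf;
        std::chrono::steady_clock::time_point first_at;
        std::size_t max_count;
        std::chrono::milliseconds max_age;

public:
        BufferedPublisher(std::function<void(const std::vector<std::string> &)> s,
                          std::size_t n, std::chrono::milliseconds age)
            : ship(std::move(s)), max_count(n), max_age(age) {}

        void publish(std::string msg) {
                if (buf.empty())
                        first_at = std::chrono::steady_clock::now();
                buf.push_back(std::move(msg));
                if (buf.size() >= max_count ||
                    std::chrono::steady_clock::now() - first_at >= max_age)
                        flush();
        }

        // Flush as a single bundle: one produce request instead of buf.size() of them.
        void flush() {
                if (buf.empty())
                        return;
                ship(buf);
                buf.clear();
        }
};
```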