Metricpoint batch format #1002
Comments
I have created #1032 with some benchmarks to compare the performance of 1 metricPoint per kafka message to 10, 50, and 100 metricPoints per message.
The improvement is significant.
Hmm, only 35%? I would have expected more. As a reminder, going from a typical MD message (203B for a fakemetrics message) down to a MetricPoint message (28B) is a reduction of 86% (leaving only 14%). Not sure how relevant https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines and https://www.cloudera.com/documentation/kafka/1-2-x/topics/kafka_performance.html still are, but it seems the best overall throughput is achieved with messages of 1kB up to 1MB (?). I imagine a producer like fakemetrics or carbon-relay-ng (when writing to kafka) would build up a batch until either 50k (or whatever we find to be optimal) has been reached or 1 second has passed. Note that we can save up to 14.3% more (4B out of 28B) if we batch messages together based on their timestamp. So while the producers are building their batches, they could also create a separate batch per timestamp seen. Though then we have to worry about the (unlikely) scenario of a sender using different timestamps, which could create too many small batches. So for now let's not worry about this additional optimization.
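As a rough illustration of the producer-side batching described above, here is a minimal sketch. The type, field names, and thresholds are my assumptions, not actual fakemetrics or carbon-relay-ng code:

```go
package producer

import "time"

// MetricPointBatcher is a hypothetical sketch of the batching loop described
// above: encoded MetricPoint messages (~28B each) are appended to one buffer,
// and the buffer is flushed as a single kafka message once it reaches
// maxBytes or maxWait has elapsed, whichever comes first.
type MetricPointBatcher struct {
	buf      []byte
	maxBytes int           // e.g. 50 * 1024, if the "50k" above means bytes
	maxWait  time.Duration // e.g. 1 * time.Second
	started  time.Time     // when the first point entered the current batch
	send     func([]byte)  // hands a finished batch to the kafka producer
}

// Add appends one encoded MetricPoint and flushes if either limit is reached.
func (b *MetricPointBatcher) Add(encodedPoint []byte) {
	if len(b.buf) == 0 {
		b.started = time.Now()
	}
	b.buf = append(b.buf, encodedPoint...)
	if len(b.buf) >= b.maxBytes || time.Since(b.started) >= b.maxWait {
		b.Flush()
	}
}

// Flush sends the current batch (if any) and resets the buffer.
func (b *MetricPointBatcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	b.send(b.buf)
	b.buf = b.buf[:0]
}
```

A real producer would presumably also flush on a timer, so that a slow trickle of points still gets sent within maxWait even when no new point arrives.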
We are never going to be able to put thousands of points in a single message in production environments. We definitely don't want to buffer messages for more than 500ms, as we can't send a 200 back to carbon-relay-ng until the points have been committed to kafka. So the most we can batch is (rate / 2) / (tsdb-gws * partitions). There is no change in disk saving between 100 points per message and 500 points per message.
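To make that bound concrete (the numbers below are purely illustrative, not taken from any actual deployment): with an overall ingest rate of 100,000 points/s, 4 tsdb-gw instances, and 32 partitions, the 500ms limit caps the batch at roughly:

```
(rate / 2) / (tsdb-gws * partitions)
= (100000 / 2) / (4 * 32)
≈ 390 points per kafka message
```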
This is why I specifically mentioned producers like carbon-relay-ng (for those who want to write from carbon-relay-ng directly to kafka; pretty sure we have customers with on-prem production environments wanting to do this, or already doing this) and fakemetrics for benchmarking. Furthermore, your math is based on average throughput and equal distributions, so it is possible to have thousands of metrics to produce into a given partition at once. PS: the carbon-relay-ng <-> tsdb-gw interaction has always been (or used to be) very latency sensitive.
I would advise against deriving a batch size based on a preconceived rate combined with the max-flush-wait condition (rate/2 because of the 500ms clause).
I don't think we should cap the number of points per message at 100. My comment "target between 100 and 500 points per message" should probably have been written as "optimize for 100 to 500 points per message". The primary reason I commented was to ensure we don't waste time benchmarking use cases that would never happen in production, i.e.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think there is still benefit to this. I think that a simple batching scheme of just concatenating MetricPoint msgs together should be sufficient.
I'm working on this to see if it will reduce resource usage on both the producer side and the consumer (metrictank) side. I have a PoC running that looks promising. Before I roll it out to our staging cluster it would be good to get some tentative agreement on the batch format. The PR lives at bloomberg#111 and the format is very basic: essentially, instead of a single format byte like the existing messages, it takes 2 format bytes. The first byte should be
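To make the shape of the proposal concrete, here is a minimal sketch of what such a batched payload could look like. The format-byte values, constant names, and the fixed 28-byte size are placeholders of mine, not the actual constants from bloomberg#111 or metrictank's msg package:

```go
package batch

const (
	formatBatch       byte = 0xFD // placeholder marker: "this kafka message is a batch"
	formatMetricPoint byte = 0xFE // placeholder identifier for the inner format
	metricPointSize        = 28   // encoded MetricPoint size mentioned earlier in the thread
)

// encodeBatch prefixes the two format bytes and concatenates the
// pre-encoded MetricPoints into a single kafka message payload.
func encodeBatch(points [][]byte) []byte {
	out := make([]byte, 0, 2+len(points)*metricPointSize)
	out = append(out, formatBatch, formatMetricPoint)
	for _, p := range points {
		out = append(out, p...)
	}
	return out
}

// decodeBatch validates the format bytes and splits the payload back into
// individual MetricPoint encodings.
func decodeBatch(msg []byte) ([][]byte, bool) {
	if len(msg) < 2 || msg[0] != formatBatch || msg[1] != formatMetricPoint {
		return nil, false
	}
	body := msg[2:]
	if len(body)%metricPointSize != 0 {
		return nil, false
	}
	points := make([][]byte, 0, len(body)/metricPointSize)
	for i := 0; i < len(body); i += metricPointSize {
		points = append(points, body[i:i+metricPointSize])
	}
	return points, true
}
```

The appeal of this kind of scheme is that the consumer can keep using the existing per-point decoding once the batch is split, so only the outer framing changes.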
#876 introduced the MetricPoint format which helped us reduce resources, but not to the extent we liked.
The hypothesis is that kafka message overhead has become significant, and that we should now batch up MetricPoint (or MetricPoint-like) messages within single kafka messages.
Likely this will help us reduce kafka disk space / network IO (and perhaps some kafka CPU), but likely not ingest speed. The above link has some experiments and numbers that lead us to believe this.