ref(statsd): Use statsdproxy to pre-aggregate metrics in-memory #2425
Use the experimental statsdproxy hackweek project to aggregate counters and gauges (i.e. the "easy stuff") in memory before sending them over the UDP buffer. This might not work perfectly and, most bizarrely, aggregating only some metric types will probably mess with timestamp accuracy (even though the flush interval is a very low 1s). However, it's currently possible that we are dropping metrics because the UDP send buffer is at its limit, so who knows whether this makes metrics more or less reliable...
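For readers unfamiliar with the idea, here is a minimal sketch of what in-memory pre-aggregation of counters and gauges can look like. It is not the statsdproxy implementation: the `Aggregator` type, its methods, and the DogStatsD-style `|#tags` suffix are all assumptions made for illustration. Counters for the same series are summed and gauges keep only the latest value, so one line per series is emitted per flush instead of one per call.

```rust
use std::collections::BTreeMap;

/// The two metric kinds that are cheap to merge in memory.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    Counter,
    Gauge,
}

/// Hypothetical aggregator, for illustration only (not statsdproxy's API).
#[derive(Default)]
struct Aggregator {
    // Key: (metric name, raw tag string, kind). Value: aggregated value.
    buckets: BTreeMap<(String, String, Kind), f64>,
}

impl Aggregator {
    /// Counters are summed; gauges keep only the most recent value.
    fn record(&mut self, name: &str, tags: &str, kind: Kind, value: f64) {
        let key = (name.to_owned(), tags.to_owned(), kind);
        let slot = self.buckets.entry(key).or_insert(0.0);
        match kind {
            Kind::Counter => *slot += value,
            Kind::Gauge => *slot = value,
        }
    }

    /// Drain all buckets into statsd lines, one line per series.
    /// A real proxy would call this on a timer (e.g. every second).
    fn flush(&mut self) -> Vec<String> {
        std::mem::take(&mut self.buckets)
            .into_iter()
            .map(|((name, tags, kind), value)| {
                let ty = match kind {
                    Kind::Counter => "c",
                    Kind::Gauge => "g",
                };
                if tags.is_empty() {
                    format!("{name}:{value}|{ty}")
                } else {
                    // Assumed DogStatsD-style tag suffix.
                    format!("{name}:{value}|{ty}|#{tags}")
                }
            })
            .collect()
    }
}
```

In this sketch, three increments of the same counter between flushes leave the process as a single line, which is where the reduced pressure on the UDP send buffer would come from. The flush is also where the timestamp inaccuracy mentioned above originates: the receiver only sees the flush time, not the time of each individual call.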
This issue has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock.
@Dav1dde some additional context that I forgot to share: we use statsdproxy in Rust consumers to send data to DDM. The way this works is that we pre-aggregate using statsdproxy, then multiplex to the Rust SDK, in order to offset the performance overhead that the Rust SDK has. I think long-term statsdproxy is not the right abstraction for this, and in fact @Swatinem is already working on what I think could be a replacement for all of this. But in the short term this would allow you to dogfood DDM in Relay with minimal overhead (and no code locations). take a look at
Thanks, that seems like a good approach and something we wanted to do anyways.
Let's try it.
We need to test this properly on Canary and S4S first, so please don't merge if you don't have enough time to do that.
If you want, I can pick this up and do the rollout sometime at the beginning of next week.
@@ -69,6 +69,7 @@ pub fn init_metrics(config: &Config) -> Result<()> {
        &addrs[..],
        default_tags,
        config.metrics_buffering(),
        config.metrics_aggregation(),
nit: With this number of arguments, it might be nice to pass a StatsdConfig
object instead. Not a blocker though.
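A rough sketch of what such an options object could look like, in case it helps the discussion. Everything here is hypothetical: the field names, the defaults, and the `init` signature are illustrative and not taken from Relay or relay-statsd.

```rust
use std::collections::BTreeMap;
use std::net::SocketAddr;

/// Hypothetical options bundle replacing a long positional argument list.
#[derive(Clone, Debug)]
struct StatsdConfig {
    addrs: Vec<SocketAddr>,
    default_tags: BTreeMap<String, String>,
    buffering: bool,
    aggregation: bool,
}

impl Default for StatsdConfig {
    fn default() -> Self {
        Self {
            addrs: Vec::new(),
            default_tags: BTreeMap::new(),
            // Assumed defaults: buffer and aggregate unless told otherwise.
            buffering: true,
            aggregation: true,
        }
    }
}

/// Hypothetical initializer taking the bundle instead of separate arguments.
/// Only the validation step is sketched; client set-up is omitted.
fn init(config: &StatsdConfig) -> Result<(), String> {
    if config.addrs.is_empty() {
        return Err("no statsd address configured".to_owned());
    }
    // A real implementation would build the client from `config` here.
    Ok(())
}
```

With named fields, a call site can override one option via struct update syntax (`..StatsdConfig::default()`) and adding or removing an option no longer changes every caller.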
The plan is to get rid of the options altogether: #2425 (comment)
What do you think?
OK!
        flush_interval: 1,
        flush_offset: 0,
        max_map_size: None,
Should these be configurable?
they should probably not have been options to begin with tbh
As discussed in #2425, this removes the options: there is no reason not to buffer, and no reason not to aggregate with statsdproxy. It also cleans up the configuration a bit.
Use the experimental statsdproxy hackweek project to aggregate counters and
gauges (i.e. the "easy stuff") in memory before sending them over the UDP
buffer.
We use the same code in Rust consumers to pre-aggregate metrics. The
performance improvement is a wash (it neither improves nor degrades perf),
but it should reduce load on veneur, so it may still amount to cost savings.
Arpad has a kind of pre-aggregation that results in actual cost savings
within the application itself. In the future we may replace statsdproxy
with that.