Consistently use baseTimeUnit in StatsD #5687
I think there might be a misunderstanding since The expectation is that a recorded value is converted to the unit of the registry before publication. When you record a value using a timer and a unit, the unit you used for the recording has nothing to do with the unit used for publishing. (Btw we don't recommend measuring and recording duration like that, but we recommend using This means that if the unit of the registry is milliseconds, it does not matter if you record in seconds, millis or micros, everything will be converted and published in millis. Lines 464 to 467 in 4d9efae
So, using your example, if you do this: timer.record(2_000, TimeUnit.SECONDS); // about 33 min It will be (recorded as nanos under the hood and) published in the unit of the registry. Since StatsD uses milliseconds as the unit, you will see This is the expected behavior, which is also guarded by tests, and that's why your change in
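The record-then-convert flow described above can be sketched with plain `java.util.concurrent.TimeUnit` arithmetic. This is a minimal illustration, not Micrometer's implementation: the class and the helpers `toStoredNanos` and `toPublishedValue` are hypothetical names introduced here.

```java
import java.util.concurrent.TimeUnit;

public class RecordVsPublishUnits {

    // Hypothetical helper: normalize whatever unit the caller recorded in to nanoseconds.
    static long toStoredNanos(long amount, TimeUnit recordedUnit) {
        return TimeUnit.NANOSECONDS.convert(amount, recordedUnit);
    }

    // Hypothetical helper: scale the stored nanoseconds to the registry's unit at publish time.
    static double toPublishedValue(long storedNanos, TimeUnit registryUnit) {
        return storedNanos / (double) registryUnit.toNanos(1);
    }

    public static void main(String[] args) {
        // The same duration recorded with two different units is stored identically.
        long viaSeconds = toStoredNanos(2_000, TimeUnit.SECONDS);
        long viaMillis = toStoredNanos(2_000_000, TimeUnit.MILLISECONDS);
        System.out.println(viaSeconds == viaMillis); // true

        // With a millisecond registry unit, 2000 seconds is published as 2,000,000.
        System.out.println(toPublishedValue(viaSeconds, TimeUnit.MILLISECONDS)); // 2000000.0
    }
}
```

The unit passed to the recording call only affects how the amount is normalized; the published number depends solely on the registry's unit.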
Starting from the end as that will explain things:
indeed! I am using with "But what backend is this for?" Datadog. There are several other issues where users have encountered the same unexpected behavior from Datadog; I can pull these up later. Specifically: Datadog does not support units for timers. Timers are unit-less gauges. So setting the units has no effect: Datadog accepts the values as unit-less and presumes you have pre-scaled them to whatever you want, or manually changed the metric config in the UI. Obviously, changing the metric config in Datadog for every timer after it has been submitted is a non-starter. So what to do? It seems to me that overriding the unit should do what is documented: "The base time unit of the timer to which all published metrics will be scaled". If that were true, it would do what is expected for Datadog. So even if there is no room for adjusting Micrometer to scale the values nicely: the documentation is a bit misleading, no?
As I said above, it seems that "millis" is hardcoded in a few places in the case of StatsD, which we can fix. Please remove your change from Could you please also run
If I understand this correctly, I don't think the documentation is misleading. I think the docs are OK but we have a bug: I agree with the intention of the docs, but the current behavior does not follow it. :) If, after fixing the issue, you still think the docs are misleading, please open an issue/PR.
Could you please also fix StatsdFunctionTimer and StatsdLongTaskTimer?
@@ -258,7 +258,7 @@ public void record(Runnable f) {
     @Override
     public final void record(long amount, TimeUnit unit) {
         if (amount >= 0) {
-            histogram.recordLong(TimeUnit.NANOSECONDS.convert(amount, unit));
+            histogram.recordLong(baseTimeUnit().convert(amount, unit));
Please remove this change, this should be recorded as nanos and converted to the right unit at publish-time.
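One plausible reason for keeping nanoseconds at record time (an assumption here, the reviewer does not spell it out) is that `TimeUnit.convert` truncates when going from a finer to a coarser unit, so converting straight to a coarse base unit loses sub-unit precision. A minimal sketch with hypothetical helper names:

```java
import java.util.concurrent.TimeUnit;

public class RecordAsNanos {

    // What the change under review would store: the amount converted directly to the base unit.
    static long storedInBaseUnit(long amount, TimeUnit unit, TimeUnit baseTimeUnit) {
        return baseTimeUnit.convert(amount, unit);
    }

    // What the reviewer asks for: always store nanoseconds.
    static long storedAsNanos(long amount, TimeUnit unit) {
        return TimeUnit.NANOSECONDS.convert(amount, unit);
    }

    // Scale stored nanoseconds to the base unit only when publishing.
    static double publish(long nanos, TimeUnit baseTimeUnit) {
        return nanos / (double) baseTimeUnit.toNanos(1);
    }

    public static void main(String[] args) {
        // 1500 ms with a base unit of seconds: direct conversion truncates to 1.
        System.out.println(storedInBaseUnit(1_500, TimeUnit.MILLISECONDS, TimeUnit.SECONDS)); // 1

        // Stored as nanos and scaled at publish time, the fraction survives.
        System.out.println(publish(storedAsNanos(1_500, TimeUnit.MILLISECONDS), TimeUnit.SECONDS)); // 1.5
    }
}
```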
@@ -43,9 +43,9 @@ public class StatsdTimer extends AbstractTimer {

     StatsdTimer(Id id, StatsdLineBuilder lineBuilder, FluxSink<String> sink, Clock clock,
             DistributionStatisticConfig distributionStatisticConfig, PauseDetector pauseDetector, TimeUnit baseTimeUnit,
-            long stepMillis) {
+            long stepBaseUnits) {
-            long stepBaseUnits) {
+            long amount) {
         super(id, clock, distributionStatisticConfig, pauseDetector, baseTimeUnit, false);
-        this.max = new StepDouble(clock, stepMillis);
+        this.max = new StepDouble(clock, stepBaseUnits);
-        this.max = new StepDouble(clock, stepBaseUnits);
+        this.max = new StepDouble(clock, amount);
Yes, certainly: I will resolve the tests, as well as the other issues. Where would be a good place to document this aspect of Datadog? At least enough to point somebody in the right direction. The statsd registry or config? As you mentioned, statsd works in milliseconds. So Datadog is an exceptional case, then? I'm not familiar with the standards, but it sounds like it! In that case, should this be a Datadog-specific config somehow? Or is the override of the base time unit sufficient to capture the exceptional aspect? For reference, the related motivating tickets. These are duplicates, just adding them here for completeness.
Discussion about the unit-less nature of timers:
Draft: Opening for discussion with (incomplete) proposed changes.
There is a baseTimeUnit for Timers that appears to be inconsistently used. Given the definition "The base time unit of the timer to which all published metrics will be scaled", I'd expect that a value, e.g. 2000 sec, reported as the value 2000 with the unit seconds, would be emitted to statsd as the value in the base time units: 2000 in this example. However, the actual value is 2,000,000, which is the corresponding millisecond value for the seconds recorded. As the explanation above is a challenge to read, I've included some (untested, not even compiled) changes to illustrate.
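The gap between the expected and the observed value can be reproduced with plain `TimeUnit` arithmetic. The sketch below is not the StatsD registry code: `publishedWithHardcodedMillis` and `publishedInBaseTimeUnit` are hypothetical helpers that mimic the reported behavior and the behavior the description expects, assuming a base time unit of seconds.

```java
import java.util.concurrent.TimeUnit;

public class BaseTimeUnitExpectation {

    // Mimics the reported behavior: the published value is always scaled to milliseconds.
    static long publishedWithHardcodedMillis(long amount, TimeUnit unit) {
        long nanos = TimeUnit.NANOSECONDS.convert(amount, unit);
        return TimeUnit.MILLISECONDS.convert(nanos, TimeUnit.NANOSECONDS);
    }

    // Mimics the expected behavior: the published value is scaled to the configured base time unit.
    static long publishedInBaseTimeUnit(long amount, TimeUnit unit, TimeUnit baseTimeUnit) {
        long nanos = TimeUnit.NANOSECONDS.convert(amount, unit);
        return baseTimeUnit.convert(nanos, TimeUnit.NANOSECONDS);
    }

    public static void main(String[] args) {
        // Observed: 2000 seconds is emitted as 2,000,000.
        System.out.println(publishedWithHardcodedMillis(2_000, TimeUnit.SECONDS)); // 2000000

        // Expected with a base time unit of seconds: 2000.
        System.out.println(publishedInBaseTimeUnit(2_000, TimeUnit.SECONDS, TimeUnit.SECONDS)); // 2000
    }
}
```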