Statsd metrics dagrun.schedule_delay sends time in different format against the seconds specified in docs #33426
Comments
Feel free to fix it. |
cc: @ferruzzi as the all-round metrics expert to advise :) |
That's very interesting that this came up now. I was just having a chat yesterday about how some of the timers emit seconds and others emit milliseconds and we should probably standardize on one or the other. As for this one, changing what is emitted is going to break someone but I agree that it is a bug which should be fixed. We currently have a number of metrics being emitted twice for various reasons, and adding a new metric is both easy and non-breaking. So my suggestion would be:
Question for @potiuk : How would we go about actually deprecating a metric? We can't do the usual "print a warning" like we do in a method/class/module because that would print constantly. |
I am not sure you can deprecate a metric. The only idea I have is to provide new, consistent metrics and disable the old ones by default, but allow enabling them selectively by adding them to an "enabled_deprecated_metrics" configuration list (and log a deprecation warning when they are emitted). Another option: if we all agree that the old metric is WRONG, we treat it as a bugfix and change it (probably also allowing users to switch back to the previous behavior, but with deprecation warnings). It might be worth implementing a small framework that supports both cases, anticipating future cleanups we might want to do. |
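A rough sketch of the opt-in idea, to make it concrete. Everything here is hypothetical: `emit_deprecated`, the `enabled` argument, and the warn-once behavior are illustrative assumptions, not existing Airflow code or configuration. Warning once per metric name is one possible way to avoid the constant-printing problem raised above.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical stand-in for an "enabled_deprecated_metrics" config list.
ENABLED_DEPRECATED_METRICS = frozenset({"dagrun.schedule_delay"})

# Names we have already warned about, so the log is not flooded.
_already_warned = set()


def emit_deprecated(name, emit, enabled=ENABLED_DEPRECATED_METRICS):
    """Call emit() for a deprecated metric only if it was explicitly enabled.

    Returns True if the metric was emitted, False if it was skipped.
    """
    if name not in enabled:
        return False
    if name not in _already_warned:
        log.warning("Metric %s is deprecated and enabled via configuration.", name)
        _already_warned.add(name)
    emit()
    return True
```

A caller would wrap the old emission, for example `emit_deprecated("dagrun.schedule_delay", lambda: send_old_metric())`, where `send_old_metric` is again a placeholder for whatever currently sends the metric.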
That sounds like a good bit of initial work, but a solid plan if we want to start on a cleanup pass. Hmmm. I guess either way, that sounds like a separate task. I'll keep it in mind though. For this issue, I'd say for now let's double-emit. Just add another metric which does it in milliseconds and add a comment in the code explaining why we did that. If we do a cleanup and standardizing pass, we can worry about "deprecating" the old one. |
This is the code I retrieved from the Python StatsdClient. It shows that when a timedelta is passed to the timer, the client converts it to milliseconds before pushing it. With that said, could I simply edit the documentation alone to say that the value pushed to StatsD is in milliseconds?
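A minimal sketch of the conversion described above. It is not the client's actual source: `format_timing` is a hypothetical helper, and the payload format is an assumption used only to show the unit change.

```python
from datetime import timedelta


def format_timing(stat, delta):
    """Build a timing payload; a timedelta is converted to milliseconds first."""
    if isinstance(delta, timedelta):
        # total_seconds() returns a float number of seconds; scale to milliseconds.
        delta = delta.total_seconds() * 1000.0
    return "%s:%0.6f|ms" % (stat, delta)


# A 1.5 second delay is sent as 1500 milliseconds, not as 1.5.
print(format_timing("dagrun.schedule_delay", timedelta(seconds=1.5)))
# prints: dagrun.schedule_delay:1500.000000|ms
```

This would explain why the received value is a float that looks far larger than a delay expressed in seconds.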
|
I am good for it - but I would have to analyze it in detail to be sure :) |
sure @potiuk Please let me know if you need any details from me on this |
Yeah. Start with the docs. I think we might want to do more of a consistency push when we add traces, and then we can clean it up and settle on a consistent approach. |
Hi @potiuk, did you get a chance to analyze this part in detail? I will push a docs change saying that the StatsD client pushes the value in milliseconds. |
Go ahead. |
What do you see as an issue?
For the Airflow StatsD metric dagrun.schedule_delay received at the StatsD client, the Airflow docs say it reports the delay in seconds between the scheduled DagRun start date and the actual DagRun start date. But in the code, the difference between the two datetimes is passed to the StatsD timer as a timedelta object, with no conversion to seconds. At the client end it is received as a float.
Though the Airflow UI doesn't indicate any schedule delay, the metrics received contain data like below
Here the metric points received are floats, and they could be microsecond or millisecond values, as there is no conversion to seconds before the value is set in the StatsD metric.
Solving the problem
Converting the value to seconds or milliseconds before sending it would fix the issue. Alternatively, if the timedelta object is passed to the StatsD timer as is, clear docs on the value and unit to expect at the StatsD client end would be useful.
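To illustrate the first option: subtracting two datetime values yields a timedelta, which can be converted to an explicit unit before it is handed to a timer. This is only a sketch; `schedule_delay` is a hypothetical helper and the timestamps are arbitrary example values, not data from this report.

```python
from datetime import datetime, timedelta, timezone


def schedule_delay(expected_start, actual_start):
    """Return the delay between the expected and actual start as a timedelta."""
    return actual_start - expected_start


# Arbitrary example values.
expected = datetime(2023, 8, 15, 12, 0, 0, tzinfo=timezone.utc)
actual = expected + timedelta(seconds=2, milliseconds=250)

delay = schedule_delay(expected, actual)
delay_seconds = delay.total_seconds()  # 2.25
delay_ms = delay_seconds * 1000        # 2250.0
```

Sending `delay_seconds` or `delay_ms` as a plain number makes the unit an explicit choice in the calling code instead of a side effect of the client.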
Anything else
No response
Are you willing to submit PR?
Code of Conduct