-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Utilize tags for metrics sent to SafeDogStatsdLogger #8743
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! |
I was also thinking about this. As a simple use case (which sounds similar to your use case), it would be great if the airflow.ti_successes and airflow.ti_failures where tagged with the DAG and task. Maybe even the worker host name/pool? |
Related: #12158 |
Great! I missed this notification but ill definitely be checking it out when I get the chance |
@williamBartos I stumbled upon this today which could be a good work around: https://docs.datadoghq.com/developers/dogstatsd/dogstatsd_mapper/. Also seems like this use case is common enough that airflow metrics are the actual example that DataDog uses |
Nice. Also I think the right solution to that is to add open-telemetry support rather than extend Airflow's DataDog StatsD integration. We have a nice AIP (Airflow Improvement Proposal): https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow and discussion https://lists.apache.org/thread/1sdz7mx8j495h5vddngrwpxbx03p04z6 where any comments are welcome. The proposal there covers much more comprehensive approach for the Open-Telemetry integration and I think we should rather focus on makitng it happen rather than changing our DataDog or StatsD integration. DataDog already support OTel integration and once we implement OTel in Airlfow, not only you will be able to plugin the metrics but also all other telemetry data including detailed metrics of Airflow internals, Traces that will allow you to trace task and dag execution across multiple components as well as (in the future) logging. And not only with Datadog but with all the other telemetry services. |
cc: @howardyoo |
Closed by #28961 |
Description
A recent pr enabled dogstatsd support for Airflow metrics: #7376. While this enables the use of dogstatsd, the code sending metrics to SafeDogStatsdLogger doesn't utilize tagging and instead sends a unique, monolithic metric that cant be aggregated across identifiers such as
<dag_id>
. This isn't scalable when someone wants to monitor metrics across multiple DAG as each metric sent by each DAG is unique. The amount of monitors increases with the amount of DAGs.An example here are the timer metrics sent by a DagRun, such as
dagrun.duration.failed.<dag_id>
. When sent by the DagRun object,<dag_id>
isn't a tag but part of the entire metric itself: https://github.com/apache/airflow/blob/master/airflow/models/dagrun.py#L412-L420What is the problem here?
By sending metrics to DataDog without tags, it becomes impossible to aggregate metrics across
<dag_id>
because eachdagrun.duration.failed.<dag_id>
sent by a DAG is completely unique to that<dag_id>
.If I have 20 dags in production and want to monitor
dagrun.duration.failed.<dag_id>
, that means I'll need 20 separate monitors!But if
<dag_id>
is sent as a tag, a single monitor could be used and DataDog can group the metric by<dag_id>
.Use case / motivation
The current way metrics are sent to DataDog isn't scalable as its preventing a user from aggregating common metrics across unique tags.
Following the DagRun example given above, the information needed to send this metric as a tag is available. Given this line of code: https://github.com/apache/airflow/blob/master/airflow/models/dagrun.py#L418 and the accompanying function definition: https://github.com/apache/airflow/blob/master/airflow/stats.py#L172 we can modify the function call to send
<dag_id>
as a tag:toy example:
The preference here is probably not to do type checking before submitting the metric. I'm willing to discuss other solutions here or as part of a PR, and to implement the agreed upon solution.
Related Issues
This is the ticket that created the
SafeDogStatsdLogger
class: #7376The text was updated successfully, but these errors were encountered: