
Failed to record duration of externally triggered DAGs #18669

Closed
easthy-alterpost opened this issue Oct 1, 2021 · 3 comments
easthy-alterpost commented Oct 1, 2021

Apache Airflow version

2.1.4 (latest released)

Operating System

Amazon Linux 2 AMI

Versions of Apache Airflow Providers

statsd 3.3.0
apache-airflow-providers-amazon 2.2.0
apache-airflow-providers-celery 2.0.0

Deployment

Other

Deployment details

Cloudformation template based on https://github.com/villasv/aws-airflow-stack/blob/v2/aws/cloud-formation-template.yml

What happened

Airflow does not emit the statsd metric `dagrun_duration_success` for DAGs that are externally triggered by another DAG via `TriggerDagRunOperator(task_id='trigger_dag_enter_point_task', trigger_dag_id=DAG_NAME)`.
The metric does arrive if I manually clear the state of the triggered DAG's last task.

In the logs I found:

```
Oct  1 09:45:56 ip-172-31-64-142 turbine: [2021-10-01 09:45:56,716] {dagrun.py:647} WARNING - Failed to record duration of <DagRun unload_datamarts_to_data_lake @ 2021-10-01 09:45:37.380321+00:00: manual__2021-10-01T09:45:37.338487+00:00, externally triggered: True>: start_date is not set.
```

I think this is the source of the issue: externally triggered DAG runs have no `start_date`.
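The warning suggests that a run's duration is computed from its start and end dates, and that the metric is simply skipped when either one is missing. Below is a simplified sketch of that check, for illustration only: `record_duration` and its parameters are hypothetical names, not Airflow's API, and the timer name `dagrun.duration.<state>.<dag_id>` is assumed from Airflow's metrics documentation.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def record_duration(dag_id: str, state: str,
                    start_date: Optional[datetime],
                    end_date: Optional[datetime],
                    timing: Callable[[str, timedelta], None],
                    warn: Callable[[str], None]) -> bool:
    """Hypothetical helper modeled on the warning text; not Airflow's code.

    Emits a duration timer for a finished run and returns True, or logs a
    warning and returns False when a date needed for the duration is missing.
    """
    if start_date is None:
        # This is the branch the log above reports for the triggered run.
        warn(f"Failed to record duration of {dag_id}: start_date is not set.")
        return False
    if end_date is None:
        warn(f"Failed to record duration of {dag_id}: end_date is not set.")
        return False
    timing(f"dagrun.duration.{state}.{dag_id}", end_date - start_date)
    return True
```

With `start_date=None`, as the log reports for the externally triggered run, nothing is sent; once both dates are set, a timer of `end_date - start_date` is emitted.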

What you expected to happen

Externally triggered DAG runs get a `start_date`, and Airflow sends the `dagrun_duration_success` metric via statsd.
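To confirm whether the timer reaches statsd at all, one option is to listen on the statsd UDP port directly. A minimal sketch, assuming `[metrics] statsd_on = True` with the defaults `statsd_host = localhost`, `statsd_port = 8125` and `statsd_prefix = airflow`, and assuming the statsd-level timer name is `dagrun.duration.success.<dag_id>` (an exporter may rename it, e.g. to `dagrun_duration_success`). `parse_statsd_line` and `listen` are hypothetical helpers, and the parser handles only the plain `name:value|type[|@rate]` line format.

```python
import socket
from typing import Optional, Tuple


def parse_statsd_line(line: str) -> Optional[Tuple[str, float, str]]:
    """Parse 'name:value|type[|@rate]' into (name, value, type); None if malformed."""
    try:
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|")[:2]
        return name, float(value), metric_type
    except ValueError:
        return None


def listen(host: str = "127.0.0.1", port: int = 8125,
           prefix: str = "airflow.dagrun.duration.") -> None:
    """Print every dagrun duration timer received on the statsd UDP port (runs forever)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, _ = sock.recvfrom(65535)
        # A single packet may carry several newline-separated metrics.
        for line in data.decode("utf-8", errors="replace").splitlines():
            parsed = parse_statsd_line(line)
            # Keep only timers ("ms") whose name starts with the duration prefix.
            if parsed and parsed[0].startswith(prefix) and parsed[2] == "ms":
                print(f"{parsed[0]} = {parsed[1]:.0f} ms")
```

Call `listen()` on the Airflow host with no other statsd daemon bound to port 8125 (or point `statsd_port` at a free port and pass it to `listen`), then let `stage_area` trigger `data_vault`. If the metric is emitted, a line for `airflow.dagrun.duration.success.data_vault` is printed once the triggered run succeeds.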

How to reproduce

The first DAG code:

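```python
from datetime import datetime

from airflow import DAG
# airflow.operators.dagrun_operator is a deprecated alias in Airflow 2.x
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# Controller DAG: runs every 2 minutes and triggers the 'data_vault' DAG.
with DAG('stage_area',
         schedule_interval='*/2 * * * *',
         start_date=datetime(2021, 9, 30),  # "09" is a SyntaxError in Python 3
         catchup=False,
         max_active_runs=1) as dag:
    TriggerDagRunOperator(task_id='trigger_mwl_data_vault',
                          trigger_dag_id='data_vault')
```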

The second DAG code:

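```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

# Target DAG: no schedule, it only runs when triggered by 'stage_area'.
with DAG('data_vault',
         schedule_interval=None,
         start_date=datetime(2021, 9, 30),
         catchup=False,
         max_active_runs=1) as dag:
    raw_data_vault_start = DummyOperator(task_id='raw_data_vault_start')
```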

Anything else

The problem is persistent.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

easthy-alterpost added the area:core and kind:bug labels Oct 1, 2021
boring-cyborg bot commented Oct 1, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

SamWheating (Contributor) commented

I'm interested in looking into this - feel free to assign it to me.

SamWheating (Contributor) commented

Upon investigation, it looks like this was already reported in #18082 and fixed in #18226.

Feel free to close 👍

uranusjr closed this as completed Oct 6, 2021