External task sensor fail fix #27190
Conversation
o-nikolas
left a comment
Can you add or update a unit test case to cover this edge case, so that we don't regress in the future? Somewhere here-ish:
airflow/tests/sensors/test_external_task_sensor.py
Lines 344 to 494 in 753fe42
```python
def test_external_task_sensor_fn_multiple_execution_dates(self):
    bash_command_code = """
{% set s=logical_date.time().second %}
echo "second is {{ s }}"
if [[ $(( {{ s }} % 60 )) == 1 ]]
then
    exit 1
fi
exit 0
"""
    dag_external_id = TEST_DAG_ID + '_external'
    dag_external = DAG(dag_external_id, default_args=self.args, schedule=timedelta(seconds=1))
    task_external_with_failure = BashOperator(
        task_id="task_external_with_failure", bash_command=bash_command_code, retries=0, dag=dag_external
    )
    task_external_without_failure = EmptyOperator(
        task_id="task_external_without_failure", retries=0, dag=dag_external
    )
    task_external_without_failure.run(
        start_date=DEFAULT_DATE, end_date=DEFAULT_DATE + timedelta(seconds=1), ignore_ti_state=True
    )

    session = settings.Session()
    TI = TaskInstance
    try:
        task_external_with_failure.run(
            start_date=DEFAULT_DATE, end_date=DEFAULT_DATE + timedelta(seconds=1), ignore_ti_state=True
        )
        # The test_with_failure task is expected to fail
        # once per minute (the run on the first second of
        # each minute).
    except Exception as e:
        failed_tis = (
            session.query(TI)
            .filter(
                TI.dag_id == dag_external_id,
                TI.state == State.FAILED,
                TI.execution_date == DEFAULT_DATE + timedelta(seconds=1),
            )
            .all()
        )
        if len(failed_tis) == 1 and failed_tis[0].task_id == 'task_external_with_failure':
            pass
        else:
            raise e

    dag_id = TEST_DAG_ID
    dag = DAG(dag_id, default_args=self.args, schedule=timedelta(minutes=1))
    task_without_failure = ExternalTaskSensor(
        task_id='task_without_failure',
        external_dag_id=dag_external_id,
        external_task_id='task_external_without_failure',
        execution_date_fn=lambda dt: [dt + timedelta(seconds=i) for i in range(2)],
        allowed_states=['success'],
        retries=0,
        timeout=1,
        poke_interval=1,
        dag=dag,
    )
    task_with_failure = ExternalTaskSensor(
        task_id='task_with_failure',
        external_dag_id=dag_external_id,
        external_task_id='task_external_with_failure',
        execution_date_fn=lambda dt: [dt + timedelta(seconds=i) for i in range(2)],
        allowed_states=['success'],
        retries=0,
        timeout=1,
        poke_interval=1,
        dag=dag,
    )

    task_without_failure.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)

    with pytest.raises(AirflowSensorTimeout):
        task_with_failure.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)

def test_external_task_sensor_delta(self):
    self.add_time_sensor()
    op = ExternalTaskSensor(
        task_id='test_external_task_sensor_check_delta',
        external_dag_id=TEST_DAG_ID,
        external_task_id=TEST_TASK_ID,
        execution_delta=timedelta(0),
        allowed_states=['success'],
        dag=self.dag,
    )
    op.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)

def test_external_task_sensor_fn(self):
    self.add_time_sensor()
    # check that the execution_fn works
    op1 = ExternalTaskSensor(
        task_id='test_external_task_sensor_check_delta_1',
        external_dag_id=TEST_DAG_ID,
        external_task_id=TEST_TASK_ID,
        execution_date_fn=lambda dt: dt + timedelta(0),
        allowed_states=['success'],
        dag=self.dag,
    )
    op1.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
    # double check that the execution_fn is being called by failing the test
    op2 = ExternalTaskSensor(
        task_id='test_external_task_sensor_check_delta_2',
        external_dag_id=TEST_DAG_ID,
        external_task_id=TEST_TASK_ID,
        execution_date_fn=lambda dt: dt + timedelta(days=1),
        allowed_states=['success'],
        timeout=1,
        poke_interval=1,
        dag=self.dag,
    )
    with pytest.raises(exceptions.AirflowSensorTimeout):
        op2.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)

def test_external_task_sensor_fn_multiple_args(self):
    """Check this task sensor passes multiple args with full context. If no failure, means clean run."""
    self.add_time_sensor()

    def my_func(dt, context):
        assert context['logical_date'] == dt
        return dt + timedelta(0)

    op1 = ExternalTaskSensor(
        task_id='test_external_task_sensor_multiple_arg_fn',
        external_dag_id=TEST_DAG_ID,
        external_task_id=TEST_TASK_ID,
        execution_date_fn=my_func,
        allowed_states=['success'],
        dag=self.dag,
    )
    op1.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)

def test_external_task_sensor_fn_kwargs(self):
    """Check this task sensor passes multiple args with full context. If no failure, means clean run."""
    self.add_time_sensor()

    def my_func(dt, ds_nodash, tomorrow_ds_nodash):
        assert ds_nodash == dt.strftime("%Y%m%d")
        assert tomorrow_ds_nodash == (dt + timedelta(days=1)).strftime("%Y%m%d")
        return dt + timedelta(0)

    op1 = ExternalTaskSensor(
        task_id='test_external_task_sensor_fn_kwargs',
        external_dag_id=TEST_DAG_ID,
        external_task_id=TEST_TASK_ID,
        execution_date_fn=my_func,
        allowed_states=['success'],
        dag=self.dag,
    )
    op1.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
```
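The edge case exercised above can be modeled in isolation. The sketch below is a hypothetical, simplified stand-in for the sensor's count-based check (not Airflow's actual implementation): when `execution_date_fn` returns several dates and only some of the monitored runs failed, an "all runs failed" test never fires, so the sensor neither succeeds nor fails and pokes until it times out.

```python
# Hypothetical, simplified model of the sensor's poke decision; the real
# ExternalTaskSensor queries TaskInstance states from the metadata database.
def poke_decision(states, allowed_states, failed_states):
    count_allowed = sum(s in allowed_states for s in states)
    count_failed = sum(s in failed_states for s in states)
    if count_failed == len(states):  # old-style check: only an all-failed set raises
        raise RuntimeError("all external tasks failed")
    return count_allowed == len(states)

# Mixed states across two execution dates: the check above never raises,
# and poke never returns True either, so the sensor pokes until
# AirflowSensorTimeout (or forever, if no timeout is set).
mixed = ["failed", "success"]
assert poke_decision(mixed, {"success"}, {"failed"}) is False
```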
Good point @o-nikolas, I was being lazy and didn't want to figure out how to run the test suite. You've pushed me to do the right thing: I've added a commit/test that addresses it, which fails with the old code and passes with the new. Let me know what you think.

Thanks for adding a test case! A committer will need to approve the workflow run for a first-time contributor (and ultimately merge if everything passes and they are happy with the code). Unfortunately I am not a committer, so I'll CC a few who may have time to take a look: @eladkal @potiuk @kaxil

A change was merged recently to standardize quoting in Airflow; you'll need to rebase this PR and run the static checks locally to patch those up.

@o-nikolas I merged in main, enabled pre-commit, and updated my quoting style, so hopefully that's the last of the linting, though we'll know for sure after the PR checks run. Thanks for the help and the heads-up that it had changed.
uranusjr
left a comment
Makes sense to me
It's fine, but I think we need a newsfragment describing this change in behaviour. This is borderline between a breaking change and a bugfix, and while I would lean more to the bugfix side, people could have relied on the old behaviour, so it should be described in more detail in the changelog.

@potiuk because of this bug, to use the […] In those cases, fixing this bug will cause a change in the exception they receive from […]. Does that sound about right? What would you propose we do? I'm happy to update a changelog if I'm pointed in the right direction.

Also, there are some failing checks on this PR that I don't understand. Specifically, in the "Sqlite Py3.7: API Always CLI Core Integration Other Providers WWW" check, a test fails that I'm pretty sure I don't go anywhere near. Any ideas on that front? The logs are long and I didn't see much useful in them while looking through, so I wanted to ask before trying to dig deeper, as I'm not super familiar with this code base and the checks on it.
In case of backwards-incompatible changes, please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.
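For illustration, a newsfragment for this kind of change might read something like the following (hypothetical wording, not the text actually committed; per the naming convention above, the file for this PR would be newsfragments/27190.significant.rst):

```rst
``ExternalTaskSensor`` now fails fast when a monitored external task enters a
failed state, instead of poking until ``AirflowSensorTimeout`` is raised.
Code that previously caught the timeout exception in this situation may need
to catch the sensor's failure exception instead.
```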
@eladkal I've added a newsfragment as suggested. For what it's worth, I think this should break very few people, if any: most folks want their DAGs to fail when an upstream external task fails, whether by timeout or because a task in the chain fails. I doubt many people are catching this exception and acting on it, but it's now documented that the exception type may change, in case anyone falls into that bucket.
Can you please rebase/solve conflicts?
Force-pushed from c470ba2 to 96b0039.
@potiuk sorry about that, I had been merging in […]. Let me know if you need anything else on my end.

@eladkal I noticed that the following test failed for the "Tests / Sqlite Py3.7: API Always CLI Core Integration Other Providers WWW (pull_request)" pre-submit job, but I'm pretty sure this test has nothing to do with my code. Does that seem right? Is there anything I should be doing? Or do these tests fail intermittently? Thanks for the help!
I reran the test. Let's see what happens.

@eladkal that seems to have done the trick, thanks for the help! Is there anything else you need on my end to get this merged?
Awesome work, congrats on your first merged pull request! |
At @o-nikolas's request, I'm creating a new PR to attempt to fix #16204, where the ExternalTaskSensor would hang indefinitely when an execution_date_fn is used, failed_states/allowed_states are set, and the upstream external DAG's runs have mixed states.
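The behaviour change can be sketched as a before/after of the failure condition. The function names below are hypothetical stand-ins; the actual change lives in ExternalTaskSensor's poke logic.

```python
# Hypothetical before/after of the sensor's failure check, given the number
# of monitored runs in a failed state and the total number of monitored runs.
def old_should_fail(count_failed: int, total: int) -> bool:
    # Only fired when *every* monitored run had failed, so mixed
    # success/failed states left the sensor poking indefinitely.
    return count_failed == total

def new_should_fail(count_failed: int, total: int) -> bool:
    # Any monitored run in a failed state fails the sensor promptly.
    return count_failed > 0

# One of two monitored runs failed: old check never fires, new check does.
assert old_should_fail(1, 2) is False
assert new_should_fail(1, 2) is True
```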