IntegrityError inserting into task_fail table with null execution_date from TI.handle_failure_with_callback #18943
Closed
2 tasks done
Labels
affected_version:2.2
Issues Reported for 2.2
area:core
kind:bug
This is a clearly a bug
priority:critical
Showstopper bug that should be patched immediately
Milestone
Apache Airflow version
2.2.0 (latest released)
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
Deployment
Other Docker-based deployment
Deployment details
AWS ECS, Celery Executor, Postgres 13, S3 Logging, Sentry integration
What happened
Noticed our Sentry getting a lot of integrity errors inserting into the task_fail table with a null execution date.
This seemed to be caused specifically by zombie task failures (We use AWS ECS Spot instances).
Specifically this callback from the dag file processor:
airflow/airflow/models/taskinstance.py
Line 1746 in e6c56c4
Adds a task_fail here:
airflow/airflow/models/taskinstance.py
Line 1705 in e6c56c4
This blows up when it flushes further down the method. This i believe is because when the task instance is refreshed from the database the
self.dag_run
property is not populated. The proxy fromti.execution_date
toti.dag_run.execution_date
then returnsNone
causing ourNOT NULL
violation.What you expected to happen
Insert into task_fail successfully and trigger callback
How to reproduce
Run this dag:
Kill the celery worker whilst its executing the long_running tasks. Wait for the zombie reaper of the scheduler to begin and call the failure handler.
Anything else
No response
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: