[Airflow 16934] fix delete task_instance also deleted dag_run too when sla #16960
delete(ti) will also delete the dag_run (done by the SQLAlchemy relationship)
Where is this relationship defined? The relationship between DagRun and TaskInstance is defined here: airflow/airflow/models/dagrun.py Lines 100 to 105 in d25854d
But I don’t see any cascading options set (the default is to not cascade, if I understand correctly). If this is really the cause, we can’t just not delete those TaskInstances, so some other cleanup would be needed.
@uranusjr According to the SQLAlchemy documentation (https://docs.sqlalchemy.org/en/14/orm/cascades.html#unitofwork-cascades), the default behavior is to set the foreign key reference to NULL. I also wrote a unit test that calls session.delete(ti); it sets the dag_run's dag_id and execution_date to NULL (just as if the dag_run had been deleted). Anyway, I have used another way to delete the TaskInstance (it's a little ugly).
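The default behavior described in this comment can be reproduced outside Airflow with a minimal sketch. The `Parent` and `Child` models below are hypothetical stand-ins, not Airflow's `DagRun` and `TaskInstance`; the sketch assumes SQLAlchemy 1.4 or later and an in-memory SQLite database. With no `delete` cascade configured, deleting the object on the "one" side leaves the related row in place but sets its foreign key column to NULL.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):  # hypothetical model, for illustration only
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # No cascade option given: the default is "save-update, merge".
    children = relationship("Child", back_populates="parent")


class Child(Base):  # hypothetical model, holds the foreign key
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"), nullable=True)
    parent = relationship("Parent", back_populates="children")


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = Parent(id=1)
    child = Child(id=1, parent=parent)
    session.add_all([parent, child])
    session.commit()

    # Deleting the parent does not delete the child row; the unit of
    # work nulls out the child's foreign key column instead.
    session.delete(parent)
    session.commit()

    orphan = session.get(Child, 1)
    orphan_parent_id = orphan.parent_id

print(orphan is not None, orphan_parent_id)
```

In the thread's case, the relationship treats the dag run's `dag_id` and `execution_date` as the foreign key columns, which would explain why they are the ones being set to NULL when a task instance is deleted.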
Ah I see, that makes perfect sense. So both the original issue and this PR are using “delete” in an inaccurate sense; the dag run is not actually deleted, but incorrectly modified and becomes unavailable. I’m far from a SQLAlchemy expert, so I want to summon a fellow maintainer on this topic, but I feel the current relationship setup between DagRun and TaskInstance is backwards. Currently we have this on airflow/airflow/models/dagrun.py Lines 92 to 97 in c46e841
and this on airflow/airflow/models/taskinstance.py Line 2147 in c46e841
But if I’m understanding SQLAlchemy correctly (a big if), this makes
I agree that the best solution is to make the relationship correct, but that is beyond my knowledge of the whole Airflow project; something else may be using ti.dag_run.
The SQLAlchemy docs (https://docs.sqlalchemy.org/en/14/orm/backref.html) say backref is a shortcut for declaring a relationship with back_populates on both sides. The right relationship is to change backref to back_populates in DagRun airflow/airflow/models/taskinstance.py Line 2147 in c46e841
and this line is just there so pylint knows about the dag_run field in TaskInstance (I know this from the comment: https://github.com/apache/airflow/blob/2.0.0/airflow/models/taskinstance.py#L2066). I have only just read the SQLAlchemy documentation and am not familiar with it either. The right relationship will also fix another issue, #16896.
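As a rough illustration of the backref versus back_populates point, the sketch below declares both sides of a relationship explicitly with `back_populates`, which is what a single `backref` would otherwise generate implicitly. The `Run` and `Task` models are hypothetical and much simpler than Airflow's `DagRun` and `TaskInstance` (a single integer foreign key instead of a composite join); the sketch assumes SQLAlchemy 1.4 or later. No database is needed, since the two sides are kept in sync in memory.

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Run(Base):  # hypothetical stand-in for the "one" side
    __tablename__ = "run"
    id = Column(Integer, primary_key=True)
    # Explicit declaration on this side, pointing at Task.run.
    tasks = relationship("Task", back_populates="run")


class Task(Base):  # hypothetical stand-in for the "many" side
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    # The foreign key lives on the "many" side.
    run_id = Column(Integer, ForeignKey("run.id"))
    # Explicit declaration on this side, pointing at Run.tasks.
    run = relationship("Run", back_populates="tasks")


run = Run(id=1)
task = Task(id=1, run=run)

# Assigning task.run also appends the task to run.tasks, because the
# two relationship() declarations are linked via back_populates.
linked = task in run.tasks
print(linked)
```

Declaring both attributes explicitly also makes them visible on the class body, so a static tool can see the attribute without a placeholder line.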
Yeah, the code in airflow/airflow/models/taskinstance.py Lines 2063 to 2073 in ab5f770
This was my bad -- I added the relationship in the first place, and I wasn't aware that the side where the relationship is defined matters. But yes, changing the relationship sounds like the better fix. @penggongkui Please also target all PRs directly to the
How about assigning this issue to someone who is familiar with SQLAlchemy and the whole Airflow project? I may not be able to do it, given how big the change is.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
delete(ti) will also delete the dag_run (done by the SQLAlchemy relationship)
#16934
It's not the SLA check's duty to delete the missed task_instance