Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Airflow 16934] fix delete task_instance also deleted dag_run too when sla #16960

Closed
wants to merge 2 commits into from

Conversation

penggongkui
Copy link
Contributor

delete(ti) will also delete the dag_run(do by the sqla relationship)

#16934

It's not sla check duty to delete the missed task_instance

delete(ti) will also delete the dag_run(do by the sqla relationship)
@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jul 13, 2021
@uranusjr
Copy link
Member

uranusjr commented Jul 14, 2021

delete(ti) will also delete the dag_run(do by the sqla relationship)

Where is this relationship defined? The relationship between DagRun and TaskInstance is defined here:

task_instances = relationship(
TI,
primaryjoin=and_(TI.dag_id == dag_id, TI.execution_date == execution_date), # type: ignore
foreign_keys=(dag_id, execution_date),
backref=backref('dag_run', uselist=False),
)

But I don’t see any cascading options set (the default is to not cascade, if I understand correctly).

If this is really the cause, we can’t just not delete those TaskInstances so some other clean up would be needed.

@penggongkui
Copy link
Contributor Author

@uranusjr accoding the sqla document, default behavior is setting their foreign key reference to NULL.

https://docs.sqlalchemy.org/en/14/orm/cascades.html#unitofwork-cascades

image

also I have do a unit_test do session.delete(ti), it will set the dag_run's dag_id && execution_date to null(just like deleted the dag_run)

anyway, I have use a another way to delete the taskinstance(it's a little ugly)

@penggongkui penggongkui changed the title remove delete task_instance when sla [Airflow 16934] fix delete task_instance also deleted dag_run too when sla Jul 14, 2021
@uranusjr
Copy link
Member

it will set the dag_run's dag_id && execution_date to null (just like deleted the dag_run)

Ah I see, that makes perfect sense. So both the original issue and this PR is using “delete” in an inaccurate sense; the dag run is not actually deleted, but incorrectly modified and becomes unavailable.

I’m far from a SQLAlchemy expert, so I want to summon a fellow maintainer on this topic, but I feel the current relationship setup between DagRun and TaskInstance is backwards.

Currently we have this on DagRun:

task_instances = relationship(
TI,
primaryjoin=and_(TI.dag_id == dag_id, TI.execution_date == execution_date), # type: ignore
foreign_keys=(dag_id, execution_date),
backref=backref('dag_run', uselist=False),
)

and this on TaskInstance:

TaskInstance.dag_run = relationship(DagRun)

But if I’m understaing SQLAlchemy correctly (a big if), this makes DagRun a child of TaskInstance, thus deleting a TaskInstance triggers cascading on its associated DagRun, setting the foreign key columns to NULL. But the hierarchy is the other way around: a TaskInstance only makes sense when there is an associated DagRun, so deleting a DagRun should cascade to its TaskInstance children, but deleting TaskInstance shouldn’t affect a DagRun. So we should set the backref at the other side (on TaskInstance.dag_run) instead…?

@penggongkui
Copy link
Contributor Author

agree with you

the best solution is make relationship correct, but this is out of my knowledge with airflow whole project, maybe something else use the ti.dag_run

So we should set the backref at the other side (on TaskInstance.dag_run) instead…?

the sqla doc https://docs.sqlalchemy.org/en/14/orm/backref.html says backref is a alias with reletionship with back_populates both side, the right relationship is change backref to back_populates in DagRun

TaskInstance.dag_run = relationship(DagRun)

and this line just for pylint to know dag_run field in taskinstance (know from the comment: https://github.com/apache/airflow/blob/2.0.0/airflow/models/taskinstance.py#L2066)

I just read the sqla document, also not familiar with it

the right relationship also will fix another issue #16896

@ashb
Copy link
Member

ashb commented Jul 14, 2021

Yeah, the code in

STATICA_HACK = True
globals()['kcah_acitats'[::-1].upper()] = False
if STATICA_HACK: # pragma: no cover
# Let pylint know about these relationships, without introducing an import cycle
from sqlalchemy.orm import relationship
from airflow.job.base_job import BaseJob
from airflow.models.dagrun import DagRun
TaskInstance.dag_run = relationship(DagRun)
TaskInstance.queued_by_job = relationship(BaseJob)
is never run by python -- it's only picked up by code analysers.

This was my bad -- I added the relationship in the first place, and I wasn't aware that the side where the relationship is defined matter.

But yes, changing the relationship sounds like the better fix.

@penggongkui Please also target all PRs directly to the main branch, and then once it's merged then we (committers/release managers) will cherry-pick it in to the release branch.

@penggongkui
Copy link
Contributor Author

how about assign this issue to someone who familiar with the SQLAlchemy and the whole airflow project, I may cannot do it with the big change

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 29, 2021
@github-actions github-actions bot closed this Sep 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants