Apache Airflow version: 1.7.1.2

Kubernetes version (if you are using kubernetes) (use `kubectl version`):

Environment:

- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. `uname -a`):
- Install tools:
- Others:

What happened:

#1514 added a `verify_integrity` function that eagerly creates `TaskInstance` objects for all tasks in a DAG.

This does not interact well with the assumptions in the new `update_state` function. The guard `if len(tis) == len(dag.active_tasks)` is no longer effective: with the old, lazily created task instances, that branch only ran once every task in the DAG had actually run. Now it runs all the time, so as soon as one task in a DAG run fails, the whole `DagRun` is marked failed. This is bad, because the scheduler stops processing the `DagRun` after that.

In retrospect, the old code was also buggy: if a DAG run ends with a batch of QUEUED tasks, the `DagRun` could be marked as failed prematurely.

I suspect the fix is to update the guard to only consider tasks whose state is SUCCESS or FAILED. Otherwise we are evaluating, and failing, the DAG run based on UP_FOR_RETRY/QUEUED/SCHEDULED tasks.
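Roughly, something like the sketch below, where only finished task instances feed into the decision. This is only an illustration of the proposal, not the actual `update_state` code; the helper name and the `State` import path are placeholders.

```python
from airflow.utils.state import State  # older 1.7.x releases may expose State elsewhere

def evaluate_dagrun_state(tis, dag):
    """Sketch of the proposed guard: only finished tasks decide the DagRun's state."""
    finished = [ti for ti in tis if ti.state in (State.SUCCESS, State.FAILED)]

    # While anything is still queued / scheduled / up_for_retry, leave the
    # DagRun alone -- those tasks may still succeed.
    if len(finished) < len(dag.active_tasks):
        return State.RUNNING

    # Every task has run to completion; only now is a failure terminal.
    if any(ti.state == State.FAILED for ti in finished):
        return State.FAILED
    return State.SUCCESS
```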
What you expected to happen:
How to reproduce it:
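A DAG along these lines should trigger it (a hypothetical sketch based on the description above; operator import paths differ slightly between Airflow versions): one task fails immediately while several others are still queued, and the `DagRun` is marked failed before the queued tasks are processed.

```python
# Hypothetical reproduction DAG; import paths may differ in 1.7.x.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="repro_dagrun_premature_failure",
    start_date=datetime(2016, 1, 1),
    schedule_interval="@once",
)

# Fails as soon as it runs.
fail_fast = BashOperator(task_id="fail_fast", bash_command="exit 1", dag=dag)

# Slow tasks that are still queued/running when fail_fast fails.
slow_tasks = [
    BashOperator(task_id="slow_%d" % i, bash_command="sleep 60", dag=dag)
    for i in range(5)
]

# Expected: the DagRun stays RUNNING until the slow tasks finish.
# Observed: the DagRun is marked FAILED as soon as fail_fast fails,
# and the scheduler stops processing the remaining tasks.
```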
Anything else we need to know:
Moved here from https://issues.apache.org/jira/browse/AIRFLOW-441