-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Fix task removal when there's serdag change in versioned bundle #48956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a24699c to
0a4227a
Compare
|
Works correctly now: versioning-2.mov |
Ensure DAG runs in a versioned bundle use the correct serialized DAG (serdag) version. Previously, when a DAG had multiple versions due to changes (e.g., a task removed), running DAG runs would sometimes incorrectly use the latest serialized DAG, even if they were started with an older version. This caused tasks that should still run to be incorrectly marked as removed. The issue was caused by the scheduler not considering `bundle_version` when retrieving the DAG from the `dagbag`, defaulting to the latest available version. This fix updates `dagbag.dags` to store DAGs keyed by both `dag_id` and `bundle_version`. All DAG retrievals in scheduling are now made with the correct version context, ensuring task consistency for running DAG runs. Also introduces a `GetDag` Protocol for precise typing of the cached DAG getter, now supporting both `dag_id` and `bundle_version` parameters.
0a4227a to
c412057
Compare
| id=id, | ||
| path=dag_file.absolute_path, | ||
| bundle_path=cast("Path", dag_file.bundle_path), | ||
| bundle_version=dag_file.bundle_version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of this? I thought that with dag file processing, we are always tracking the latest version and we don't specify it here intentionally -- and i'm not even sure where this argument goes to in the interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed so that we specifically store dag_id and the version in dagbag.dags.
If you have two dagruns running, with different number of task instances, you will notice that when the Scheduler does dag_run.update_state, some task will be marked removed because the retrieved dag from dagbag has fewer tasks
|
This is a rather large PR @ephraimbuddy that, by the looks of it, has implications beyond the specific case that you're solving, i.e. running the right dag version. Can you possibly split this up so that we can just see the fix that is specific to this problem? |
I don't think the PR can be split up. The major fix here is having DagBag.dags keys to be (dag_id, bundle_version) instead of dag_id. DagBag is used in all the places I updated. We can't use the tuple key without updating all the other places. I'm going to reply to the other comments but this is the fix. You can also try the reproduction and see if there's another way to fix this issue |
I see, you changed the type in DagBag.dags, that's why the change above looked wrong |
|
closing in favor of #49097 |
Ensure DAG runs in a versioned bundle use the correct serialized DAG (serdag) version.
Previously, when a DAG had multiple versions due to changes (e.g., a task removed), running DAG runs would sometimes incorrectly use the latest serialized DAG, even if they were started with an older version. This caused tasks that should still run to be incorrectly marked as removed.
The issue was caused by the scheduler not considering
bundle_versionwhen retrieving the DAG from thedagbag, defaulting to the latest available version.This fix updates
dagbag.dagsto store DAGs keyed by bothdag_idandbundle_version. All DAG retrievals in scheduling are now made with the correct version context, ensuring task consistency for running DAG runs.Also introduces a
GetDagProtocol for precise typing of the cached DAG getter, now supporting bothdag_idandbundle_versionparameters.How to reproduce:
Using the dag below in a git bundle, create a run, then comment out the first section, uncomment the second quickly while the first is still in the sleep task. Run the task and you will see the astronomer task marked removed.
Closes: #49007