-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Extract all the various DB update calls and processes for Dag Parsing to one place #44898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
ashb
merged 4 commits into
apache:main
from
astronomer:extract-dag-db-update-to-one-place
Dec 13, 2024
Merged
Extract all the various DB update calls and processes for Dag Parsing to one place #44898
ashb
merged 4 commits into
apache:main
from
astronomer:extract-dag-db-update-to-one-place
Dec 13, 2024
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ashb
reviewed
Dec 13, 2024
ashb
reviewed
Dec 13, 2024
ashb
reviewed
Dec 13, 2024
ashb
reviewed
Dec 13, 2024
ashb
reviewed
Dec 13, 2024
ashb
reviewed
Dec 13, 2024
ashb
reviewed
Dec 13, 2024
… to one place The updates to the various DB tables as a result of DAG file parsing and collection are scattered throughout the code base. This change gathers them all in to one place to make it easier to a) find, and b) see what is changed. The current situation is the result of organic code growth, but it is now almost impossible to find all the things we update as they are not organized, and were scatterd across at least DagBag, DagProcessor. Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
1a8eeaf to
d341351
Compare
Member
|
Re-triggering with full tests |
This is a property of dagbag parsing, not the processor. Also convert it to a parameterizes test too
ashb
reviewed
Dec 13, 2024
kaxil
reviewed
Dec 13, 2024
kaxil
approved these changes
Dec 13, 2024
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 16, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 16, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 16, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 16, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 16, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 17, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 17, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 18, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 18, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 18, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
to astronomer/airflow
that referenced
this pull request
Dec 19, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMIANTE and END messages over the control socket can go. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
ashb
added a commit
that referenced
this pull request
Dec 19, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in #44898 and the "subprocess" machinery introduced in #44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in #44839 the need for separate TERMINATE and END messages over the control socket can go. --------- Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
got686-yandex
pushed a commit
to got686-yandex/airflow
that referenced
this pull request
Jan 30, 2025
… to one place (apache#44898) The updates to the various DB tables as a result of DAG file parsing and collection are scattered throughout the code base. This change gathers them all in to one place to make it easier to a) find, and b) see what is changed. The current situation is the result of organic code growth, but it is now almost impossible to find all the things we update as they are not organized, and were scatterd across at least DagBag, DagProcessor. Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com> * Move ParseImportError formatting/depth tests to dagbag This is a property of dagbag parsing, not the processor. Also convert it to a parameterized test too Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
got686-yandex
pushed a commit
to got686-yandex/airflow
that referenced
this pull request
Jan 30, 2025
As part of Airflow 3 DAG definition files will have to use the Task SDK for all their classes, and anything involving running user code will need to be de-coupled from the database in the user-code process. This change moves all of the "serialization" change up to the DagFileProcessorManager, using the new function introduced in apache#44898 and the "subprocess" machinery introduced in apache#44874. **Important Note**: this change does not remove the ability for dag processes to access the DB for Variables etc. That will come in a future change. Some key parts of this change: - It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a nasty/unwanted depenednecy between the Dag Parsing code upon the TaskSDK. This will be addressed before release (we have talked about introducing a new "apache-airflow-base-executor" dist where this subprocess+supervisor could live, as the "execution_time" folder in the Task SDK is more a feature of the executor, not of the TaskSDK itself.) - A number of classes that we need to send between processes have been converted to Pydantic for ease of serialization. - In order to not have to serialize everything in the subprocess and deserialize everything in the parent Manager process, we have created a `LazyDeserializedDAG` class that provides lazy access to much of the properties needed to create update the DAG related DB objects, without needing to fully deserialize the entire DAG structure. - Classes switched to attrs based for less boilerplate in constructors. - Internal timers convert to `time.monotonic` where possible, and `time.time` where not, we only need second diff between two points, not datetime objects. - With the earlier removal of "sync mode" for SQLite in apache#44839 the need for separate TERMINATE and END messages over the control socket can go. --------- Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area:Scheduler
including HA (high availability) scheduler
full tests needed
We need to run full set of tests for this PR to merge
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The updates to the various DB tables as a result of DAG file parsing and collection are scattered throughout the code base.
This change gathers them all in to one place to make it easier to a) find, and b) see what is changed.
The current situation is the result of organic code growth, but it is now almost impossible to find all the things we update as they are not organized, and were scatterd across at least DagBag, DagProcessor.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.