
Conversation

@dimberman
Contributor

@dimberman dimberman commented Dec 12, 2024

The updates to the various DB tables that result from DAG file parsing and collection are scattered throughout the code base.

This change gathers them all into one place to make it easier to a) find them, and b) see what is changed.

The current situation is the result of organic code growth, but it is now almost impossible to find all the things we update, as they are not organized and were scattered across at least DagBag and DagProcessor.



@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Dec 12, 2024
@ashb ashb changed the title from "Extract all the various DB update calls and processes for Dag Parsing…" to "Extract all the various DB update calls and processes for Dag Parsing to one place" Dec 13, 2024
@ashb ashb marked this pull request as ready for review December 13, 2024 11:07
… to one place

The updates to the various DB tables as a result of DAG file parsing and
collection are scattered throughout the code base.

This change gathers them all into one place to make it easier to a) find them,
and b) see what is changed.

The current situation is the result of organic code growth, but it is now
almost impossible to find all the things we update, as they are not organized
and were scattered across at least DagBag and DagProcessor.

Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
@ashb ashb force-pushed the extract-dag-db-update-to-one-place branch from 1a8eeaf to d341351 Compare December 13, 2024 11:26
@ashb ashb added the full tests needed We need to run full set of tests for this PR to merge label Dec 13, 2024
@ashb
Member

ashb commented Dec 13, 2024

Re-triggering with full tests

@ashb ashb closed this Dec 13, 2024
@ashb ashb reopened this Dec 13, 2024
This is a property of dagbag parsing, not the processor.

Also convert it to a parameterized test.
@ashb ashb merged commit ec893fd into apache:main Dec 13, 2024
97 checks passed
@ashb ashb deleted the extract-dag-db-update-to-one-place branch December 13, 2024 15:51
ashb added a commit to astronomer/airflow that referenced this pull request Dec 16, 2024
As part of Airflow 3 DAG definition files will have to use the Task SDK for
all their classes, and anything involving running user code will need to be
de-coupled from the database in the user-code process.

This change moves all of the "serialization" work up to the
DagFileProcessorManager, using the new function introduced in apache#44898 and the
"subprocess" machinery introduced in apache#44874.

**Important Note**: this change does not remove the ability for dag processes
to access the DB for Variables etc. That will come in a future change.

Some key parts of this change:

- It builds upon the WatchedSubprocess from the TaskSDK. Right now this puts a
  nasty/unwanted dependency of the Dag Parsing code upon the TaskSDK.
  This will be addressed before release (we have talked about introducing a
  new "apache-airflow-base-executor" dist where this subprocess+supervisor
  could live, as the "execution_time" folder in the Task SDK is more a feature
  of the executor than of the TaskSDK itself.)
- A number of classes that we need to send between processes have been
  converted to Pydantic for ease of serialization.
- In order not to have to serialize everything in the subprocess and
  deserialize everything in the parent Manager process, we have created a
  `LazyDeserializedDAG` class that provides lazy access to many of the
  properties needed to create or update the DAG-related DB objects, without
  needing to fully deserialize the entire DAG structure.
- Classes switched to attrs-based definitions for less boilerplate in constructors.
- Internal timers converted to `time.monotonic` where possible, and `time.time`
  where not; we only need the difference in seconds between two points, not
  datetime objects.
- With the earlier removal of "sync mode" for SQLite in apache#44839, the
  separate TERMINATE and END messages over the control socket can go away.
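The timer point in the bullets above is plain stdlib behaviour: `time.monotonic()` never jumps when the wall clock is adjusted, which makes it the safe choice for measuring elapsed seconds between two points. A minimal illustration:

```python
import time

start = time.monotonic()
time.sleep(0.01)  # stand-in for a parse loop iteration
elapsed = time.monotonic() - start  # float seconds; the epoch is arbitrary

# Only differences of monotonic values are meaningful; reach for
# time.time() when an absolute wall-clock timestamp is actually needed.
assert elapsed >= 0.01
```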

Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
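The `LazyDeserializedDAG` bullet amounts to wrapping the raw serialized payload and parsing individual fields only on attribute access. A rough sketch of the shape (hypothetical class and payload layout, not the real Airflow code):

```python
class LazyDagView:
    """Expose fields of a serialized DAG payload without full deserialization."""

    def __init__(self, data: dict):
        self._data = data  # raw dict as received from the parsing subprocess

    @property
    def dag_id(self) -> str:
        return self._data["dag_id"]

    @property
    def task_count(self) -> int:
        # Touches only the "tasks" list; no task objects are rebuilt.
        return len(self._data.get("tasks", []))

view = LazyDagView({"dag_id": "example", "tasks": [{"task_id": "a"}, {"task_id": "b"}]})
```

The parent Manager process only needs a handful of such properties to update the DB rows, so it never pays for reconstructing the full DAG object graph.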
ashb added a commit to astronomer/airflow that referenced this pull request Dec 16, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 16, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 16, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 16, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 17, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 17, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 18, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 18, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 18, 2024
ashb added a commit to astronomer/airflow that referenced this pull request Dec 19, 2024
ashb added a commit that referenced this pull request Dec 19, 2024
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025