@ashb
Member

@ashb ashb commented Jul 10, 2025

And move airflow.utils.timezone into a shared library as the first example of it working.

In this change we have now settled on an approach using symlinks, but we did
explore other options (see the GH PR for discussion and previous versions,
notably one built upon the vendoring tool).

A lot of the reasoning behind this, and how it operates, is detailed in shared/README.md in this PR, which is why this description is so short.

Currently various places in TaskSDK and Airflow Core both use these utility functions. In this specific case they are small enough that they could just be copied and the duplication wouldn't hurt us long term, but this change shows a way in which we can have a single source of truth and have it included automatically in built dists.
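
As a rough illustration of the symlink idea described above, the sketch below links one shared source directory into a consuming distribution's `_shared` package and reads a file through the link. The directory layout, the file contents, and the helper names `link_shared_package` and `demo_link` are assumptions made for this example; the layout this PR actually uses is documented in shared/README.md, and how the build backend follows the link when building a dist is not covered here.

```python
import os
import tempfile
from pathlib import Path


def link_shared_package(shared_src: Path, link: Path) -> Path:
    """Create `link` as a relative symlink pointing at `shared_src`."""
    link.parent.mkdir(parents=True, exist_ok=True)
    # A relative target keeps the link valid when the checkout is moved.
    target = os.path.relpath(shared_src, start=link.parent)
    link.symlink_to(target, target_is_directory=True)
    return link


def demo_link() -> str:
    """Build a throwaway layout and read the shared file through the link."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical layout: one shared source tree, one consuming dist.
        shared_src = root / "shared" / "timezones" / "src" / "timezones"
        shared_src.mkdir(parents=True)
        (shared_src / "timezone.py").write_text("UTC_NAME = 'UTC'\n")

        link = root / "task-sdk" / "src" / "sdk" / "_shared" / "timezones"
        link_shared_package(shared_src, link)
        # The consuming dist sees the shared file as if it were its own.
        return (link / "timezone.py").read_text()
```

With this arrangement there is only one copy of the file on disk, so an edit made through either path is visible through the other.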

(For posterity, please see b6ae6a9 for the previous vendoring-based approach)

And thanks to Jarek for the help in making the symlink version work, it's much simpler!

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:DAG-processing area:db-migrations PRs with DB migration area:dev-tools area:Scheduler including HA (high availability) scheduler area:Triggerer backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch kind:documentation labels Jul 10, 2025
@ashb ashb force-pushed the shared-vendored-lib-tasksdk-and-core branch from 6fd5e04 to e2376cd Compare July 10, 2025 16:38
@ashb ashb force-pushed the shared-vendored-lib-tasksdk-and-core branch 2 times, most recently from 4cb4080 to fbd6e5f Compare July 11, 2025 17:36
@ashb ashb changed the title Set up process for sharing/vendoring code between different components. 🚧 Set up process for sharing/vendoring code between different components. Jul 11, 2025
@ashb ashb marked this pull request as ready for review July 11, 2025 17:37
@potiuk
Member

potiuk commented Jul 23, 2025

Plus, do we need a pre-commit check to ensure the symlink and project setup is done correctly?

Yes. We agreed I will write it as follow-up @uranusjr
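
The follow-up check is not part of this PR, so the following is only a sketch of what such a check might verify: that each expected link exists, is a symlink, and resolves to the intended shared directory. The function name `find_broken_shared_links`, the mapping it takes, and the paths in the demo are all hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_broken_shared_links(expected: dict[Path, Path]) -> list[str]:
    """Return one message per link that is missing or points somewhere else."""
    problems = []
    for link, target in expected.items():
        if not link.is_symlink():
            problems.append(f"{link}: missing or not a symlink")
        elif link.resolve() != target.resolve():
            problems.append(
                f"{link}: resolves to {link.resolve()}, expected {target.resolve()}"
            )
    return problems


def demo_check() -> list[str]:
    """Check one correct link and one link that was never created."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        target = root / "shared" / "timezones"
        target.mkdir(parents=True)

        good = root / "core_shared_timezones"
        good.symlink_to(target, target_is_directory=True)
        missing = root / "sdk_shared_timezones"  # deliberately not created

        return find_broken_shared_links({good: target, missing: target})
```

A hook built on this would exit non-zero whenever the returned list is non-empty.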

@potiuk
Member

potiuk commented Jul 23, 2025

A general miss imo. For public docs etc we should use the public API path: airflow.sdk and internally use the canonical imports: from airflow.sdk._shared.timezones import timezone

I think that should be a ruff rule. Everything "in" task-sdk should use "_shared" (or relative imports if already inside _shared). Everything "using" task-sdk should use the facade. The question is: can we somehow forbid (or discourage) importing from "_shared" from outside?
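
One way to picture the "forbid importing from `_shared` from outside" idea is a small standalone checker, sketched below with the standard `ast` module. It is not a ruff rule, and the function name `find_private_shared_imports` is hypothetical; only the `airflow.sdk._shared` prefix comes from the discussion above. Ruff's flake8-tidy-imports `banned-api` setting may be another route, though whether it fits this per-distribution case is not established in this thread.

```python
from __future__ import annotations

import ast

PRIVATE_PREFIX = "airflow.sdk._shared"


def find_private_shared_imports(
    source: str, prefix: str = PRIVATE_PREFIX
) -> list[tuple[int, str]]:
    """Return (line number, module) for each absolute import under `prefix`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Also consider "from airflow.sdk import _shared" style imports.
            names = [node.module]
            names += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import
        for name in names:
            if name == prefix or name.startswith(prefix + "."):
                hits.append((node.lineno, name))
                break  # report each import statement once
    return sorted(hits)
```

Relative imports are skipped on purpose, since code already inside `_shared` is expected to use them.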

@potiuk
Member

potiuk commented Jul 23, 2025

Have we discussed backward compatibility (in general)? For example, should airflow.utils.timezone still be accessible by users?

+1 -> See the other comment from @amoghrajesh, but I think task-sdk should have airflow.utils with deprecation redirections to its _shared code. I think this is the only distribution that should be accessed by user code, i.e. the only place where the deprecations might be needed.
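
A "deprecation redirection" of this kind is commonly built on a module-level `__getattr__` (PEP 562). The sketch below shows the mechanism with throwaway module objects so it runs without Airflow installed; `make_deprecated_alias` and the module names used with it are hypothetical, and this is not necessarily how Airflow implements its own redirects.

```python
import types
import warnings


def make_deprecated_alias(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Build a module that forwards attribute access to `new_module` with a warning."""
    old = types.ModuleType(old_name)

    def __getattr__(attr: str):
        try:
            value = getattr(new_module, attr)
        except AttributeError:
            raise AttributeError(
                f"module {old_name!r} has no attribute {attr!r}"
            ) from None
        warnings.warn(
            f"{old_name}.{attr} is deprecated; use {new_module.__name__}.{attr} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return value

    # PEP 562: a module consults __getattr__ in its namespace for missing names.
    old.__getattr__ = __getattr__
    return old
```

In a real package the same `__getattr__` function would simply be defined at the top level of the old module's source file.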

Member

@potiuk potiuk left a comment

Looks fantastic! It's great we could combine all the ideas and get something really nice and developer-friendly :). There are a few comments; most of them require more of a discussion, agreement and follow-up than changes to the PR.

@ashb
Member Author

ashb commented Jul 23, 2025

Cases like this in the providers:

from airflow.utils.timezone import convert_to_utc

I think I will leave these as is for now in this PR, and do a single provider-only follow-up PR that does the right thing (try sdk.timezone, except import utils.timezone).
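
The "try one location, fall back to the other" pattern mentioned here can be sketched generically as below. The helper `import_first_available` is hypothetical, and the Airflow module paths in the trailing comment are an assumption about what the follow-up might try, not something confirmed in this thread.

```python
import importlib
from types import ModuleType


def import_first_available(*module_names: str) -> ModuleType:
    """Return the first module in `module_names` that imports successfully."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as err:  # also covers ModuleNotFoundError
            last_error = err
    raise ImportError(f"none of {module_names!r} could be imported") from last_error


# Assumed usage in a provider (module paths are illustrative only):
# timezone = import_first_available("airflow.sdk.timezone", "airflow.utils.timezone")
```

The plain `try: ... except ImportError: ...` form at the top of a module does the same thing for a single pair of locations.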

@ashb ashb force-pushed the shared-vendored-lib-tasksdk-and-core branch 3 times, most recently from 4b26389 to 80816e3 Compare July 23, 2025 12:01
@ashb
Member Author

ashb commented Jul 23, 2025

+1 -> See the other comment from @amoghrajesh, but I think task-sdk should have airflow.utils with deprecation redirections to its _shared code. I think this is the only distribution that should be accessed by user code, i.e. the only place where the deprecations might be needed.

Yeah, longer term (as in, not in this PR) I agree that is the right place, but I don't think we can do that until we remove everything else from airflow.utils out of the airflow-core dist?

@potiuk
Member

potiuk commented Jul 23, 2025

Yeah, longer term (as in, not in this PR) I agree that is the right place, but I don't think we can do that until we remove everything else from airflow.utils out of the airflow-core dist?

We could potentially do the legacy namespace pathlib trick in both utils packages - that would work nicely and allow us to move stuff between the two over time.
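
The comment does not spell out the trick. One reading, offered here purely as an assumption, is the pkgutil-style "legacy" namespace package, where each distribution ships its own `__init__.py` that extends `__path__` so that one package name spans several directories. The sketch below demonstrates that mechanism with a hypothetical package name `hypo_utils_ns` and two throwaway directories standing in for the two distributions.

```python
from __future__ import annotations

import importlib
import sys
import tempfile
from pathlib import Path

# The one-line __init__.py used by pkgutil-style namespace packages.
NAMESPACE_INIT = '__path__ = __import__("pkgutil").extend_path(__path__, __name__)\n'


def demo_legacy_namespace(pkg: str = "hypo_utils_ns") -> tuple[str, str]:
    """Import two submodules of one package that live in different directories."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = []
        for dist, module in [("core", "from_core"), ("sdk", "from_sdk")]:
            pkg_dir = Path(tmp) / dist / pkg
            pkg_dir.mkdir(parents=True)
            (pkg_dir / "__init__.py").write_text(NAMESPACE_INIT)
            (pkg_dir / f"{module}.py").write_text(f"ORIGIN = {dist!r}\n")
            roots.append(str(pkg_dir.parent))

        sys.path[:0] = roots  # both "distributions" become importable
        try:
            importlib.invalidate_caches()
            core = importlib.import_module(f"{pkg}.from_core")
            sdk = importlib.import_module(f"{pkg}.from_sdk")
            return core.ORIGIN, sdk.ORIGIN
        finally:
            for root in roots:
                sys.path.remove(root)
```

Under this reading, a module could move from one distribution's directory to the other without its import path changing.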

@potiuk
Member

potiuk commented Jul 23, 2025

Yeah, longer term (as in, not in this PR) I agree that is the right place, but I don't think we can do that until we remove everything else from airflow.utils out of the airflow-core dist?

We could potentially do the legacy namespace pathlib trick in both utils packages - that would work nicely and allow us to move stuff between the two over time.

Even if only temporarily, that would likely work nicely - we could remove it before release.

@ashb
Member Author

ashb commented Jul 23, 2025

(╯°□°)╯︵ ┻━┻

@ashb ashb force-pushed the shared-vendored-lib-tasksdk-and-core branch from e0d811c to bed3bd8 Compare July 23, 2025 20:03
And move `airflow.utils.timezone` into a shared library as the first example
of it working.

In this change we have now settled on an approach using symlinks, but we did
explore other options (see the GH PR for discussion and previous versions,
notably one built upon the `vendoring` tool).

A lot of the reasoning and mode of operation of this is detailed in
shared/README.md in this PR, hence why this description is so short.

Currently various places in TaskSDK and Airflow Core both use these utility
functions. In this specific case they are small enough that they could just
be copied and the duplication wouldn't hurt us long term, but this change
shows a way in which we can have a single source of truth and have it
included automatically in built dists.

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
@ashb ashb force-pushed the shared-vendored-lib-tasksdk-and-core branch from bed3bd8 to 636f97f Compare July 23, 2025 20:19
@ashb ashb merged commit 5f07ae1 into apache:main Jul 23, 2025
176 checks passed
@ashb ashb deleted the shared-vendored-lib-tasksdk-and-core branch July 23, 2025 21:06
@amoghrajesh
Contributor

amoghrajesh commented Jul 24, 2025

#protm

A lot of good collaboration between various Airflow wizards, and impact-wise this one unlocks the ability for client-server separation.

@potiuk
Member

potiuk commented Jul 24, 2025

Let me get the pre-commit now :)
