-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize airflow build process and switch to Hatchling build backend #36536
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This PR changes Airflow installation and build backend to use new standard Python ways of building Python applications. We've been trying to do it for quite a while. Airflow tranditionally has been using complex and convoluted build process based on setuptools and (extremely) custom setup.py file. It survived migration to Airflow 2.0 and splitting Airlfow monorepo into Airflow and Providers, adding pre-installed providers and switching providers to use flit (and follow build standards). So far tooling in Python ecosystme had not been able to fuflill our needs and we refrained to develop our own tooling, but finally with appearance of Hatch (managed by Python Packaging Authority) and few recent advancements there we are finally able to swtich to Python standard ways of managing project dependnecy configuration and project build setup (with a few customizations). This PR makes airflow build process follow those standard PEPs: * Airflow has all build configuration stored in pyproject.toml following PEP 518 which allows any fronted (`pip`, `poetry`, `hatch`, `flit`, or whatever other frontend is used to install required build dependendencies to install Airflow locally and to build distribution pacakges (sdist/wheel) * Hatchling backend follows PEP 517 for standard source tree and build backend implementation that allows to execute the build in a frontend-independent way * We store all project metadata in pyprooject.toml - following PEP 621 where all necessary project metadata components were defined. * We plug-in into Hatchling "editable build" hooks following PEP 660. Hatchling internally builds editable wheel that is used as ephemeral step and communication between backend and frontend (and this ephemeral wheel is used to make editable installation of the projeect - suitable for fast iteration of code without reinstalling the package. With Airflow having many provider packages in single source tree where we want to be able to install and develop airflow and providers together, this is not a small feat to implement the case wher editable installation has to behave quite a bit differently when it comes to packaging and dependencies for editable install (when you want to edit sources directly) and installable package (where you want to have separate Airflow package and provider packages). Fortunately the standardisation efforts in the Python Packaging community and tooling implementing it had finally made it possible. Some of the important ways bow this has been achieved: * Pyproject.toml is generally managed manually, but the part where provider dependencies and bundle dependencies are used is automatically updated by a pre-commit whenever provider dependencies change. * We have dedicated (generated) `[devel_provider_*]` extras that are only installing provider dependencies in editable mode (not the final provider packages). This allows to install dependencies of providers individually or in groups in the editable installation of Airflow, without installing provider packages (i.e. we can use provider code directly from sources of editable Airflow installation). * We have some generated `[devel_*]` bundle extras that bundle together all or selected provider dependencies for installation in CI image and local editable virtualenv installation. * We are utilising custom hatchiling build hooks (PEP 660 standard) that allow to modify 'standard' wheel package on-the-fly when the wheel is being prepared by adding preinstalled package dependencies (which are not needed in editable build) and by removing all devel extras (that are not needed in the PyPI distributed wheel package). This allows to solve the conundrum of having different "editable" and "standard" behaviour while keeping the same project specification in pyproject.toml. * We added description of how `Hatch` can be employed as build frontend in order to manage local virtualenv and install Airflow in editable way easily - while keeping all properties of the installed application (including working airflow cli and package metadata discovery) as well as how to use PEP-standard ways of bulding wheel and sdist packages. * We have a custom step (following PEP-standards) to inject airflow-specific build steps - compiling www assets and generating git commit hash version to display it in the UI * We also show how all this makes it possible to make it easy to manage local virtualenvs and editable installations for Airflow contributors - without vendor lock-in of the build tools as by following standard PEPs Airflow can be locally and editably installed by anyone using any build front-end tools following the standards - whether you use `pip`, `poetry`, `Hatch`, `flit` or any other frontent build tools, Airflow local installation and package building will work the same way for all of them, where both "editable" and "standard" package prepration is managed by `hatchling` backend in the same way. * Previously our extras contained a "." which is not normalized name for extras - `pip` and other tools replaced it automatically with `_'. This change updates the extra names to contain '_' rather than '.' in the name. This should be fully backwards compatible, users will still be able to use "." but it will be normalized to "_" in Airflow packages. * Some of the problematic extras (graphviz, docgen) have been moved out of the core extras to optional ones. Particularly graphviz has been difficult to install on MacOS ARM. This is slightly backwards incompatible, but we should treat it as bugfix - the only missing feature Airflow will not be able to handle is to produce DAG output as image (and it only requires to install graphviz to bring it back). The difficulty of installing graphviz as required dependency justifies the slight backwards-incompatible change. * Additionally, this change organizes the documentation around the extras and dependencies, explaining the reasoning behind all the different extras we have. * As a bonus (and this is what we used to test it all) we are documenting how to use Hatch frontend to: * manage multiple Python installations * manage multiple Pythob virtualenv environments * build Airflow packages for release management
Closed
1 task
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area:API
Airflow's REST/HTTP API
area:dev-tools
area:production-image
Production image improvements and fixes
area:providers
provider:fab
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR changes Airflow installation and build backend to use new standard Python ways of building Python applications.
We've been trying to do it for quite a while. Airflow tranditionally has been using complex and convoluted build process based on setuptools and (extremely) custom setup.py file. It survived migration to Airflow 2.0 and splitting Airlfow monorepo into Airflow and Providers, adding pre-installed providers and switching providers to use flit (and follow build standards).
So far tooling in Python ecosystme had not been able to fuflill our needs and we refrained to develop our own tooling, but finally with appearance of Hatch (managed by Python Packaging Authority) and few recent advancements there we are finally able to swtich to Python standard ways of managing project dependnecy configuration and project build setup (with a few customizations).
This PR makes airflow build process follow those standard PEPs:
Airflow has all build configuration stored in pyproject.toml following PEP 518 which allows any fronted (
pip
,poetry
,hatch
,flit
, or whatever other frontend is used to install required build dependendencies to install Airflow locally and to build distribution pacakges (sdist/wheel)Hatchling backend follows PEP 517 for standard source tree and build backend implementation that allows to execute the build in a frontend-independent way
We store all project metadata in pyprooject.toml - following PEP 621 where all necessary project metadata components were defined.
We plug-in into Hatchling "editable build" hooks following PEP 660. Hatchling internally builds editable wheel that is used as ephemeral step and communication between backend and frontend (and this ephemeral wheel is used to make editable installation of the projeect - suitable for fast iteration of code without reinstalling the package.
With Airflow having many provider packages in single source tree where we want to be able to install and develop airflow and providers together, this is not a small feat to implement the case wher editable installation has to behave quite a bit differently when it comes to packaging and dependencies for editable install (when you want to edit sources directly) and installable package (where you want to have separate Airflow package and provider packages). Fortunately the standardisation efforts in the Python Packaging community and tooling implementing it had finally made it possible.
Some of the important ways bow this has been achieved:
Pyproject.toml is generally managed manually, but the part where provider dependencies and bundle dependencies are used is automatically updated by a pre-commit whenever provider dependencies change.
We have dedicated (generated)
[devel_provider_*]
extras that are only installing provider dependencies in editable mode (not the final provider packages). This allows to install dependencies of providers individually or in groups in the editable installation of Airflow, without installing provider packages (i.e. we can use provider code directly from sources of editable Airflow installation).We have some generated
[devel_*]
bundle extras that bundle together all or selected provider dependencies for installation in CI image and local editable virtualenv installation.We are utilising custom hatchiling build hooks (PEP 660 standard) that allow to modify 'standard' wheel package on-the-fly when the wheel is being prepared by adding preinstalled package dependencies (which are not needed in editable build) and by removing all devel extras (that are not needed in the PyPI distributed wheel package). This allows to solve the conundrum of having different "editable" and "standard" behaviour while keeping the same project specification in pyproject.toml.
We added description of how
Hatch
can be employed as build frontend in order to manage local virtualenv and install Airflow in editable way easily - while keeping all properties of the installed application (including working airflow cli and package metadata discovery) as well as how to use PEP-standard ways of bulding wheel and sdist packages.We have a custom step (following PEP-standards) to inject airflow-specific build steps - compiling www assets and generating git commit hash version to display it in the UI
We also show how all this makes it possible to make it easy to manage local virtualenvs and editable installations for Airflow contributors - without vendor lock-in of the build tools as by following standard PEPs Airflow can be locally and editably installed by anyone using any build front-end tools following the standards - whether you use
pip
,poetry
,Hatch
,flit
or any other frontent build tools, Airflow local installation and package building will work the same way for all of them, where both "editable" and "standard" package prepration is managed byhatchling
backend in the same way.Previously our extras contained a "." which is not normalized name for extras -
pip
and other tools replaced it automatically with `'. This change updates the extra names to contain '' rather than '.' in the name. This should be fully backwards compatible, users will still be able to use "." but it will be normalized to "_" in Airflow packages.Some of the problematic extras (graphviz, docgen) have been moved out of the core extras to optional ones. Particularly graphviz has been difficult to install on MacOS ARM. This is slightly backwards incompatible, but we should treat it as bugfix - the only missing feature Airflow will not be able to handle is to produce DAG output as image (and it only requires to install graphviz to bring it back). The difficulty of installing graphviz as required dependency justifies the slight backwards-incompatible change.
Additionally, this change organizes the documentation around the extras and dependencies, explaining the reasoning behind all the different extras we have.
As a bonus (and this is what we used to test it all) we are documenting how to use Hatch frontend to:
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.