-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Add Ruff rules to ban Core imports from Task SDK internals #53627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pyproject.toml
Outdated
| "providers/common/sql/src/airflow/providers/common/sql/hooks/sql.pyi" = ["TID253"] | ||
|
|
||
| # Allow Task SDK and its tests to import from task SDK | ||
| "task-sdk/**" = ["TID251"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is overly broad -- it ignores all banned imports.
Given the number of ignores we need (and the fact that ignoring the rule means we ignore all banned imports, not just the one we want to ignore), perhaps we should instead put these rules in the subfolders pyproject.toml instead of the global workspace-wide one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should instead put these rules in the subfolders pyproject.toml instead of the global workspace-wide one?
yeah, agreed. Check https://docs.astral.sh/ruff/configuration/#config-file-discovery @amoghrajesh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handled in c2be655
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providers can be kept on top-level pyproject.toml -- no need to go that deep IMO.
If you'd rather still do it for all providers, you can do it in https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/templates/pyproject_TEMPLATE.toml.jinja2
It is small enough change, let's not punt it for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example
cd providers/sftp
uv run ruffShould fail if there is an import from airflow-core in sftp providers when airflow-core is not declared as dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that check will only be fine with ty: astral-sh/ruff#9103
Maybe we can just parse all files in the distribution after uv sync and see if they are properly parseable ? We do that already for provider yaml validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@potiuk little confused on the ideas here. Are you suggesting that we do NOT handle it using banned-api but create a precommit rule that can run uv run ruff etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the banned api will achieve similar results during precommit phase itself, what's missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. I am suggesting we do both. They serve a bit different purposes and are triggered in a different moment.
I was deliberating, whether there is a rule in ruff that we could use to expand it and generally fail when there is any import that is not present at all in the venv (but this is something that - probably correctly) astral team is going to have in ty.
One big (and important) difference of ruff static checking vs. mypy or ty static (type) checking - is that ruff is able to just look at the sources and figure out if things are right or not - this is for example why we can run ruff in pre-commit outside of breeze in local venv, because it generally does not need to install all the 700+ dependencies to do it's job.
On the other hand checking if "import" is missing, requires building locally the virtualenv that should have all the dependnecies of things that we check installed - for example if you want to check google provider, you need to have all google provider dependencies installed and their dependencies (for example plyvel that is inherently difficult to install on MacOS for example, or pymsql for mysql provider and it requires to have mysql system-level libraries installed). This is why current mypy and future ty (if we switch to it) will necessarliy have to be run inside the image or after runnig uv sync --all-packages in local env (which is not guaranteed to work outside of the breeze image and relies on a number of 3rd-party system libraries we need to have installed and build tools installed in the system)
So I don't think anything is missing, and think just doing what we already doing in "lowest dependency" tests is enough (for now):
cd <distribution> # (for example `providers/amazon`)
uv sync --resolution lowest-direct
pytest tests
This effectively does what I wanted to do "implicitly" - while runnig all the test, we assume that all the src files will be imported as part of pytest tests -> and they will fail if we import from airflow (because eventually "task-sdk" will not depende on "airflow-core" and uv sync will simply remove airflow-core imports from PYTHON_PATH.
But later with ty we can just scan all the sources in similar fashion:
cd <distribution> # (for example `providers/amazon`)
uv sync --resolution lowest-direct
ty src tests
And there - such import checks will be done automatically for all sources.
So summarizing - no change now - this was a bit of a digression and future work.
7870c50 to
c2be655
Compare
|
Based on comment: #53627 (comment), its not nice to ignore all TID251 rules for the files i've added in per-file-ignores. So, i am actually thinking if this is the right way ahead at all. |
|
Yeah i didnt like the limitations this one added, so wrote a precommit that can do it better: #53700 |
Yeah. Ruff rules are sometimes too broad and better to have specialized check - but I proposed to use |
Done! |

closes: #53588
Problem
Airflow core currently has extensive coupling with task SDK modules, creating dependencies between task sdk and core, which makes client server separation a challenge.
Current State
Solution / Prevention of more such imports
Implement Ruff banned-api (https://docs.astral.sh/ruff/rules/banned-api/) rules to enforce architectural boundaries:
What's Been Done
airflow-core/pyproject.tomlruff check --select TID251What Needs To Be Done 🚧
providers/amazon/pyproject.toml^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.