Skip to content

Conversation

@amoghrajesh
Copy link
Contributor

@amoghrajesh amoghrajesh commented Jul 22, 2025

closes: #53588

Problem

Airflow core currently has extensive coupling with task SDK modules, creating dependencies between task sdk and core, which makes client server separation a challenge.

Current State

  • Core modules import directly from airflow.sdk.*
  • This creates tight coupling between Core scheduler/API and SDK internals
  • Makes it difficult to upgrade SDK independently
  • Violates the intended architecture where Core should not depend on SDK internals

Solution / Prevention of more such imports

Implement Ruff banned-api (https://docs.astral.sh/ruff/rules/banned-api/) rules to enforce architectural boundaries:

TID251 is a Ruff linting rule code that gets triggered when code violates the banned-api rules we set up.

What's Been Done

  • Banned-API Rules Implemented in airflow-core/pyproject.toml
  • Temporary Exemptions Added
  • Architectural Boundary Enforced
    • ✅ New violations prevented - any new SDK imports in core will be flagged
    • ✅ Clear violation count - 94 files identified that need refactoring, will hunt down with the help of community (checked using ruff check --select TID251

What Needs To Be Done 🚧

  • Distribute rules to where they belong
  • Example, we shouldn't be exempting task SDK imports in amazon provider in here, move those rules to providers/amazon/pyproject.toml
  • Begin systematic refactoring of Core files to remove SDK dependencies (maybe can be done with ruff?. Dont know yet)

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

pyproject.toml Outdated
"providers/common/sql/src/airflow/providers/common/sql/hooks/sql.pyi" = ["TID253"]

# Allow Task SDK and its tests to import from task SDK
"task-sdk/**" = ["TID251"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overly broad -- it ignores all banned imports.

Given the number of ignores we need (and the fact that ignoring the rule means we ignore all banned imports, not just the one we want to ignore), perhaps we should instead put these rules in the subfolders pyproject.toml instead of the global workspace-wide one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should instead put these rules in the subfolders pyproject.toml instead of the global workspace-wide one?

yeah, agreed. Check https://docs.astral.sh/ruff/configuration/#config-file-discovery @amoghrajesh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added a note here in the description
image

But for task-sdk let me move it in the same PR, for providers etc al we can move in follow ups.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in c2be655

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Providers can be kept on top-level pyproject.toml -- no need to go that deep IMO.

If you'd rather still do it for all providers, you can do it in https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/templates/pyproject_TEMPLATE.toml.jinja2

It is small enough change, let's not punt it for later.

Copy link
Member

@potiuk potiuk Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example

cd providers/sftp
uv run ruff

Should fail if there is an import from airflow-core in sftp providers when airflow-core is not declared as dependency

Copy link
Member

@potiuk potiuk Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that check will only be fine with ty: astral-sh/ruff#9103

Maybe we can just parse all files in the distribution after uv sync and see if they are properly parseable ? We do that already for provider yaml validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk little confused on the ideas here. Are you suggesting that we do NOT handle it using banned-api but create a precommit rule that can run uv run ruff etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the banned api will achieve similar results during precommit phase itself, what's missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. I am suggesting we do both. They serve a bit different purposes and are triggered in a different moment.

I was deliberating, whether there is a rule in ruff that we could use to expand it and generally fail when there is any import that is not present at all in the venv (but this is something that - probably correctly) astral team is going to have in ty.

One big (and important) difference of ruff static checking vs. mypy or ty static (type) checking - is that ruff is able to just look at the sources and figure out if things are right or not - this is for example why we can run ruff in pre-commit outside of breeze in local venv, because it generally does not need to install all the 700+ dependencies to do it's job.

On the other hand checking if "import" is missing, requires building locally the virtualenv that should have all the dependnecies of things that we check installed - for example if you want to check google provider, you need to have all google provider dependencies installed and their dependencies (for example plyvel that is inherently difficult to install on MacOS for example, or pymsql for mysql provider and it requires to have mysql system-level libraries installed). This is why current mypy and future ty (if we switch to it) will necessarliy have to be run inside the image or after runnig uv sync --all-packages in local env (which is not guaranteed to work outside of the breeze image and relies on a number of 3rd-party system libraries we need to have installed and build tools installed in the system)

So I don't think anything is missing, and think just doing what we already doing in "lowest dependency" tests is enough (for now):

cd <distribution> # (for example `providers/amazon`)
uv sync --resolution lowest-direct
pytest tests

This effectively does what I wanted to do "implicitly" - while runnig all the test, we assume that all the src files will be imported as part of pytest tests -> and they will fail if we import from airflow (because eventually "task-sdk" will not depende on "airflow-core" and uv sync will simply remove airflow-core imports from PYTHON_PATH.

But later with ty we can just scan all the sources in similar fashion:

cd <distribution> # (for example `providers/amazon`)
uv sync --resolution lowest-direct
ty src tests

And there - such import checks will be done automatically for all sources.

So summarizing - no change now - this was a bit of a digression and future work.

@amoghrajesh amoghrajesh force-pushed the ruff-rule-for-td251 branch from 7870c50 to c2be655 Compare July 22, 2025 12:02
@amoghrajesh amoghrajesh requested review from ashb, kaxil and potiuk July 22, 2025 12:36
@amoghrajesh
Copy link
Contributor Author

Based on comment: #53627 (comment), its not nice to ignore all TID251 rules for the files i've added in per-file-ignores.

So, i am actually thinking if this is the right way ahead at all.

@amoghrajesh
Copy link
Contributor Author

Yeah i didnt like the limitations this one added, so wrote a precommit that can do it better: #53700

@potiuk
Copy link
Member

potiuk commented Jul 24, 2025

Yeah i didnt like the limitations this one added, so wrote a precommit that can do it better: #53700

Yeah. Ruff rules are sometimes too broad and better to have specialized check - but I proposed to use ast rather than finding strings and matching regexps. It can do the job in a more robust way.

@amoghrajesh
Copy link
Contributor Author

Yeah i didnt like the limitations this one added, so wrote a precommit that can do it better: #53700

Yeah. Ruff rules are sometimes too broad and better to have specialized check - but I proposed to use ast rather than finding strings and matching regexps. It can do the job in a more robust way.

Done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Ruff rules to ban Core imports from Task SDK internals

4 participants