

@amoghrajesh
Contributor

@amoghrajesh amoghrajesh commented Jan 17, 2026


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [Cursor IDE] following the guidelines

As we are approaching client-server separation soon, we want to avoid working against a moving target, i.e. if contributors keep adding airflow-core imports in the SDK, it will be harder for us to remove them later. This PR adds a prevention layer through a prek hook. The excluded files already contain such imports and need to be refactored to remove them.


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added area:dev-tools area:task-sdk backport-to-v3-1-test labels Jan 17, 2026
@amoghrajesh amoghrajesh self-assigned this Jan 17, 2026
@amoghrajesh amoghrajesh added this to the Airflow 3.2.0 milestone Jan 17, 2026
@amoghrajesh amoghrajesh removed the backport-to-v3-1-test label Jan 17, 2026
@amoghrajesh amoghrajesh requested a review from sunank200 January 17, 2026 04:37
Member

@ashb ashb left a comment

Few nits

This also won't catch function/method scoped imports. Should it be looking there too?

At one point ruff had an analyze or graph subcommand that might give us the imports without needing to parse the full AST ourselves. I think another CI script uses that too. Might be worth doing it the same way?

@amoghrajesh
Contributor Author

@ashb thanks for your review. I handled the "import airflow.x" style imports. As for method-scoped imports, those will be caught by the script because we use ast.walk(tree) to walk through all nodes of the tree: https://docs.python.org/3/library/ast.html#ast.walk
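A small illustration of the ast.walk point, using a made-up sample source: iterating only over `tree.body` sees module-level statements, while `ast.walk` also yields imports nested inside functions and methods.

```python
import ast

SOURCE = """
import os

def f():
    import json

class C:
    def m(self):
        from collections import OrderedDict
"""

tree = ast.parse(SOURCE)


def import_lines(nodes):
    """Sorted line numbers of the import statements among `nodes`."""
    return sorted(n.lineno for n in nodes if isinstance(n, (ast.Import, ast.ImportFrom)))


# Only module-level statements: misses imports nested in a function or method.
top_level = import_lines(tree.body)
# ast.walk yields every node in the tree, nested ones included.
everywhere = import_lines(ast.walk(tree))

print(top_level)   # [2]
print(everywhere)  # [2, 5, 9]
```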

Ran it against an example:

(apache-airflow) ➜  airflow git:(prek-hook-to-check-core-imports) ✗ cat /tmp/examples.py                                     
from airflow.sdk import Connection

def my_function():
    from airflow.models import DagRun
    import airflow.triggers.base
    
    from airflow.sdk.types import RuntimeTaskInstance
    
def another_function():
    if True:
        from airflow.serialization import serialize
        
class MyClass:
    def method(self):
        import airflow.dag_processing
(apache-airflow) ➜  airflow git:(prek-hook-to-check-core-imports) ✗ python scripts/ci/prek/check_core_imports_in_sdk.py /tmp/examples.py
/tmp/examples.py:
  Line 4: from airflow.models import DagRun
  Line 5: import airflow.triggers.base
  Line 11: from airflow.serialization import serialize
  Line 15: import airflow.dag_processing

Found 4 core import(s) in task-sdk files

@ashb
Member

ashb commented Jan 17, 2026

Ah cool

@amoghrajesh amoghrajesh requested a review from ashb January 17, 2026 11:26
@amoghrajesh
Contributor Author

> At one point ruff had an analyze or graph subcommand that might give us the imports without needing to parse the full AST ourselves. I think another CI script uses that too. Might be worth doing it the same way?

Not too sure about that, but we already have prek hooks that follow the same pattern as the one in this PR. I could explore it if needed, but the result would be the same 😅

@ashb
Member

ashb commented Jan 17, 2026

It might be worth adding explicit line-based ignores rather than excluding whole files. LGTM as it is, though.
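One way line-based ignores could work, sketched under assumptions: a hypothetical inline marker comment (`# allow-core-import`, not something the hook supports) exempts the import statement it appears on. For brevity the sketch reports every unmarked import rather than only core ones; in practice it would be combined with the hook's core-import check. The marker check is a plain substring match on the statement's lines, so a string literal containing the marker would also count; a `tokenize`-based version would be stricter.

```python
from __future__ import annotations

import ast

# Hypothetical marker; the hook in this PR only supports whole-file excludes.
IGNORE_MARKER = "# allow-core-import"


def is_ignored(node: ast.stmt, lines: list[str]) -> bool:
    """True if any physical line spanned by `node` carries the ignore marker."""
    # end_lineno covers parenthesised imports that span several lines.
    end = node.end_lineno or node.lineno
    return any(IGNORE_MARKER in lines[i - 1] for i in range(node.lineno, end + 1))


def imports_to_report(source: str) -> list[int]:
    """Line numbers of import statements that are NOT marked as ignored."""
    lines = source.splitlines()
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.Import, ast.ImportFrom)) and not is_ignored(node, lines)
    )


SOURCE = """\
from airflow.models import DagRun  # allow-core-import
from airflow.configuration import (  # allow-core-import
    conf,
)
import airflow.dag_processing
"""

print(imports_to_report(SOURCE))  # [5]
```

Compared to whole-file excludes, a marker like this keeps the exemption next to the offending import, so new core imports added elsewhere in an already-excluded file would still be caught.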

@amoghrajesh
Contributor Author

Let me consider that too; this one should be good as an immediate prevention measure.

@potiuk
Member

potiuk commented Jan 17, 2026

> Few nits
>
> This also won't catch function/method scoped imports. Should it be looking there too?
>
> At one point ruff had an analyze or graph subcommand that might give us the imports without needing to parse the full AST ourselves. I think another CI script uses that too. Might be worth doing it the same way?

We already got rid of it because it did not work well with how prek works, not to mention the big fat warning that it is experimental and "may change without warning".

The big difference between what ruff does and what we can do here is that ruff does not generate dependencies on files it did not analyse, while we know the convention we are looking for (from airflow.* versus from airflow.sdk*). So we can analyse just the individual files that prek passes in, rather than having to pre-load all source code, which is what ruff has to do in order to generate a "complete" dependency graph. And that means that to check a single file, you need to analyse all of them.

See this output (trimmed down a little) to see what I am talking about. From it you can see that you need to load all of task-sdk + all of airflow-core + all providers, so basically all files, to find out whether a file uses dependencies from "other" files. On my machine this means 0.47s overhead for just the "airflow-core + task-sdk" folders, 2s for "airflow-core + task-sdk + providers", and 3s for "." (i.e. including all tests, dev and other potentially included files). And we have to pay this overhead even if we change one file, i.e. every single commit where a .py file changes will take 2s longer if we use it.

⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff analyze graph --direction dependencies task-sdk/src/airflow/sdk/definitions/decorators/__init__
warning: `ruff analyze graph` is experimental and may change without warning
{}
⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff analyze graph --direction dependencies task-sdk/src/airflow/sdk/definitions/decorators/
warning: `ruff analyze graph` is experimental and may change without warning
{
  "task-sdk/src/airflow/sdk/definitions/decorators/__init__.py": [
    "task-sdk/src/airflow/sdk/bases/decorator.py",
    "task-sdk/src/airflow/sdk/definitions/dag.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
  ],
  "task-sdk/src/airflow/sdk/definitions/decorators/__init__.pyi": [
    "task-sdk/src/airflow/sdk/bases/decorator.py",
    "task-sdk/src/airflow/sdk/definitions/dag.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
  ],
  "task-sdk/src/airflow/sdk/definitions/decorators/condition.py": [
    "task-sdk/src/airflow/sdk/bases/decorator.py",
    "task-sdk/src/airflow/sdk/bases/operator.py",
    "task-sdk/src/airflow/sdk/definitions/context.py",
    "task-sdk/src/airflow/sdk/exceptions.py"
  ],
  "task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py": [
    "task-sdk/src/airflow/sdk/bases/decorator.py",
    "task-sdk/src/airflow/sdk/bases/operator.py",
    "task-sdk/src/airflow/sdk/definitions/_internal/setup_teardown.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py",
    "task-sdk/src/airflow/sdk/definitions/xcom_arg.py",
    "task-sdk/src/airflow/sdk/exceptions.py"
  ],
  "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py": [
    "task-sdk/src/airflow/sdk/bases/decorator.py",
    "task-sdk/src/airflow/sdk/definitions/_internal/expandinput.py",
    "task-sdk/src/airflow/sdk/definitions/_internal/node.py",
    "task-sdk/src/airflow/sdk/definitions/dag.py",
    "task-sdk/src/airflow/sdk/definitions/mappedoperator.py",
    "task-sdk/src/airflow/sdk/definitions/taskgroup.py",
    "task-sdk/src/airflow/sdk/definitions/xcom_arg.py"
  ]
}
⌁11% [jarekpotiuk:~/code/airflow] add-notice-file-prek-hook(+54/-6)+ ± ruff analyze graph --direction dependencies task-sdk/src/airflow/sdk/definitions/decorators/  airflow-core/src/airflow/providers_manager.py
warning: `ruff analyze graph` is experimental and may change without warning
.... lots of output here
  "task-sdk/src/airflow/sdk/definitions/decorators/__init__.py": [
    "airflow-core/src/airflow/providers_manager.py",
    "task-sdk/src/airflow/sdk/bases/decorator.py",
    "task-sdk/src/airflow/sdk/definitions/dag.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/condition.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/setup_teardown.py",
    "task-sdk/src/airflow/sdk/definitions/decorators/task_group.py"
  ],
... lots of output here
}

Also, the code to use it would be a little more complex: we would have to keep the exclusion rules inside the code rather than in the .pre-commit-config.yaml file, because we could only exclude files from the output JSON after all of it has been generated.
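To illustrate the exclusion problem: with ruff, excludes would have to be applied to the JSON after the whole graph exists. A rough sketch, assuming the output shape shown above (each analysed file mapped to the list of files it depends on) and path prefixes taken from the repository layout visible in those paths; `core_dependencies` is a hypothetical helper, and the sample data is a trimmed excerpt of the output above.

```python
from __future__ import annotations

import json
import re

# Assumed prefixes, based on the paths in the `ruff analyze graph` output.
SDK_PREFIX = "task-sdk/"
CORE_PREFIX = "airflow-core/"


def core_dependencies(
    graph: dict[str, list[str]], exclude: re.Pattern[str] | None = None
) -> dict[str, list[str]]:
    """Map each task-sdk file to the airflow-core files it depends on.

    Exclusions can only be applied here, after ruff has produced the full graph.
    """
    result: dict[str, list[str]] = {}
    for src, deps in graph.items():
        if not src.startswith(SDK_PREFIX):
            continue
        if exclude is not None and exclude.search(src):
            continue
        core = [d for d in deps if d.startswith(CORE_PREFIX)]
        if core:
            result[src] = core
    return result


# Trimmed excerpt of the graph printed when providers_manager.py was included.
graph_json = """{
  "task-sdk/src/airflow/sdk/definitions/decorators/__init__.py": [
    "airflow-core/src/airflow/providers_manager.py",
    "task-sdk/src/airflow/sdk/bases/decorator.py"
  ]
}"""
graph = json.loads(graph_json)

found = core_dependencies(graph)
excluded = core_dependencies(graph, exclude=re.compile(r"decorators/__init__\.py$"))
print(found)
print(excluded)
```

Note that the airflow-core edge only shows up because providers_manager.py was part of the analysed set, which is why the whole tree would have to be analysed on every run.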

AST + our prek script is way faster for incremental checks. When I removed decorators/__init__.py from the excluded files in the config, the time to analyse was 0.8s for all files and 0.15s when only one file is checked. So it beats the "full check" by a lot for incremental checks, and this is what really matters for pre-commit: 0.4s for "all files" in CI is nothing, while 1.8s on every local commit matters a lot.

⌁1% [jarekpotiuk:~/code/airflow] prek-hook-to-check-core-imports(+0/-1)+ 6s ± time prek run check-core-imports --files task-sdk/src/airflow/sdk/definitions/decorators/__init__.py
Running hooks for `task-sdk`:
Check for core imports in task-sdk files.................................Failed
- hook id: check-core-imports
- exit code: 1

  src/airflow/sdk/definitions/decorators/__init__.py:
    Line 21: from airflow.providers_manager import ProvidersManager

  Found 1 core import(s) in task-sdk files
prek run check-core-imports --files   0.15s user 0.40s system 217% cpu 0.256 total
⌁1% [jarekpotiuk:~/code/airflow] prek-hook-to-check-core-imports(+0/-1)+ 1 ± time prek run check-core-imports --all-files
Running hooks for `task-sdk`:
Check for core imports in task-sdk files.................................Failed
- hook id: check-core-imports
- exit code: 1

  src/airflow/sdk/definitions/decorators/__init__.py:
    Line 21: from airflow.providers_manager import ProvidersManager

  Found 1 core import(s) in task-sdk files
prek run check-core-imports --all-files  0.80s user 0.39s system 421% cpu 0.284 total

@amoghrajesh amoghrajesh merged commit 661c807 into apache:main Jan 19, 2026
69 checks passed
jason810496 pushed a commit to jason810496/airflow that referenced this pull request Jan 22, 2026
suii2210 pushed a commit to suii2210/airflow that referenced this pull request Jan 26, 2026

8 participants