Add uv sync --no-dev before provider YAML checks #60728

Open
Fury0508 wants to merge 6 commits into apache:main from Fury0508:feature/uv-sync-provider-yaml-check

Contributor

@Fury0508 Fury0508 commented Jan 17, 2026

What change does this PR introduce?

Strips dev dependencies before running provider YAML validation checks by running uv sync --no-dev --all-packages.

Why is this change needed?

Currently, the provider YAML check runs with all dependencies installed, including dev dependencies. This can mask issues where provider code has optional cross-provider dependencies that aren't handled properly (missing try/except blocks for optional imports). By stripping dev dependencies first, we create an environment closer to production and can detect these issues during CI.
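For illustration, the kind of guard the description refers to (a try/except around an optional import) might look like the sketch below. The module name hypothetical_cross_provider_dep and both helper functions are made up for this example and do not come from the Airflow codebase.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# "hypothetical_cross_provider_dep" is a made-up name used only for illustration.
cross_provider_dep = optional_import("hypothetical_cross_provider_dep")


def cross_provider_feature_available() -> bool:
    """Report whether the optional dependency can be used in this environment."""
    return cross_provider_dep is not None
```

With dev dependencies installed, an unguarded import of such a module would succeed in CI and hide the missing guard; without them it would fail at import time.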

Related issue(s)

closes: #60662

Changes

  • Added sync_dependencies_without_dev() function that runs uv sync --no-dev --all-packages before provider checks
  • Function is called at the start of the main execution before any validation runs
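A minimal sketch of what such a function could look like is shown below. It is not the code from this PR: the injectable run parameter and the messages are assumptions added so the sketch can be exercised without uv installed, and the exit-on-failure behaviour follows the commit notes later in this thread.

```python
import subprocess
import sys
from typing import Callable

# Command taken from the PR description.
SYNC_COMMAND = ["uv", "sync", "--no-dev", "--all-packages"]


def sync_dependencies_without_dev(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> None:
    """Strip dev dependencies; exit with uv's return code if the sync fails."""
    print(f"Running: {' '.join(SYNC_COMMAND)}")
    # check=False so that the failure is reported with our own message below.
    result = run(SYNC_COMMAND, check=False)
    if result.returncode != 0:
        print("uv sync --no-dev failed; aborting before validation.", file=sys.stderr)
        sys.exit(result.returncode)
```

The function would be called once at the top of the script's main block, before any validation runs.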

Testing

  • Ran uv sync --no-dev --all-packages in breeze container to strip dev dependencies
  • Verified providers and their dependencies (including jsonpath-ng from amazon provider) remain installed
  • Executed python scripts/in_container/run_provider_yaml_files_check.py
  • All provider checks passed with 0 errors

Gen-AI Assisted Contribution

Claude AI was consulted for guidance on Airflow's codebase structure and assistance with resolving CI failures. Core implementation and testing were done independently.

@Fury0508 Fury0508 requested a review from jscheffl January 18, 2026 01:09
Contributor

@jscheffl jscheffl left a comment

Looks OK to me.

Just a bit unsure, so I would request another pair of eyes: I also saw today (before this PR) that my venv is "wiped" after running some prek commands... I always need to re-sync my uv venv. Is it intended for pre-commit checks to change or alter venvs? Or should a temporary secondary venv be created for this check?

Otherwise LGTM

Contributor

@bugraoz93 bugraoz93 left a comment

Hey @Fury0508, we recently added an AI Agent section to PRs. This PR is 16 hours old, which means the section should have been there when you created it. Could you please fill out that part, stating whether AI was used and to what degree?

@Fury0508 Fury0508 force-pushed the feature/uv-sync-provider-yaml-check branch from fbbef26 to 292796f Compare January 18, 2026 11:46
@Fury0508 Fury0508 requested a review from jscheffl January 18, 2026 11:58
@Fury0508
Contributor Author

@bugraoz93 Thanks for pointing that out! I've updated the PR description with the Gen-AI disclosure section.

@Fury0508
Contributor Author

> Looks OK to me.
>
> Just a bit unsure, so I would request another pair of eyes: I also saw today (before this PR) that my venv is "wiped" after running some prek commands... I always need to re-sync my uv venv. Is it intended for pre-commit checks to change or alter venvs? Or should a temporary secondary venv be created for this check?
>
> Otherwise LGTM

Good point! I didn't think about how this would affect local development - only focused on catching the dependency issues in CI.

So yeah, wiping the venv every time someone runs pre-commit hooks is definitely annoying. Should this maybe only run in CI and not locally? Or would creating a temp venv for just this check work better?

Not sure what's the best fix here - happy to adjust based on what you think makes sense!

@bugraoz93
Contributor

> @bugraoz93 Thanks for pointing that out! I've updated the PR description with the Gen-AI disclosure section.

Thanks @Fury0508!

@Fury0508 Fury0508 requested a review from bugraoz93 January 23, 2026 22:15
@Fury0508
Contributor Author

Hi @jscheffl, thanks for the approval! This PR has been approved and all checks pass. Just wondering if there's anything else needed before merge, or if it's waiting in the merge queue. Thanks for your time!

@jscheffl
Contributor

Yes, as written above: I am okay with merging and have therefore approved, but I would like another pair of eyes as reviewer before merge, as I am not 100% sure.

Member

@jason810496 jason810496 left a comment

> ...always need to re-sync my uv venv. Is it intended for pre-commit checks to change or alter venvs? Or should a temporary secondary venv be created for this check?

Yeah, it also felt a bit strange to me at first glance. From my perspective, it might also introduce a race condition in the dependency environment (e.g. other in-container pre-commit checks depend on the dev dependencies while the current script is running).

I think a better way might be:

  1. Create a new venv under a tmp dir and install the dependencies without dev.
  2. Extract the current __main__ logic (excluding the sync_dependencies_without_dev call) into a separate function named provider_yaml_files_check.
  3. Run sync_dependencies_without_dev for the venv under the tmp dir.
  4. Run provider_yaml_files_check as a subprocess using the virtual environment we just created under the temporary path, ensuring that it runs without development dependencies (for example, by overriding PYTHONPATH, etc.).

This would avoid the race condition and the environment side effects.
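The steps above can be sketched roughly as follows. This is an untested outline, not the PR's implementation: it assumes uv's UV_PROJECT_ENVIRONMENT setting is used to point uv sync at the temporary venv, a POSIX venv layout (bin/python), and that dropping PYTHONPATH and VIRTUAL_ENV is enough to keep the subprocess isolated. The names build_isolated_check_plan and run_isolated_check are hypothetical.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Script path taken from the PR description.
CHECK_SCRIPT = "scripts/in_container/run_provider_yaml_files_check.py"


def build_isolated_check_plan(
    venv_dir: Path, base_env: dict[str, str]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Return (sync command, check command, environment) for an isolated run."""
    env = dict(base_env)
    # Assumption: uv sync installs into this path instead of the shared .venv.
    env["UV_PROJECT_ENVIRONMENT"] = str(venv_dir)
    # Drop variables that could leak the shared environment into the subprocess.
    env.pop("PYTHONPATH", None)
    env.pop("VIRTUAL_ENV", None)
    sync_cmd = ["uv", "sync", "--no-dev", "--all-packages"]
    check_cmd = [str(venv_dir / "bin" / "python"), CHECK_SCRIPT]
    return sync_cmd, check_cmd, env


def run_isolated_check() -> int:
    """Sync a throwaway venv without dev deps and run the check inside it."""
    with tempfile.TemporaryDirectory() as tmp:
        sync_cmd, check_cmd, env = build_isolated_check_plan(
            Path(tmp) / "venv", dict(os.environ)
        )
        subprocess.run(sync_cmd, env=env, check=True)
        return subprocess.run(check_cmd, env=env, check=False).returncode
```

Keeping the command construction separate from the execution makes the plan easy to inspect, and the shared venv used by other in-container checks is never modified.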

@Fury0508
Contributor Author

> > ...always need to re-sync my uv venv. Is it intended for pre-commit checks to change or alter venvs? Or should a temporary secondary venv be created for this check?
>
> Yeah, it also felt a bit strange to me at first glance. From my perspective, it might also introduce a race condition in the dependency environment (e.g. other in-container pre-commit checks depend on the dev dependencies while the current script is running).
>
> I think a better way might be:
>
>   1. Create a new venv under a tmp dir and install the dependencies without dev.
>   2. Extract the current __main__ logic (excluding the sync_dependencies_without_dev call) into a separate function named provider_yaml_files_check.
>   3. Run sync_dependencies_without_dev for the venv under the tmp dir.
>   4. Run provider_yaml_files_check as a subprocess using the virtual environment we just created under the temporary path, ensuring that it runs without development dependencies (for example, by overriding PYTHONPATH, etc.).
>
> This would avoid the race condition and the environment side effects.

Thanks for the detailed feedback @jason810496! You're absolutely right about the race condition and environment side effects.

I'll refactor to use an isolated temporary venv as you suggested:

  1. Create temp venv with non-dev dependencies
  2. Extract validation logic to provider_yaml_files_check()
  3. Run as subprocess using the temp environment

Does this sound like the right direction? Happy to implement if this approach works for the team.

@jason810496
Member

> Does this sound like the right direction? Happy to implement if this approach works for the team.

Yes, thanks for the follow-up.

@jscheffl
Contributor

> > Does this sound like the right direction? Happy to implement if this approach works for the team.
>
> Yes, thanks for the follow-up.

Thanks for raising the point about race conditions when multiple checks use the same venv. I did not consider this, but it will be important!

@Fury0508
Contributor Author

@jason810496 I'm working on implementing the isolated temp venv but running into an issue I can't figure out.

Here's what I've done so far:

  1. Create temp venv with python -m venv
  2. Export non-dev deps: uv export --no-dev --no-hashes -o requirements.txt
  3. Install them: pip install -r requirements.txt in the temp venv
  4. Install Airflow itself: pip install -e /opt/airflow in the temp venv
  5. Run validation function as subprocess using temp venv's python

But when the subprocess tries to import the validation function, it fails with:

ModuleNotFoundError: No module named 'jsonpath_ng'

Even though I can verify the packages are installed in the temp venv. The issue seems to be that the script has module-level imports like from jsonpath_ng.ext import parse at the top, and these fail before the function even runs.

I tried explicitly installing the script's direct dependencies (jsonpath_ng, rich, tabulate, etc.) but still hitting the same issue.

Am I approaching the temp venv creation wrong? Should I be using uv sync differently, or is there a specific way to install Airflow + its dependencies in the temp venv that I'm missing?

Any guidance would be helpful - I feel like I'm close but missing something about how to properly set up the isolated environment.

Member

@jason810496 jason810496 left a comment

Hi @Fury0508, thanks for your update!

I couldn't see your change on GitHub, so feel free to push your commits.

From my perspective, the problem might be in the final "Run validation function as subprocess using temp venv's python" step, because PYTHONPATH might be set incorrectly. I would log sys.path in the validation function for debugging, to check whether the import paths are correct.

Additionally, one early comment (not directly related to the problem): for the first "Create temp venv" step, we could use uv instead of python -m venv to speed up the process.
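One possible shape for that debugging step is sketched below. The helper name describe_import_environment is hypothetical; it only uses the standard library and should be given a top-level module name such as jsonpath_ng.

```python
import importlib.util
import sys


def describe_import_environment(module_name: str) -> dict:
    """Collect the interpreter and import-path details relevant to one module."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "sys_path": list(sys.path),
        "module_origin": spec.origin if spec else None,
    }
```

Printing the result of describe_import_environment("jsonpath_ng") at the top of the subprocess entry point would show whether the temp venv's interpreter is actually the one running and whether its site-packages directory appears in sys.path.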

@potiuk
Member

potiuk commented Feb 3, 2026

Actually, ideally the whole check should be split into running a check "per-provider", and before every check you should do:

cd <PROVIDER_FOLDER>
uv sync --no-dev
#  HERE import for all files in this provider should happen

We can use the uv sync workspace feature, which automatically syncs the provider and all its dependencies (including transitive ones) when you simply enter the distribution folder and run uv sync.

Alternatively, the same can be done by specifying --project explicitly in uv sync in the root folder, but going to the distribution folder and running uv sync is way better.

You can find out more about it from my recent FOSDEM presentation: https://fosdem.org/2026/schedule/event/WE7NHM-modern-python-monorepo-apache-airflow/
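A rough sketch of the per-provider loop described above follows. It makes assumptions that are not confirmed in this thread: that each provider distribution folder contains both a provider.yaml and a pyproject.toml, and that the import check is supplied as a callback. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable


def find_provider_dirs(providers_root: Path) -> list[Path]:
    """Find folders that look like provider distributions (assumed layout)."""
    return sorted(
        yaml_file.parent
        for yaml_file in providers_root.rglob("provider.yaml")
        if (yaml_file.parent / "pyproject.toml").is_file()
    )


def sync_provider(provider_dir: Path, run: Callable = subprocess.run) -> int:
    """Run uv sync --no-dev from inside one provider's distribution folder."""
    return run(["uv", "sync", "--no-dev"], cwd=provider_dir, check=False).returncode


def check_each_provider(
    providers_root: Path,
    check: Callable[[Path], bool],
    run: Callable = subprocess.run,
) -> list[Path]:
    """Sync and check providers one at a time; return the folders that failed."""
    failed = []
    for provider_dir in find_provider_dirs(providers_root):
        if sync_provider(provider_dir, run=run) != 0 or not check(provider_dir):
            failed.append(provider_dir)
    return failed
```

Here check stands in for the "import all files in this provider" step, which would run right after each sync.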

@shahar1
Contributor

shahar1 commented Feb 10, 2026

Somehow I missed this PR and created another to solve the same issue (#61713), which fails the pre-commit even before running the tests.
I think that it would be nice to have both to ensure that we're covered E2E.

@Fury0508
Contributor Author

@potiuk - You're right that running uv sync at the distribution folder level would be a cleaner approach. I can see how using the workspace feature would automatically sync all provider dependencies. I'll look into the FOSDEM presentation you linked to better understand the implementation.

@shahar1 - I see you created #61713 which handles the pre-commit validation side. That makes sense to have both layers:

  • Your PR catches issues at pre-commit stage (faster feedback)
  • This PR ensures we have the full E2E validation when running the actual YAML checks

Current status:

  • This PR is approved and all checks are passing
  • However, based on potiuk's suggestion, I think we should refactor this to use the uv sync approach at the provider folder level rather than the current --no-dev flag implementation
  • Some work remains to implement this better approach

Sorry for the delay in my work.

Member

@jason810496 jason810496 left a comment

Nice!

@potiuk potiuk added the full tests needed, canary, and all versions labels Feb 14, 2026
- Add sync_dependencies_without_dev() to strip dev dependencies
- Move jsonpath_ng import inside function to avoid ImportError
- Helps detect unhandled optional cross-provider dependencies

Related: apache#60662
- Exit immediately if uv sync --no-dev fails instead of continuing with warning
- Add stdout display for better visibility of sync operation
- Ensures validation never runs with dev dependencies present
@potiuk potiuk force-pushed the feature/uv-sync-provider-yaml-check branch from b8ecb37 to 018fb69 Compare February 14, 2026 18:15
@potiuk
Member

potiuk commented Feb 14, 2026

Let me rebase it with all the tests enabled and let's see if it passes.

Labels

  • all versions: If set, the CI build will be forced to use all versions of Python/K8S/DBs
  • canary: When set on PR running from apache repo - behave as canary run
  • full tests needed: We need to run full set of tests for this PR to merge

Development

Successfully merging this pull request may close these issues.

Remove all dev dependencies before provider yaml check

6 participants