
Enforce upgrades of min-versions for our dependencies #35549

Closed
1 task done
potiuk opened this issue Nov 9, 2023 · 4 comments · Fixed by #39946
Labels
kind:meta High-level information important to the community

Comments

@potiuk
Member

potiuk commented Nov 9, 2023

Body

It should be entirely possible to check and enforce bumping of min versions for all our dependencies (including providers).

We should be able to run a set of tests (a separate test for each provider) where we lower the dependencies to the minimum versions resulting from the resolution, run the tests for that provider, and see if they pass. Automating that and making it part of our CI should protect us against the case where we add a feature or use an API that is not available in some older version of a dependency we are using.

This would be the ultimate automation of our dependency management, where we would care about both sides of the dependency requirements. The current automation cares about upper bounds, but lower bounds are left to the discretion of the contributor and the keen eye of the reviewer. Both are fragile and easy to get wrong because of the number of dependencies we have, limited review time/focus, and the sometimes unobvious coupling between versions of dependencies and features/APIs.
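For illustration, here is a rough sketch of what such a check could look like for a single provider. It assumes a resolver that can pin direct dependencies to their declared minimums (for example uv's lowest-direct resolution mode) and uses the openai provider and its test path purely as an example - this is a sketch, not the final design:

# install Airflow from sources with the provider extra, resolving every
# *direct* dependency to its declared minimum version
uv pip install --resolution lowest-direct -e ".[openai]"
# then run that provider's tests against the downgraded environment
pytest tests/providers/openai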

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
@potiuk potiuk added the kind:meta High-level information important to the community label Nov 9, 2023
@potiuk potiuk self-assigned this Nov 9, 2023
@potiuk potiuk changed the title Enforce upgrades of min-versiosns for our dependencies Enforce upgrades of min-versions for our dependencies Nov 9, 2023
@RNHTTR
Contributor

RNHTTR commented Nov 9, 2023

Would this be implemented as a pre-commit hook?

@potiuk
Member Author

potiuk commented Nov 9, 2023

Not really. It must be a real/full unit test environment, because we need to downgrade the dependencies of the provider and then run the complete set of tests for that provider - and all of that needs to happen in the Breeze image, because we have to have all the libraries/dependencies installed. So this test will take quite some time to run, and pre-commits should be fast.

@jscheffl
Contributor

jscheffl commented Nov 9, 2023

Mhm, since we have ~90 providers, running a test for each provider (minus the few that are installed by default) means installing a Python environment with each provider individually about 80 times. This will take a seriously long time. I think this can only be done in a nightly build - it would be hard to squeeze into the PR tests, unless it runs in parallel with pip download caching.

What also comes to mind (but might be a different task) is that we also get users asking for support because they are not installing with the right constraints. For supportability, would it also make sense to check the installed modules at runtime and generate a warning in the UI if requirements are not met (min/max)?

@potiuk
Member Author

potiuk commented Nov 10, 2023

Mhm, since we have ~90 providers, running a test for each provider (minus the few that are installed by default) means installing a Python environment with each provider individually about 80 times. This will take a seriously long time. I think this can only be done in a nightly build - it would be hard to squeeze into the PR tests, unless it runs in parallel with pip download caching.

About 70%-80% of our PRs concern changes in one or a few providers. We have had what I named "SelectiveChecks" in place for a few years; it produces the list of providers affected by a change, so most PRs already run only a subset of provider tests (and a subset of the documentation). It looks at what is coming in the PR (which files changed) and, based on that, determines which set of tests (which provider tests) should be run. It even automatically calculates the set of dependent providers (so, for example, when you modify the HTTP provider, OpsGenie, JIRA and a few other providers that use the HTTP provider as a base will be run as well). Thanks to that, the vast majority of those "changes in providers" PRs complete their tests in 1-2 minutes rather than 15.
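Conceptually - and this is a simplified sketch, not the actual SelectiveChecks implementation, which also pulls in dependent providers, docs and other test types - deriving the affected providers from the changed files boils down to something like:

# list the providers whose files were touched in the PR (illustrative only)
git diff --name-only origin/main... | grep -oP 'airflow/providers/\K[^/]+' | sort -u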

It boils down to running tests with Providers[openai], for example. Look here:

https://github.com/apache/airflow/actions/runs/6811281201/job/18521459711?pr=35547

This was a PR that changed only the openai provider code. Due to the selective checks calculations, the test command that it ran was:

breeze testing db-tests --parallel-test-types "Always Providers[openai]"

The complete test run for this PR took 1 minute 21 seconds.

Similarly, the docs build took 6 minutes instead of the 30 minutes a complete docs build takes, because the documentation build command for that PR was:

breeze build-docs apache-airflow openai

And this is precisely the case where, when someone changes the openai provider, we want to test the openai min dependencies. It is entirely OK to run another ~2-minute test for it (repeating the Providers[openai] tests after downgrading its dependencies).

The very same selective checks can be used to run those tests. They could be run automatically: whenever Providers[PROVIDER] is run, we should run this test in two variants (as sketched below):

  • current (latest) dependencies
  • min dependencies
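For example (the second command is purely illustrative - the min-dependencies option is a hypothetical placeholder used only to show the idea, not an existing breeze flag):

# variant 1: current (latest) dependencies - what selective checks already run today
breeze testing db-tests --parallel-test-types "Always Providers[openai]"
# variant 2: min dependencies - hypothetical flag, shown only to illustrate the second variant
breeze testing db-tests --parallel-test-types "Always Providers[openai]" --force-lowest-dependencies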

The benefit of doing it in the PR is that it's not the magical maintainer who has to go and fix all those; it's the responsibility of the contributor who adds a new feature to work out what to do - whether we should bump the min requirement, or whether we should add back-compatibility.

On the other hand, PRs that run all the tests (Airflow core changes) do not modify provider functionality, so they do not have to run any of those min/max tests. We can run them in their entirety in the canary build - that is one of the functions of the canary build: it generates updates to the constraints, but it also runs the complete set of tests for everything to make sure everything is green, even if some of the selective checks gave a false positive by skipping some of those tests. And there, maintainers indeed might (and do) step in and either fix or revert the change that caused the failure (like we do currently).

I think it is entirely doable to run those tests for the vast majority of regular PRs with the selective checks mechanism we have today.

What also comes to mind (but might be a different task) is that we also get users asking for support because they are not installing with the right constraints. For supportability, would it also make sense to check the installed modules at runtime and generate a warning in the UI if requirements are not met (min/max)?

I think we should not do that - pip is already doing it for us. Limiting such runtime checks to the constraints would be a bad idea, but pip already limits (or at the very least warns) the user when they install something outside of the min/max limits of the packages they install.

More context:

Constraints are not really a "recommended" set of dependencies. They are the "golden" set which we know we tested and which we know was working at the time of release. They provide a "reproducible installation". But this is where their role ends. Users should be FREE to upgrade/downgrade the dependencies within the min/max requirements.

Even more: if a user chooses to install a newer Airflow with an older provider, the constraints might not work any more. For example, after we bumped the Google provider dependencies, if someone installed an older version of that provider with the new Airflow 2.7 (which is entirely possible), the needed set of dependencies is about as far from the current 2.7 constraints as it gets - many of the dependencies will have to be downgraded (and this will happen automatically, precisely because the older Google provider has a different set of min/max dependencies and pip will follow them and bring those dependencies down). This is also why the recommended installation process is:

a) Install Airflow with extras and constraints -> a reproducible installation with the "golden set" of dependencies

pip install apache-airflow[EXTRAS_HERE]==2.7.3 --constraint http:///  .... 2.7.3/ ... -3.8.txt

b) Downgrade/upgrade everything else you need (including downgrading/upgrading providers separately and any other dependencies) WITHOUT CONSTRAINTS.

pip install apache-airflow==2.7.3  ADD_YOUR_DEPENDENCIES_INCLUDING_SPECIFIC_PROVIDER_PACKAGE_VERSION
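Putting both steps together (the provider and versions below are illustrative; the constraints URL follows Airflow's documented pattern https://raw.githubusercontent.com/apache/airflow/constraints-VERSION/constraints-PYTHON_VERSION.txt):

# step a) reproducible installation with the "golden set" of dependencies
pip install "apache-airflow[google]==2.7.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.3/constraints-3.8.txt"
# step b) adjust providers and any other dependencies WITHOUT constraints;
# pip then resolves within the min/max ranges declared by Airflow and the providers
pip install "apache-airflow==2.7.3" "apache-airflow-providers-google==<OLDER_VERSION>"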

Those should ALWAYS be two steps - and step b) is where constraints stop playing a role, as if they did not exist at all. That's by design. And of course it might mean that some dependencies will break something in this case, but other than paying attention to the min/max expectations in our providers and Airflow, we cannot do much about it. BUT this is precisely WHY we should care about the min requirements - because whatever we put there will influence pip's decision on which dependencies get upgraded/downgraded. The b) command is the sole reason why it is worth doing the tests and exercise I mentioned: because of the b) case, people might sometimes downgrade dependencies that should not be downgraded, because some of the installed providers implicitly expect higher versions of them. This is the whole reason why I want to implement those tests here.

This is really why we are using constraints rather than requirements - the whole point is that we DO NOT want to prevent users from downgrading/upgrading dependencies. And this is already enforced (in some cases just guided, but that is another story) by pip and the min/max requirements of the packages you install in step b) - when you install a provider with a set of min/max dependencies, pip will try to figure out the right set of versions that falls within what Airflow 2.7.3 expects and what those new dependencies need.

Also, the UI is the wrong audience to show this to - users of the Airflow UI often cannot do much about it, as they are frequently not the same people who deploy Airflow and install packages. It's better to stick to what pip shows when you install packages, IMHO.

potiuk added a commit to potiuk/airflow that referenced this issue Jun 3, 2024
With this change we are running tests of downgrading Airflow
dependencies to "lowest-direct" ones - separately for "core" tests
and for each provider (and run corresponding tests with it). This
should allow us to determine the lowest bounds for the
dependencies - for Airflow and for individual providers - and continue
doing it while Airflow evolves in the future.

Fixes: apache#35549
Related: apache#39100
potiuk added a commit that referenced this issue Jun 4, 2024
…#39946)

With this change we are running tests of downgrading Airflow
dependencies to "lowest-direct" ones - separately for "core" tests
and for each provider (and run corresponding tests with it). This
should allow us to determine the lowest bounds for the
dependencies - for Airflow and for individual providers - and continue
doing it while Airflow evolves in the future.

Fixes: #35549
Related: #39100