
Enforce upgrades of min-versions for our dependencies #35549

Closed
1 task done
potiuk opened this issue Nov 9, 2023 · 4 comments · Fixed by #39946
Labels
kind:meta High-level information important to the community

Comments

@potiuk
Member

potiuk commented Nov 9, 2023

Body

It should be entirely possible to check and enforce bumping of min versions for all our dependencies (including providers).

We should be able to run a set of tests (a separate test for each provider) where we lower the dependencies to the minimum versions resulting from the resolution, run the tests for that provider, and see if they pass. Automating that and making it part of our CI should protect us against the case where we add a feature or use an API that is not available in some older version of a dependency we are using.

This would be the ultimate automation of our dependency management, where we would care about both sides of the dependency requirements. The current automation cares about upper bounds, but lower bounds are left to the discretion of the contributor and the keen eye of the reviewer. Both are fragile and easy to get wrong because of the number of dependencies we have, limited review time/focus, and the sometimes unobvious coupling between versions of dependencies and features/APIs.
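For illustration, here is a rough sketch of what such a check could look like for a single provider. It assumes a resolver that can pin direct dependencies to their declared minimums (for example uv's lowest-direct resolution mode) and uses the openai provider and its test path purely as an example - this is a sketch, not the final design:

# install Airflow from sources with the provider extra, resolving every
# *direct* dependency to its declared minimum version
uv pip install --resolution lowest-direct -e ".[openai]"
# then run that provider's tests against the downgraded environment
pytest tests/providers/openai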

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
@potiuk potiuk added the kind:meta High-level information important to the community label Nov 9, 2023
@potiuk potiuk self-assigned this Nov 9, 2023
@potiuk potiuk changed the title Enforce upgrades of min-versiosns for our dependencies Enforce upgrades of min-versions for our dependencies Nov 9, 2023
@RNHTTR
Contributor

RNHTTR commented Nov 9, 2023

Would this be implemented as a pre-commit hook?

@potiuk
Member Author

potiuk commented Nov 9, 2023

Not really. It must be a real/full unit test environment, because we need to downgrade the dependencies of the provider and then run the complete set of tests for that provider - and all of that needs to happen in the Breeze image, because we have to have all the libraries/dependencies installed. So this test will take quite some time to run, and pre-commits should be fast.

@jscheffl
Contributor

jscheffl commented Nov 9, 2023

Mhm, since we have ~90 providers, running a test for each provider (minus the few that are installed by default) means installing a Python environment with each provider individually about 80 times. This will take a seriously long time. I think this can only be done in a nightly build - it would be hard to squeeze into the PR tests, unless it runs in parallel with pip download caching.

What also comes to mind (but might be a different task) is that we also get users asking for support because they are not installing with the right constraints. For supportability, would it also make sense to check the installed modules at runtime and generate a warning in the UI if requirements are not met (min/max)?

@potiuk
Member Author

potiuk commented Nov 10, 2023

Mhm, since we have ~90 providers, running a test for each provider (minus the few that are installed by default) means installing a Python environment with each provider individually about 80 times. This will take a seriously long time. I think this can only be done in a nightly build - it would be hard to squeeze into the PR tests, unless it runs in parallel with pip download caching.

About 70%-80% of our PRs concern changes in one or a few providers. We have had what I named "SelectiveChecks" in place for a few years; it produces the list of providers affected by a change, so most PRs already run only a subset of provider tests (and a subset of the documentation). It looks at what is coming in the PR (which files changed) and, based on that, determines which set of tests (which provider tests) should be run. It even automatically calculates the set of dependent providers (so, for example, when you modify the HTTP provider, OpsGenie, JIRA and a few other providers that use the HTTP provider as a base will be run as well). Thanks to that, the vast majority of those "changes in providers" PRs complete their tests in 1-2 minutes rather than 15.
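Conceptually - and this is a simplified sketch, not the actual SelectiveChecks implementation, which also pulls in dependent providers, docs and other test types - deriving the affected providers from the changed files boils down to something like:

# list the providers whose files were touched in the PR (illustrative only)
git diff --name-only origin/main... | grep -oP 'airflow/providers/\K[^/]+' | sort -u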

It boils down to running tests with Providers[openai], for example. Look here:

https://github.com/apache/airflow/actions/runs/6811281201/job/18521459711?pr=35547

This was a PR that changed only the openai provider code. Due to the selective checks calculations, the test command that it ran was:

breeze testing db-tests --parallel-test-types "Always Providers[openai]"

The complete test run for this PR took 1 minute 21 seconds.

Similarly, the docs build took 6 minutes instead of the 30 minutes a complete docs build takes, because the documentation build command for that PR was:

breeze build-docs apache-airflow openai

And this is precisely the case where, when someone changes the openai provider, we want to test the openai min dependencies. It is entirely OK to run another ~2-minute test for it (repeating the Providers[openai] tests after downgrading its dependencies).

The very same selective checks can be used to run those tests. They could be run automatically: whenever Providers[PROVIDER] is run, we should run this test in two variants (as sketched below):

  • current (latest) dependencies
  • min dependencies
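For example (the second command is purely illustrative - the min-dependencies option is a hypothetical placeholder used only to show the idea, not an existing breeze flag):

# variant 1: current (latest) dependencies - what selective checks already run today
breeze testing db-tests --parallel-test-types "Always Providers[openai]"
# variant 2: min dependencies - hypothetical flag, shown only to illustrate the second variant
breeze testing db-tests --parallel-test-types "Always Providers[openai]" --force-lowest-dependencies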

The benefit of doing it in the PR is that it's not the magical maintainer who has to go and fix all those; it's the responsibility of the contributor who adds a new feature to work out what to do - whether we should bump the min requirement, or whether we should add back-compatibility.

On the other hand, PRs that run all the tests (Airflow core changes) do not modify provider functionality, so they do not have to run any of those min/max tests. We can run them in their entirety in the canary build - that is one of the functions of the canary build: it generates updates to the constraints, but it also runs the complete set of tests for everything to make sure everything is green, even if some of the selective checks gave a false positive by skipping some of those tests. And there, maintainers indeed might (and do) step in and either fix or revert the change that caused the failure (like we do currently).

I think it is entirely doable to run those tests for the vast majority of regular PRs with the selective checks mechanism we have today.

What also comes to mind (but might be a different task) is that we also get users asking for support because they are not installing with the right constraints. For supportability, would it also make sense to check the installed modules at runtime and generate a warning in the UI if requirements are not met (min/max)?

I think we should not do that - pip is already doing it for us. Limiting such runtime checks to the constraints would be a bad idea, but pip already limits (or at the very least warns) the user when they install something outside of the min/max limits of the packages they install.

More context:

Constraints are not really a "recommended" set of dependencies. They are the "golden" set which we know we tested and which we know was working at the time of release. They provide a "reproducible installation". But this is where their role ends. Users should be FREE to upgrade/downgrade the dependencies within the min/max requirements.

Even more: if a user chooses to install a newer Airflow with an older provider, the constraints might not work any more. For example, after we bumped the Google provider dependencies, if someone installed an older version of that provider with the new Airflow 2.7 (which is entirely possible), the needed set of dependencies is about as far from the current 2.7 constraints as it gets - many of the dependencies will have to be downgraded (and this will happen automatically, precisely because the older Google provider has a different set of min/max dependencies and pip will follow them and bring those dependencies down). This is also why the recommended installation process is:

a) Install Airflow with extras and constraints -> a reproducible installation with the "golden set" of dependencies

pip install apache-airflow[EXTRAS_HERE]==2.7.3 --constraint http:///  .... 2.7.3/ ... -3.8.txt

b) Downgrade/upgrade everything else you need (including downgrading/upgrading providers separately and any other dependencies) WITHOUT CONSTRAINTS.

pip install apache-airflow==2.7.3  ADD_YOUR_DEPENDENCIES_INCLUDING_SPECIFIC_PROVIDER_PACKAGE_VERSION
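Putting both steps together (the provider and versions below are illustrative; the constraints URL follows Airflow's documented pattern https://raw.githubusercontent.com/apache/airflow/constraints-VERSION/constraints-PYTHON_VERSION.txt):

# step a) reproducible installation with the "golden set" of dependencies
pip install "apache-airflow[google]==2.7.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.3/constraints-3.8.txt"
# step b) adjust providers and any other dependencies WITHOUT constraints;
# pip then resolves within the min/max ranges declared by Airflow and the providers
pip install "apache-airflow==2.7.3" "apache-airflow-providers-google==<OLDER_VERSION>"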

Those should ALWAYS be two steps - and step b) is where constraints stop playing a role, as if they did not exist at all. That's by design. And of course it might mean that some dependencies will break something in this case, but other than paying attention to the min/max expectations in our providers and Airflow, we cannot do much about it. BUT this is precisely WHY we should care about the min requirements - because whatever we put there will influence pip's decision on which dependencies get upgraded/downgraded. The b) command is the sole reason why it is worth doing the tests and exercise I mentioned: because of the b) case, people might sometimes downgrade dependencies that should not be downgraded, because some of the installed providers implicitly expect higher versions of them. This is the whole reason why I want to implement those tests here.

This is really why we are using constraints rather than requirements - the whole point is that we DO NOT want to prevent users from downgrading/upgrading dependencies. And this is already enforced (in some cases just guided, but that is another story) by pip and the min/max requirements of the packages you install in step b) - when you install a provider with a set of min/max dependencies, pip will try to figure out the right set of versions that falls within what Airflow 2.7.3 expects and what those new dependencies need.

Also, the UI is the wrong audience to show this to - users of the Airflow UI often cannot do much about it, as they are frequently not the same people who deploy Airflow and install packages. It's better to stick to what pip shows when you install packages, IMHO.

potiuk added a commit to potiuk/airflow that referenced this issue Jun 3, 2024
With this change we are running tests of downgrading Airflow
dependencies to "lowest-direct" ones - separately for "core" tests
and for each provider (and run corresponding tests with it). This
should allow us to determine the lowest bounds for the
dependencies - for Airflow and for individual providers - and continue
doing it while Airflow evolves in the future.

Fixes: apache#35549
Related: apache#39100
potiuk added a commit that referenced this issue Jun 4, 2024
…#39946)

With this change we are running tests of downgrading Airflow
dependencies to "lowest-direct" ones - separately for "core" tests
and for each provider (and run corresponding tests with it). This
should allow us to determine the lowest bounds for the
dependencies - for Airflow and for individual providers - and continue
doing it while Airflow evolves in the future.

Fixes: #35549
Related: #39100