Enforce upgrades of min-versions for our dependencies #35549
Comments
Would this be implemented as a pre-commit hook?
Not really. It must be a real/full unit-test environment, because we will need to downgrade the provider's dependencies and then run the complete set of tests for that provider - and all of that needs to happen in the Breeze image, because we have to have all the libraries/dependencies installed. So this test will take quite some time to run, and pre-commits should be fast.
Mhm - with ~90 providers, you would need to run a test for each provider (minus a few that are installed by default), which means installing a Python environment with each provider individually ~80 times. That will take a seriously long time. I can only see this being done in a nightly build; it would be hard to squeeze into the PR tests unless you run it in parallel with pip download caching. What also jumps into my mind (but might be a different task) is that we also have users asking for support because they are not installing with the right constraints. For supportability, would it also make sense to check installed modules at runtime and generate a warning in the UI if the requirements (min/max) are not met?
About 70%-80% of our PRs concern changes in one or a few providers. We have had what I named "SelectiveChecks" in place for a few years: it produces the list of providers affected by a change, so most PRs already run only a subset of the provider tests (and a subset of the documentation build). It looks at what is coming in the PR (which files changed) and, based on that, determines which set of tests (which provider tests) should be run. It even automatically calculates the set of dependent providers (so, for example, when you modify the HTTP provider, OpsGenie, JIRA and a few others that use the HTTP provider as a base are run as well). Thanks to that, the vast majority of those "changes in providers" PRs complete their tests in 1-2 minutes rather than 15. It boils down to runs like https://github.com/apache/airflow/actions/runs/6811281201/job/18521459711?pr=35547 - a PR that changed only openai provider code - where, due to the selective checks calculations, the test command was limited to that provider alone (the exact command is not reproduced here; a rough sketch follows this comment).
The complete test run for this PR took 1 minute 21 seconds. Similarly, the docs build took 6 minutes instead of the complete ~30-minute docs build, because the documentation build for that PR was limited to the affected packages.
And this is precisely where we should warn in case someone changes the openai provider, since that is where we want to test the openai min dependencies. It is entirely OK to run another ~2-minute test to verify it. The very same selective checks can be used to run those tests; they could be run automatically whenever selective checks determine that a given provider is affected by the change.
The benefit of doing it in the PR is that it is not the magical maintainer who has to go and fix all of those - it is the responsibility of the contributor who adds the new feature to work out what to do: whether we should bump the min requirement, or whether we should add back-compatibility. On the other hand, PRs that run all the tests (Airflow core changes) do not modify provider functionality, so they do not have to run any of those min/max tests; we can run those in their entirety in the regularly scheduled (canary) builds. I think it is entirely doable to run those tests for the vast majority of regular PRs with the selective checks mechanism we have today.
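The exact commands from the run linked above were not preserved in this thread, so the following is only a rough, hypothetical illustration of what a selective run for a single provider boils down to - the test path and options are assumptions, not the literal CI invocation:

```bash
# Hypothetical approximation of a selective test run for a PR that only
# touches the openai provider (NOT the literal command from the CI job above).
# Inside the Breeze container, only that provider's tests are collected:
pytest tests/providers/openai -q

# The documentation build is narrowed the same way: only the package touched
# by the PR (here apache-airflow-providers-openai) is rebuilt, instead of the
# full ~30-minute documentation build.
```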
We should not do that, I think. More context: constraints are not really a "recommended" set of dependencies. They are the "golden" set which we know we tested and which we know was working at the time of the release. They provide a "reproducible installation", but this is where their role ends. Users should be FREE to upgrade/downgrade dependencies within the min/max requirements. Even more: if users choose to install a newer Airflow and an older provider, the constraints might not work any more. When, for example, we bumped the Google provider dependencies: if someone installed an older version of that provider with the new Airflow 2.7 (which is entirely possible), the set of dependencies needed is about as far from the current 2.7 constraints as possible - many of the dependencies will have to be downgraded (and this will happen automatically, precisely because the older Google provider has a different set of min/max dependencies). Installation should therefore be:
a) Install Airflow with extras and constraints -> reproducible installation with the "golden set" of dependencies.
b) Downgrade/upgrade everything else you need (including downgrading/upgrading providers separately and any other dependencies) WITHOUT CONSTRAINTS.
Those should ALWAYS be two steps (a minimal sketch of them is shown below) - and step b) is where the constraints stop being a thing, as if they did not exist at all. That's by design. Of course it might mean that some dependency will break something in this case, but other than paying attention to the min/max expectations in our providers and Airflow, we cannot do much about it. BUT this is precisely WHY we should care about the min requirements - because whatever we put there will influence what users can safely downgrade to in step b). This is really why we are using constraints, not requirements: the whole point is that we DO NOT want to prevent users from downgrading/upgrading dependencies. And this is already enforced (in some cases just guided, but that is another story) by the min/max requirements we declare in our packages. Also, showing warnings in the UI targets the wrong audience: users of the Airflow UI often cannot do much about it - they are frequently not the same people who deploy Airflow and install packages. It's better to stick to what we can verify at install and test time.
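As a minimal sketch of the two-step installation described above (the constraints URL follows the documented pattern, but the concrete version numbers here are purely illustrative):

```bash
# Step a) Reproducible installation: Airflow plus extras, pinned by the
# "golden" constraints file published for that exact Airflow/Python version.
AIRFLOW_VERSION=2.7.3
PYTHON_VERSION=3.10
pip install "apache-airflow[google]==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Step b) Upgrade/downgrade whatever else you need WITHOUT constraints -
# for example pin an older provider; pip then resolves freely within the
# provider's own min/max requirements (provider version is illustrative).
pip install "apache-airflow-providers-google==10.1.0"
```

After step b) the installation may legitimately diverge from the published constraints - that is the intended behaviour described above.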
With this change we are running tests that downgrade Airflow dependencies to the "lowest-direct" ones - separately for the "core" tests and for each provider (and run the corresponding tests with them). This should allow us to determine the lowest bounds for the dependencies - for Airflow and for individual providers - and to keep doing so as Airflow evolves in the future. Fixes: apache#35549 Related: apache#39100
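A hedged sketch of what such a "lowest-direct" downgrade can look like locally, assuming a resolver that supports it (for example `uv`, whose `pip install` accepts `--resolution lowest-direct`); the editable install and test paths are assumptions, not the exact CI setup from the change:

```bash
# Downgrade all *direct* dependencies to the minimum versions allowed by the
# project metadata, leaving transitive dependencies to resolve as usual.
uv pip install --resolution lowest-direct -e .

# Then run the "core" suite (or a single provider's suite) against that
# lowered set of dependencies; the paths below are illustrative.
pytest tests/core -q
pytest tests/providers/openai -q
```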
Body
It should be entirely possible to check and enforce bumping of min versions for all our dependencies (including providers).
We should be able to run a set of tests (a separate test for each provider) where we lower dependencies to the minimum versions resulting from the resolution, run the tests for that provider, and see if they pass. Automating that and making it part of our CI should protect us against the case where we add a feature or use an API that is not available in some older version of a dependency we are using.
This would be the ultimate automation of our dependency management, where we would care about both sides of the dependency requirements. The current automation cares about the upper bounds, but the lower bounds are left to the discretion of the contributor and the keen eye of the reviewer - both of which are fragile and easy to get wrong, because of the number of dependencies we have, limited review time/focus, and sometimes non-obvious coupling between versions of dependencies and features/APIs.
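One possible way to obtain "the minimum versions resulting from the resolution" is a resolver that supports a lowest-direct mode; the sketch below assumes `uv pip compile` and a `pyproject.toml`-based layout, and the provider path is only an example:

```bash
# Compute the lowest allowed versions of the *direct* dependencies and write
# them to a pin file, then install against it and run one provider's tests.
uv pip compile --resolution lowest-direct pyproject.toml -o lowest-direct.txt
pip install -e . -r lowest-direct.txt
pytest tests/providers/http -q   # provider/path chosen only for illustration
```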