-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provider discovery based on entry_points is rather brittle #12692
Labels
kind:bug
This is a clearly a bug
priority:critical
Showstopper bug that should be patched immediately
Comments
potiuk
added
kind:bug
This is a clearly a bug
priority:critical
Showstopper bug that should be patched immediately
labels
Nov 28, 2020
cc: @mjpieters -> maybe you can help with that as well :)? |
I have any idea - I'll try it after dinner (1hr) they should help this |
PR coming. |
ashb
added a commit
to astronomer/airflow
that referenced
this issue
Nov 28, 2020
…t errors Using `pkg_resources.iter_entry_points` validates the version constraints, and if any fail it will throw an Exception for that entrypoint. This sounds nice, but is a huge mis-feature. So instead of that, switch to using importlib.metadata (well, it's backport importlib_metadata) that just gives us the entrypoints - no other verification of requirements is performed. This has two advantages: 1. providers and plugins load much more reliably. 2. it's faster too Closes apache#12692
potiuk
pushed a commit
that referenced
this issue
Nov 29, 2020
…t errors (#12694) Using `pkg_resources.iter_entry_points` validates the version constraints, and if any fail it will throw an Exception for that entrypoint. This sounds nice, but is a huge mis-feature. So instead of that, switch to using importlib.metadata (well, it's backport importlib_metadata) that just gives us the entrypoints - no other verification of requirements is performed. This has two advantages: 1. providers and plugins load much more reliably. 2. it's faster too Closes #12692
kaxil
pushed a commit
that referenced
this issue
Dec 3, 2020
…t errors (#12694) Using `pkg_resources.iter_entry_points` validates the version constraints, and if any fail it will throw an Exception for that entrypoint. This sounds nice, but is a huge mis-feature. So instead of that, switch to using importlib.metadata (well, it's backport importlib_metadata) that just gives us the entrypoints - no other verification of requirements is performed. This has two advantages: 1. providers and plugins load much more reliably. 2. it's faster too Closes #12692 (cherry picked from commit 7ef9aa7)
kaxil
pushed a commit
that referenced
this issue
Dec 3, 2020
…t errors (#12694) Using `pkg_resources.iter_entry_points` validates the version constraints, and if any fail it will throw an Exception for that entrypoint. This sounds nice, but is a huge mis-feature. So instead of that, switch to using importlib.metadata (well, it's backport importlib_metadata) that just gives us the entrypoints - no other verification of requirements is performed. This has two advantages: 1. providers and plugins load much more reliably. 2. it's faster too Closes #12692 (cherry picked from commit 7ef9aa7)
kaxil
pushed a commit
that referenced
this issue
Dec 3, 2020
…t errors (#12694) Using `pkg_resources.iter_entry_points` validates the version constraints, and if any fail it will throw an Exception for that entrypoint. This sounds nice, but is a huge mis-feature. So instead of that, switch to using importlib.metadata (well, it's backport importlib_metadata) that just gives us the entrypoints - no other verification of requirements is performed. This has two advantages: 1. providers and plugins load much more reliably. 2. it's faster too Closes #12692 (cherry picked from commit 7ef9aa7)
potiuk
added a commit
to PolideaInternal/airflow
that referenced
this issue
Dec 9, 2020
Some parts of our CI are installing (for tests) provider packages inside the CI container. That has proven to be rather brittle in terms of keeping the requirements in a consistent state (and is totally out of our control as described in apache#12692, however in our tests we can continue using constraints to make sure we use consistent set of requirements.
potiuk
added a commit
to PolideaInternal/airflow
that referenced
this issue
Dec 31, 2020
Some parts of our CI are installing (for tests) provider packages inside the CI container. That has proven to be rather brittle in terms of keeping the requirements in a consistent state (and is totally out of our control as described in apache#12692, however in our tests we can continue using constraints to make sure we use consistent set of requirements.
cfei18
pushed a commit
to cfei18/incubator-airflow
that referenced
this issue
Mar 5, 2021
…t errors (apache#12694) Using `pkg_resources.iter_entry_points` validates the version constraints, and if any fail it will throw an Exception for that entrypoint. This sounds nice, but is a huge mis-feature. So instead of that, switch to using importlib.metadata (well, it's backport importlib_metadata) that just gives us the entrypoints - no other verification of requirements is performed. This has two advantages: 1. providers and plugins load much more reliably. 2. it's faster too Closes apache#12692 (cherry picked from commit 7ef9aa7)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
kind:bug
This is a clearly a bug
priority:critical
Showstopper bug that should be patched immediately
The tests we run in CI had shown that provider discovery based on entry_points is rather brittle.
Example here:
https://github.com/apache/airflow/pull/12466/checks?check_run_id=1467792592#step:9:4452
This is not a problem with Airflow, but wth PIP which might silently upgrade some packages and cause "version conflict" totally independently from Airflow configuration and totally out-of-our-control.
Simple installing a whl package on top of the existing airflow installation (as it happened in the case above) might cause inconsistent requirements (in the case above installing .whl packages with all providers on top of existing Airflow installation caused the requests package to be upgraded to 2.25.0, even if airflow has the right requirements set. In this case it was (correct and it is from the "install_requires" section of airflow's setup.cfg):
In case you have a version conflict in your env, running entry_point.load() from a package that has this version conflicts results with
pkg_resources.VersionConflict
error or `pkg_resources.ContextualVersionConflict) rather than returning the entry_point. Or at least that's what I observed so far. It's rather easy to reproduce. Simply install requests > 2.24.0 in the current airflow and see what happens.So far I could not find a way to mitigate this problem, but @ashb - since you have more experience with it, maybe you can find a workaround for this?
I think we have a few options:
We fail 'airflow' hard if there is any Version Conflict. We have a way now after I've implemented #Make sure that we have no conflicting dependencies when installing. #10854 (and after @ephraimbuddy finishes the Upgrade azure blob to v12 #12188 ) - we have a good, maintainable list of non-conflicting dependencies for Airflow and it's providers and we can keep that in the future thanks to pip-check. But I am afraid that will give a hard time to people who would like to install airflow with some custom dependencies (Tensorflow for example, depending on versions is notoriously difficult to sync with Airflow when it comes to dependencies). However, this is the most "Proper" (TM) solution.
We find a workaround for the entry_point.load() VersionConflict exception. However, I think that might not be possible or easy looking for example at this SO thread: https://stackoverflow.com/questions/52982603/python-entry-point-fails-for-dependency-conflict . The most upvoted (=1) answer there starts with "Welcome to the world of dependencies hell! I know no clean way to solve this" - which is not very encouraging. I tried also to find it out from docs and code of the entry_point.load() but to no avail. @ashb - maybe you can help here.
We go back to the original implementation of mine where I read provider info from provider.yaml embedded into the package. This has disadvantage of being non-standard, but it works independently of version conflicts.
WDYT?
The text was updated successfully, but these errors were encountered: