Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provider discovery based on entry_points is rather brittle #12692

Closed
potiuk opened this issue Nov 28, 2020 · 4 comments · Fixed by #12694
Closed

Provider discovery based on entry_points is rather brittle #12692

potiuk opened this issue Nov 28, 2020 · 4 comments · Fixed by #12694
Assignees
Labels
kind:bug This is a clearly a bug priority:critical Showstopper bug that should be patched immediately

Comments

@potiuk
Copy link
Member

potiuk commented Nov 28, 2020

The tests we run in CI had shown that provider discovery based on entry_points is rather brittle.

Example here:

https://github.com/apache/airflow/pull/12466/checks?check_run_id=1467792592#step:9:4452

This is not a problem with Airflow, but wth PIP which might silently upgrade some packages and cause "version conflict" totally independently from Airflow configuration and totally out-of-our-control.

Simple installing a whl package on top of the existing airflow installation (as it happened in the case above) might cause inconsistent requirements (in the case above installing .whl packages with all providers on top of existing Airflow installation caused the requests package to be upgraded to 2.25.0, even if airflow has the right requirements set. In this case it was (correct and it is from the "install_requires" section of airflow's setup.cfg):

Requirement.parse('requests<2.24.0,>=2.20.0'), {'apache-airflow'}

In case you have a version conflict in your env, running entry_point.load() from a package that has this version conflicts results with pkg_resources.VersionConflict error or `pkg_resources.ContextualVersionConflict) rather than returning the entry_point. Or at least that's what I observed so far. It's rather easy to reproduce. Simply install requests > 2.24.0 in the current airflow and see what happens.

So far I could not find a way to mitigate this problem, but @ashb - since you have more experience with it, maybe you can find a workaround for this?

I think we have a few options:

  1. We fail 'airflow' hard if there is any Version Conflict. We have a way now after I've implemented #Make sure that we have no conflicting dependencies when installing. #10854 (and after @ephraimbuddy finishes the Upgrade azure blob to v12 #12188 ) - we have a good, maintainable list of non-conflicting dependencies for Airflow and it's providers and we can keep that in the future thanks to pip-check. But I am afraid that will give a hard time to people who would like to install airflow with some custom dependencies (Tensorflow for example, depending on versions is notoriously difficult to sync with Airflow when it comes to dependencies). However, this is the most "Proper" (TM) solution.

  2. We find a workaround for the entry_point.load() VersionConflict exception. However, I think that might not be possible or easy looking for example at this SO thread: https://stackoverflow.com/questions/52982603/python-entry-point-fails-for-dependency-conflict . The most upvoted (=1) answer there starts with "Welcome to the world of dependencies hell! I know no clean way to solve this" - which is not very encouraging. I tried also to find it out from docs and code of the entry_point.load() but to no avail. @ashb - maybe you can help here.

  3. We go back to the original implementation of mine where I read provider info from provider.yaml embedded into the package. This has disadvantage of being non-standard, but it works independently of version conflicts.

WDYT?

@potiuk potiuk added kind:bug This is a clearly a bug priority:critical Showstopper bug that should be patched immediately labels Nov 28, 2020
@potiuk
Copy link
Member Author

potiuk commented Nov 28, 2020

cc: @mjpieters -> maybe you can help with that as well :)?

@ashb
Copy link
Member

ashb commented Nov 28, 2020

I have any idea - I'll try it after dinner (1hr) they should help this

@ashb
Copy link
Member

ashb commented Nov 28, 2020

PR coming.

ashb added a commit to astronomer/airflow that referenced this issue Nov 28, 2020
…t errors

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes apache#12692
potiuk pushed a commit that referenced this issue Nov 29, 2020
…t errors (#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes #12692
@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2020

For those coming here to look for a solution: @ashb switched to importlib_metadata and it solved the problem entirely (and is much faster).

See #12694

kaxil pushed a commit that referenced this issue Dec 3, 2020
…t errors (#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes #12692

(cherry picked from commit 7ef9aa7)
kaxil pushed a commit that referenced this issue Dec 3, 2020
…t errors (#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes #12692

(cherry picked from commit 7ef9aa7)
kaxil pushed a commit that referenced this issue Dec 3, 2020
…t errors (#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes #12692

(cherry picked from commit 7ef9aa7)
potiuk added a commit to PolideaInternal/airflow that referenced this issue Dec 9, 2020
Some parts of our CI are installing (for tests) provider packages
inside the CI container. That has proven to be rather brittle in
terms of keeping the requirements in a consistent state (and is
totally out of our control as described in apache#12692, however in our
tests we can continue using constraints to make sure we use
consistent set of requirements.
potiuk added a commit to PolideaInternal/airflow that referenced this issue Dec 31, 2020
Some parts of our CI are installing (for tests) provider packages
inside the CI container. That has proven to be rather brittle in
terms of keeping the requirements in a consistent state (and is
totally out of our control as described in apache#12692, however in our
tests we can continue using constraints to make sure we use
consistent set of requirements.
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this issue Mar 5, 2021
…t errors (apache#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes apache#12692

(cherry picked from commit 7ef9aa7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug This is a clearly a bug priority:critical Showstopper bug that should be patched immediately
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants