
Consider creating "exclusion list" to fall-back to the old resolver automatically #9231

Closed
potiuk opened this issue Dec 6, 2020 · 26 comments

Comments

@potiuk
Contributor

potiuk commented Dec 6, 2020

What did you want to do?

I would love the PIP maintainers to consider quickly adding an exclusion list to the released PIP version, so that PIP falls back to the old resolver for projects where the new resolver is known to cause problems, or when particular features are used that the new resolver is not yet ready for.

Such a list could be rather small. From the comment here it seems that the vast majority of projects have no problems with the new resolver, but there are some - for example Apache Airflow, and likely quite a few other projects in the category of platforms/tools - that have problems with installation under the new resolver which seem difficult to solve quickly. Some of the issues are described in: #9187, #9223, #9215, #9214, #9203, #9186. Using any of the problematic projects as a direct installation target (or, if possible and straightforward, as a dependency) would automatically trigger the legacy mode for PIP (which could be overridden with another flag for testing fixes). This could be accompanied by a warning message that explains why the legacy resolver is being used and that there is an ongoing effort to fix the problems.

Also, there are some features that are broken in the new resolver: #9229, #9219, #9217, #9216, #9190, #9188, #9182, #9180. Using any of those features (detecting this may be impossible for all of them, but straightforward for some, such as using a proxy) could automatically trigger the legacy resolver, again with a warning message stating that the problem will be fixed in an upcoming version but that, for now, PIP is falling back to the old resolver.

Additional information

The current state of the resolver is seriously problematic for some projects (in short, the new resolver does not work for them), but the list of such projects and broken features seems rather small, and it should be fairly easy to keep it updated. It could, for example, be kept as part of the PIP repository on GitHub and, for the time being, refreshed remotely by PIP from time to time, so that new projects can be added to the list if and when new problems are reported. PIP by its nature requires the network to function, so refreshing such a list from a well-known, PIP-maintainer-controlled location should be possible and could be attempted (falling back quickly to the latest cached version of the list) every time a pip install/upgrade command is run.

There was a proposal raised by me in this comment to withdraw the new resolver until the problems are solved, but after the comment here it is rather clear that despite the severity of the problems, they seem to be isolated to a rather small group of projects and features. And there is big value in as many projects as possible trying the new PIP resolver by default. On the other hand, some projects are suffering because their users are unaware of the PIP problems, and no matter how hard those projects try (for example apache/airflow#12838), their users will get the perception that those projects are broken, rather than PIP.

Having such an exclusion list might be the best of both worlds. On the one hand, it might give the projects the much-needed stability of installation; on the other hand, it might give the PIP team the necessary testing ground for the new resolver, taking into account that it works in the majority of cases and there are only a handful of exceptions.

It would also give the PIP team the necessary breathing space to fix all such issues properly, without maintainers of those projects expressing their frustration - it would provide an easy-to-maintain workaround for anyone whose problems are severe enough that they cannot be worked around in a different way.

Another upside: having such a list of "problematic" projects could let the PIP team build relationships with those "heavy users" of PIP, who could be helpful in testing future features of PIP. This would be a great opportunity to turn the "problem" into an "opportunity" for building better cooperation between open-source projects.

Please kindly consider my proposal.

Also - if you do consider it, I am happy to help with the implementation. I do not know PIP internals, but maybe it is not so difficult. I am happy to spend quite some time on it so that you can work on more important stuff.

@potiuk changed the title: Consider creating "exclusion list" to fall-back to the old resolver → Consider creating "exclusion list" to fall-back to the old resolver automatically (Dec 6, 2020)
@uranusjr
Member

uranusjr commented Dec 6, 2020

I understand the rationale of excluding cases because they are known to not work well, but wouldn't this proposal only kick the can down the road until it becomes a problem again? I don't imagine Airflow can drop dependencies, since they are needed for its functionality, but pip also cannot keep maintaining the old resolver indefinitely so that some packages can stay out of the migration. If there's one thing the beta rollout phase showed me in Airflow's case, it is that Airflow users do not use the new resolver unless we force them to; disabling the new resolver for Airflow would only keep them in the pre-20.3 era, and they will come back when things inevitably break again. How can this be prevented?

The tension between package maintainers and pip maintainers is mutual. We regularly need to deal with “pip does not work on package X” reports. The problem here is not that the new resolver needs to backtrack; the legacy resolver does not make the root cause go away, it only makes it pop up in other places that users would blame pip for, instead of the package they want to install. This is the main reason pip maintainers want to avoid going back to the legacy resolver. The problem has always been there, but 20.3 finally surfaces it to the package maintainers, instead of putting it on pip's issue tracker.

Summarising to make my position explicit: I do not object to the proposal to disable the new resolver temporarily for Airflow, but the projects being disabled must provide a concrete plan to improve things. The issues reported against the new resolver are the best source the pip maintainers have for identifying problems in it (and for isolating things that are not resolver issues; many of the references you listed above are exactly that), and a proposal to disable the new resolver must provide an alternative way of coming up with examples like this, so the new resolver can really be improved while it is disabled.

@potiuk
Contributor Author

potiuk commented Dec 6, 2020

I understand the rationale of excluding cases because they are known to not work well, but wouldn't this proposal only kick the can down the road until it becomes a problem again? I don't imagine Airflow can drop dependencies, since they are needed for its functionality, but pip also cannot keep maintaining the old resolver indefinitely so that some packages can stay out of the migration. If there's one thing the beta rollout phase showed me in Airflow's case, it is that Airflow users do not use the new resolver unless we force them to; disabling the new resolver for Airflow would only keep them in the pre-20.3 era, and they will come back when things inevitably break again. How can this be prevented?

As I wrote above - it is just a matter of establishing close collaboration with the maintainers of the "excluded" packages. We can help you with testing the fixes and maybe even contribute some!

Airflow maintainers are already deeply invested in making PIP 20.3 work for us. The way our test harness works, once we make it work for us it will start working for our users, so no user involvement is needed.

As an example: we already have issue apache/airflow#12838 open, and it will remain open until all problems are fixed. The first problem, identified in #9203, has already turned into a PR (#9226), but some of our maintainers (@kaxil) have tested it and it turns out there are more problems (which others have reported as well, so they will eventually be fixed).

If we have a new flag (--force-new-resolver or something), we can even make a scheduled run on our CI that will keep running and failing until the problems get fixed - and then, when a PIP release containing the fixes is out, we will ask you to remove the exclusion. Seems like a workable plan to me.
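For illustration, such a scheduled check could be as simple as the following (a sketch only; --force-new-resolver is just the flag proposed here and does not exist in pip, and the install command is Airflow's documented one quoted later in this thread):

# Hypothetical nightly CI step: keep exercising the new resolver even while
# airflow is on the exclusion list, and fail until the remaining issues are fixed
python -m pip install --upgrade pip
pip install --force-new-resolver "apache-airflow[postgres,google]==1.10.13" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.13/constraints-3.7.txt"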

@potiuk
Contributor Author

potiuk commented Dec 6, 2020

And undoubtedly this would be the same for other projects - in order to make it onto the list, the maintainers of such a package would have to come to you and ask for it, at the same time pledging their effort to test and to drop the exclusion when the problem is fixed. That sounds like a win-win situation. Also, such an exclusion could have an expiry date, and extending it would require an active "please, we need more time" request with good justification from the maintainers.

@pfmoore
Member

pfmoore commented Dec 6, 2020

One concern I have about this proposal is that I'm not actually sure how it would work in practice. Pip can't use different resolvers for different "parts" of an install, so we need to work out how to choose the resolver for any given install. And we can't know what packages are used in an install until we run the resolver (because of dependencies, some of which might not be possible to calculate until install time). So there's a chicken-and-egg issue here - if the user issues the command pip install XXX, should pip use the new resolver? If XXX changes to depend on airflow, should the answer to that question change? How do we know? If the user has to tell us, how is that different from the user disabling the new resolver manually, which they can do right now?
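For reference, the manual opt-out that exists right now is the --use-deprecated=legacy-resolver option shipped with pip 20.3 - for example (the package name is just illustrative):

# Explicitly fall back to the pre-20.3 resolver for a single install
pip install --use-deprecated=legacy-resolver "apache-airflow[postgres,google]==1.10.13"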

I'm not against a mechanism like this in principle, but in practice I have no idea how to design such a thing, and the level of detail in the proposal here leaves far too many questions unanswered. I'm concerned that if we try to work through those questions (like I've just done in the previous paragraph) you might see that as "pip's developers are continually blocking proposals to solve the issue" - it's not the intention, but it would be easy to get that feeling.

@uranusjr
Member

uranusjr commented Dec 6, 2020

As I wrote above - it is just a matter of establishing close collaboration with the maintainers of the "excluded" packages. We can help you with testing the fixes and maybe even contribute some!

That's what I'm trying to ask. We can only fix things if we know what to fix, and we need user reports to know what is not working well. “The resolver takes too long” is not directly actionable because it is not caused by a logical error, but by a lack of optimisation. We need to look into different issues and find a pattern, so we can optimise the algorithm to avoid or reduce the phenomenon. Once we create the exclusion rule, the reports will stop coming in, and you will need to find another way to generate test cases for the fix. Do you have a solution?

@potiuk
Contributor Author

potiuk commented Dec 6, 2020

One concern I have about this proposal is that I'm not actually sure how it would work in practice. Pip can't use different resolvers for different "parts" of an install, so we need to work out how to choose the resolver for any given install. And we can't know what packages are used in an install until we run the resolver (because of dependencies, some of which might not be possible to calculate until install time). So there's a chicken-and-egg issue here - if the user issues the command pip install XXX, should pip use the new resolver? If XXX changes to depend on airflow, should the answer to that question change? How do we know? If the user has to tell us, how is that different from the user disabling the new resolver manually, which they can do right now?

Let's make it simple and straightforward and limit it to packages that are directly named by the user in the PIP command. No dependencies. That would probably solve all the issues of those of our users who would otherwise land in the "cannot install airflow" camp. If airflow is used as a dependency, it would still be installed with the new resolver, and you would get issues reported by people using airflow as a dependency - which is what you want, I understand. Win-win again.

That's what I'm trying to ask. We can only fix things if we know what to fix, and we need user reports to know what is not working well. “The resolver takes too long” is not directly actionable because it is not caused by a logical error, but by a lack of optimisation. We need to look into different issues and find a pattern, so we can optimise the algorithm to avoid or reduce the phenomenon. Once we create the exclusion rule, the reports will stop coming in, and you will need to find another way to generate test cases for the fix. Do you have a solution?

Yes: our CI system. When you exclude airflow and add the capability to force the new resolver, we pledge to run the tests continuously and let you know about the problems. Our CI is set up so that it installs the most complex combination of extras you can imagine and keeps the constraints consistent automatically. We can also produce a number of "typical" combinations and run our CI with those, and we can put a timeout on the test to cover the "takes too long" case. You could either subscribe to the results yourself, or we will watch them and let you know when the problems are fixed.
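A minimal sketch of what such a timed CI check could look like (the install command is Airflow's documented one, the 20-minute budget is illustrative, and exit code 124 from the coreutils timeout command marks the "takes too long" case):

# Fail the job if resolution errors out or exceeds the time budget
timeout 1200 pip install "apache-airflow[postgres,google]==1.10.13" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.13/constraints-3.7.txt"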

And anyone else you add to the list would have to do a similar thing and make sure the tests keep running for their packages.

@potiuk
Contributor Author

potiuk commented Dec 6, 2020

And BTW - this is not a new concept by any means. Microsoft has done it for years, providing different rules for applications installed on Windows. The Linux kernel has numerous exclusions and special tweaks for different kinds of drivers. And here you have maintainers of projects ready to help you and cooperate. Let's break the wall of "pip maintainers vs. projects that use PIP" and cooperate on this, empathetically understanding the other side's needs and expectations, especially since it can cost very little if we limit the mechanism as described above.

@pradyunsg
Member

pradyunsg commented Dec 6, 2020

At a high level: I'm a strong -1 to pip maintaining a "list of packages that need special casing". Such a list will never cover all the "special cases" out in the wild (e.g. we're not going to hold a list of packages in $corp's internal index that can't be used with a certain feature in pip; the same applies to packages on PyPI as well) [note 1]. pip doesn't handle numpy differently from $corp-critical-package, and I'd much prefer that we maintain that consistency.

From #9187 (comment):

I'm saying users need to change their workflow to adapt to a change made upstream toward being stricter. We'll work to address the slowness/takes-forever behavior here, but that doesn't mean it'll magically solve the underlying issue that's causing the problems for these users: that there are dependency conflicts in their dependency graph that are non-trivial to resolve.

Again, the underlying issue here is NOT that pip's using the new resolver -- that's either surfacing the fact that airflow has dependency conflicts in its dependency graph that are making the resolution process take longer than ideal, or that there's a bug in the resolver. If you use a lockfile to deterministically generate the dependency graph or provide a dependency specification that limits the extent of the resolution to something more reasonable, that'll address the underlying issue.

[note 1]: "make that list configurable" -- no, because then you might as well configure pip to use the behavior you want instead. :)

Microsoft has done it for years, providing different rules for applications installed on Windows.

Microsoft is one of the largest corporations in the world, and likely hires more people to maintain their HQ building than we have devs volunteering to maintain pip. Similar resource-level differences apply to the Linux kernel analogy. :)


So, I just went ahead and tried to understand what the issue is for the Airflow project. IIUC, you are already using constraint files to limit which versions are used with your application:

pip install "apache-airflow[postgres,google]==1.10.13" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.13/constraints-3.7.txt" 

This fails. I think I see the underlying bug you're hitting that's causing issues for you:

Collecting google-cloud-bigquery[bqstorage,pandas]<3.0.0dev,>=1.11.1
  Using cached google_cloud_bigquery-2.5.0-py2.py3-none-any.whl (210 kB)
  Using cached google_cloud_bigquery-2.4.0-py2.py3-none-any.whl (210 kB)
  Using cached google_cloud_bigquery-2.3.1-py2.py3-none-any.whl (208 kB)
  Using cached google_cloud_bigquery-2.2.0-py2.py3-none-any.whl (200 kB)
  Using cached google_cloud_bigquery-2.1.0-py2.py3-none-any.whl (190 kB)
  Using cached google_cloud_bigquery-2.0.0-py2.py3-none-any.whl (193 kB)

This happens despite the constraint file containing google-cloud-bigquery==2.3.1. The extra-based requirement is not being properly constrained by the constraint file.
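A minimal way to reproduce this, using just the pin and the requirement quoted above (a sketch; the expected outcome is that 2.3.1 gets selected rather than the newer versions seen in the log):

# constraints.txt carries the single relevant pin from Airflow's constraint file
echo "google-cloud-bigquery==2.3.1" > constraints.txt
# The extra-bearing requirement should be held to 2.3.1 by the constraint
pip install "google-cloud-bigquery[bqstorage,pandas]<3.0.0dev,>=1.11.1" -c constraints.txt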

PS: TIL pip allows remote URLs as requirements and constraints?! WTF.

@pradyunsg
Member

The extra-based requirement is not being properly constrained by the constraint file.

This is a bug in the new resolver, and I'd rather we fix that instead of adding mechanisms (as proposed in OP) to work around the issue.

@potiuk
Contributor Author

potiuk commented Dec 6, 2020

Yes. Thanks for looking at it, @pradyunsg. This was our original report in #9203. If you wait long enough, the installation will crash - this is yet another problem we faced (a PR for that has already been created: #9226).

If you can solve those before our planned releases, that would be fantastic.

But we could not get information on when we can expect this to be solved (@kaxil asked for it). Can you at least provisionally give us an indication of whether this is days, weeks, or months, if you cannot give a concrete date? My proposal is only relevant if you are not able to solve those issues in the days to come. So please, any indication of how long it will take would be helpful.

BTW, I am happy to test any pre-release versions directly from GitHub to check whether the problem is solved, so let me know as soon as possible. I will even be happy to open another issue for that if you think it is necessary to split those two issues (hopefully there are no more issues like that).
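For the record, this is roughly how I would do that testing (a sketch; it installs the in-development pip straight from the repository into the current environment, and a specific branch or tag can be appended after an @):

# Install the in-development pip directly from GitHub to verify a fix before release
python -m pip install --upgrade "pip @ git+https://github.com/pypa/pip.git"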

BTW 2: we are using fully consistent constraints in 1.10.14rc (I worked on it specifically to test the new resolver before it was out) - 1.10.13 still had dependency conflicts, which I resolved while trying out the new resolver, as I suspected the conflicts could have been the cause. This command tests all dependencies with a fully consistent set of constraints:

UPDATE: I created a separate issue for the unnecessary downloading of dependencies, in addition to #9203, to keep those issues separated: #9232.

Do you think the problems can be solved by next week, please?

@potiuk
Contributor Author

potiuk commented Dec 6, 2020

This is a bug in the new resolver, and I'd rather we fix that instead of adding mechanisms (as proposed in OP) to work around the issue.

Please, @pradyunsg, let us know your feeling on when we can expect a fix for the error raised in #9233.

@mik-laj

mik-laj commented Dec 7, 2020

I would like to add some context from the Apache Airflow project perspective. @potiuk and others have been a little more nervous lately, as this is a rather hot time for the project. We are working on a major release - Apache Airflow 2.0. This release is the result of over two years of work, and we are a bit sad and upset that users will have problems installing it and that these problems will spoil opinions of the release, despite the fact that we aim to deliver the best-quality product possible. To achieve this, we spend long hours on various changes to the project at work, but also in our spare time. Jarek tried for many weeks to make Airflow's dependencies conflict-free and compatible with the new resolver, but it was not an easy task, because Airflow has many dependencies and standard solutions very often do not work.

Therefore, we try our best to fix this problem, but unfortunately it is difficult, as these problems stem from the pip update. Jarek tries to discuss and present solutions based on his experience and knowledge, but I realize that they may not meet your requirements and may be contrary to your experience with this project. Simply put: everyone has a different perspective and experience, and therefore different feelings.

From your perspective, these problems may not always be the most important, because users have an easy workaround.
From our perspective, these problems are very important, because users will have trouble using the result of our two years of work.

So I would like to ask for a bit of support and mutual understanding. What would you do in our situation? If this problem is fixable in the near future (~1-2 weeks), then we can wait to release the new version. If it will take longer, is there any chance of a short-term workaround?

https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+2.0+-+Planning

CC: @pradyunsg @uranusjr

PS. I heard that Jarek can't even sleep because he is worried about Airflow 2.0 installations on the latest pip. 🛌

@blaiseli

blaiseli commented Dec 9, 2020

To support @mik-laj from a user perspective: it can be very frustrating to see a message from pip encouraging you to update pip, and then, after performing the update, have issues with package installs. Who is to blame for the incident will not be obvious to the user.

@brainwane
Contributor

Hi, colleagues. I'm sorry for the wait you've faced in getting replies to your issues. (I'm writing this reply while multitasking, while attending a virtual conference that is taking up much of this week.)

Pip contributors are chipping away at multiple issues and I predict they'll be able to reply to several issues this week, including this one. I hope that will help you make further decisions on your end.

Also (as Pradyun mentioned in #8936 (comment)) we're working on making a 20.3.2 release this week. That release will include at least one fix (#9241) for an issue that @potiuk reported.

I 100% agree and sympathize about how confusing it can be for a user to run into trouble and not understand why they're running into it or how to troubleshoot it. This is a problem for all software users, and it's definitely a problem that users of Python packages have; it's been a problem for years, and it will continue to be a nonzero problem no matter what any one tool does, but developers and community leaders can mitigate it with better tools and better communication. Our user experience team is trying to gather information and improve our command-line output and our documentation to address several problems, including this one. If you could help by circulating this research signup form to your users, that'll help us in the long run!

@potiuk
Contributor Author

potiuk commented Dec 14, 2020

Looking forward to the new 20.3.2 release, but in the meantime we have indeed started getting issues reporting that even older installations of Airflow (1.10.7 in this case) do not work with the old constraints: apache/airflow#13045. Which makes me keep my fingers crossed even more.

@pradyunsg
Member

Ahoy! pip 20.3.2 is out now. Based on my personal rudimentary testing (and @uranusjr's as well, I believe) -- the usecases and concerns flagged related to Airflow's installation are mostly/completely addressed now.

As I'd mentioned elsewhere, please do feel free to let us know if there are still outstanding concerns.

@pradyunsg
Member

Coming back to this specific issue and proposal and thinking aloud...

I've stated that I'm a very strong -1 on this proposal and no one seems to have flagged any "i disagree" style comments to that yet. I'm inclined to close this, but am a little wary that it might be "too soon" on that front (since the conversation immediately deviated to be about something else) and I don't want it to look like me shutting off the current direction of the conversation. :)

@potiuk
Contributor Author

potiuk commented Dec 15, 2020

I am testing the new PIP 20.3.2 today - we released Airflow 2.0.0RC3 yesterday (after our round of testing and investigation of the issues we had in RC2), and my task for today is to test all the various scenarios with the new PIP. 🤞

@potiuk
Contributor Author

potiuk commented Dec 15, 2020

I don't have very good news. We still cannot install airflow 2.0.0 with the new PIP in its full "CI" configuration.

While some of the problems have been solved, it looks like there are still some problems with installing airflow with a full set of dependencies. I am going to raise separate issues for some of those, and maybe we can do something to limit the others (like pinning setuptools and wheel - or maybe removing them; I saw a discussion suggesting that adding setuptools to setup_requires might not be a good idea in general).

The installation log is attached. I am opening separate issues for some of the problems I see there.

pip-20-3-2-master.txt

Additionally, the whole process took ~30 minutes, whereas with pip 20.2.4 it takes around 20 seconds when you have preinstalled dependencies. So this is a major regression.

@pradyunsg
Member

That seems to hit #9284. Try #9293 (or 20.3.3, when I cut that release), since that's likely the regression you're seeing. :)

@pradyunsg
Member

20.3.3 is out. :)

@potiuk
Contributor Author

potiuk commented Dec 15, 2020

Unfortunately 1.10.14 still has some issues, and there the problem is not easy to solve because one of the dependencies has wider version limits. We might solve it in 1.10.15 (if we release it).

@potiuk
Contributor Author

potiuk commented Dec 15, 2020

After manually updating PyArrow and testing PIP 20.3.3, it seems that Airflow 2.0.0rc3 now installs even with complex combinations of extras. This clears the path to releasing 2.0.0 - hopefully the day after tomorrow - without worrying about our users' experience.

We still have a problematic extra (papermill) that cannot be installed with the new PIP using the recommended method in 1.10.14 (the current stable version) - but this is a rarely used dependency, and since we have a workaround (the legacy resolver or downgrading PIP) it is not a high priority. We will likely release 1.10.15 in the coming weeks and will try to fix it then.
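For anyone who hits that extra in the meantime, the two workarounds look roughly like this (a sketch; the papermill extra is the one mentioned above):

# Option 1: fall back to the legacy resolver for this one install
pip install --use-deprecated=legacy-resolver "apache-airflow[papermill]==1.10.14"
# Option 2: stay on the last pip release that predates the new resolver
pip install "pip<20.3"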

We lowered the priority and updated the description of apache/airflow#12838, and we will close it once our users confirm that everything works for them after we release 2.0.0. We will also add pip 20.3+ to our CI pipeline in the coming days so that we can test it all automatically, also for future PIP upgrades.

Thanks for the fast reaction and resolution!

Feel free to close the issue if there are no other users who would like a similar feature. From our point of view, it seems that we do not need such an "exclusion list" any more.

@uranusjr
Member

This clears the path to releasing 2.0.0 - hopefully the day after tomorrow - without worrying about our users' experience.

Congratulations in advance! Thanks for the work, and sorry for the last-minute inconveniences we caused. I'll keep my fingers crossed for y'all.

@pradyunsg
Member

I'll go ahead and close this since the OP doesn't need it anymore, and I don't think we should go in this direction anyway. :)

@brainwane
Contributor

And thanks everyone for working together, testing, and helping clarify and resolve this!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 5, 2021