
pip silently downgrades dependent modules #10807

Open · cruiseliu opened this issue Jan 19, 2022 · 27 comments

Labels
S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior

Comments

@cruiseliu
Description

Exactly the same as issue #5068, but it's locked so I have to create a new one.

I know pip has a dependency resolver now, but in many cases it does not help.

  1. pip only tracks Python code that is packaged and installed as libraries, but many people have local scripts outside site-packages. pip can silently break them with no warning.

  2. In our case, an sdist package requires numpy with no version constraint but needs a rebuild when the numpy ABI changes. One of our indirect dependencies requires an old version of numpy, and pip silently downgraded it, breaking the sdist package.

In the original issue a pip developer mentioned --upgrade-strategy, but it seems to have nothing to do with the above scenarios.

Because our project is a published general-purpose library, we cannot use tricks like pip freeze.

Expected behavior

No response

pip version

21.3.1

Python version

3.9

OS

linux

How to Reproduce

  1. Install source package of ConfigSpace.
  2. Install a package that requires numpy < 1.22. (In fact I don't know which package it is. pip says "Collecting numpy<1.22" but does not tell me why.)
  3. Import ConfigSpace. It raises: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
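
A minimal sketch of these steps (the explicit numpy pin in the second command stands in for whichever package pulls in numpy<1.22):

pip install --no-binary :all: ConfigSpace   # builds the extension against the currently installed numpy
pip install "numpy<1.22"                    # pip downgrades numpy without warning about ConfigSpace
python -c "import ConfigSpace"              # raises the ndarray size ValueError above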

Output

No response

@burnpanck

A related issue arises when using pip together with other package managers such as conda/mamba. pip might not be aware of some of these other sources. For those cases, it would be great if pip could be configured not to touch packages which are already installed, and to fail when the dependency requirements cannot be fulfilled that way. With that enabled, any conflict with packages installed outside of pip would have to be resolved manually or by other means, but it (almost) guarantees that pip will not break anything existing (with the exception of packages that break through the very existence of certain new packages).

@pradyunsg (Member)

For those cases, it would be great if pip could be configured not to touch packages which are already installed,

You're asking for PEP 668's protections, which are under ongoing discussion. :)

https://www.python.org/dev/peps/pep-0668/

@burnpanck

I wasn't aware of PEP 668, but that definitely looks related. My impression from a first skim, however, is that it focuses on protecting distro-provided packages, not on arbitrating between competing package managers within a virtual environment. In particular, row 6 of the use-case table lists "Deleting externally-installed packages permitted" as "stays yes" for conda environments. Maybe I am misunderstanding the PEP, though.

The specific case I have is indeed an environment managed by conda, because some binary dependencies are not easily available through other means (to run tensorflow on a Mac M1). Some conda packages are compiled against numpy 1.19. Pip tried to replace that on several occasions with its own numpy 1.22, breaking the conda environment. Given that conda is aware of dependencies installed through pip while pip ignores requirements managed by conda, I would prefer that pip only proceed if it can do so without touching conda packages. (That said, I would have preferred not to need conda in the first place and to handle all dependencies with pip only.)

@samfux84 commented May 30, 2022

I am managing many Python installations on a HPC cluster. It is a real pain to work with pip as it often breaks existing package installations by down- or upgrading packages without asking.

I fully understand that some package installations won't work if different versions are required for some dependencies. But then pip should print an error message and list the conflicting dependencies, not just go berserk and start deleting installed packages. This is clearly a no-go. I wouldn't mind if users can enable this behavior with a pip option on the command line or set it as their preference, but it should never be the default.

Whenever I install something with pip, I need to look at the screen all the time to make sure I am fast enough to press ctrl+c whenever pip starts to uninstall packages without asking. This is very annoying when you consider that an average Python installation on our cluster can easily contain 400-500 packages ...

Furthermore, when you install many packages at once, the worst case is when a transitive dependency causes pip to go berserk. Then it takes a lot of trial and error to even figure out which package causes the problem.

@notatallshaw (Member) commented May 30, 2022

I am managing many Python installations on a HPC cluster. It is a real pain to work with pip as it often breaks existing package installations by down- or upgrading packages without asking.

There are many other tools, such as pip-tools, poetry, and pipenv, for managing environments; you may want to look into them. Pip currently doesn't have any functionality to automatically make constraints from existing installs.

Or you could do it yourself: as part of your install process you could generate something like this:

import importlib.metadata

# emit a lower-bound constraint for every installed distribution
for dist in importlib.metadata.distributions():
    print(f"{dist.metadata['name']}>={dist.version}")

You can then create (or append to) a constraints.txt and pass it to pip with the argument -c constraints.txt: https://pip.pypa.io/en/stable/user_guide/#constraints-files
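
For example, as a sketch, assuming the snippet above is saved as make_constraints.py (an illustrative filename) and you want to install a hypothetical some-new-package:

python make_constraints.py > constraints.txt
pip install some-new-package -c constraints.txt   # existing packages may be upgraded but never downgraded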

berserk and start deleting installed packages. This is clearly a no-go. I wouldn't mind if users can enable this behavior with a pip option on the command line or set it as their preference, but it should never be the default.

Pip doesn't "delete" packages in the sense of removing them when you run an install command, it can replace them with a different version, either upgrading or downgrading,

This is very annoying when you consider that an average Python installation on our cluster can easily contain 400-500 packages ...

Furthermore, when you install many packages at once, the worst case is when a transitive dependency causes pip to go berserk. Then it takes a lot of trial and error to even figure out which package causes the problem.

Finding an install solution in a large dependency graph is an exponentially hard problem. I believe once you start getting to the size of ~400+ packages you either need to go beyond relying on pip alone to manage your environment (as in the example I gave above) or there need to be further improvements in the backtracking techniques that pip uses.

With regard to backtracking techniques, I believe backjumping would help a lot, but I haven't had the time or motivation to explore it. Feel free to take a look at my thoughts here if you're interested in contributing: sarugaku/resolvelib#64 (comment)

@samfux84

@notatallshaw: Thank you for your reply.

I am not asking for a solution to manage environments and I don't expect pip to do this at all.

The only thing I would wish for is that pip does not delete existing installations (replacing/upgrading/downgrading usually starts with deleting an existing installation) without asking. It is really as simple as this.

Pip should have an option the user can enable to avoid unwanted deletions of existing installations. It would be nice if this option caused pip to abort and print an error about the conflicting versions of dependency packages.

Installing 400+ packages without software to manage environments is not a problem at all and works perfectly fine. It would just be a huge improvement if I could run some "pip install --no-binary :all:" with a list of packages (batches of 20 or so), work on something else, and come back later to find either that the packages are installed, or that there was a conflict and they are not installed but I have some logs to check what the conflict was and work on resolving it.

But with the current behaviour of pip I can't run it and then do something else, because when I get back to it usually things are broken already. To give you an example:

I build numpy and scipy such that they are linked against OpenBLAS:

OPENBLAS=$OPENBLAS_ROOT/lib/libopenblas.so pip install --no-binary :all: numpy scipy

This takes several minutes, so I would like not to have to repeat it all the time. If there is a conflict with the installed numpy version and I notice it a bit too late, pip removes my numpy installation and does whatever it thinks is appropriate without asking. Then I need to:

  • remove the package that caused the conflict
  • uninstall the numpy installation that replaced the one I built before
  • build numpy again the way I want it installed

I would be very happy if pip would just abort and tell me that there is a conflict. Then I can take care myself of the problem without having to clean up after pip all the time.

@pradyunsg (Member)

@samfux84 see #9094.

@notatallshaw (Member) commented May 30, 2022

The only thing I would wish for is that pip does not delete existing installations (replacing/upgrading/downgrading usually starts with deleting an existing installation) without asking. It is really as simple as this.

As I understand it, you're asking for an option where pip only installs new packages, and fails when it finds a solution that needs to upgrade or downgrade any existing packages? I would personally not expect a general desire for this behavior, but perhaps make an issue/PR for this and see what the pip maintainers think.

You can manually implement this behavior by modifying what I did above, using a constraints file that includes the following (you may need to eyeball the output to make sure the metadata is actually accurate):

import importlib.metadata

# pin every installed distribution to exactly its current version
for dist in importlib.metadata.distributions():
    print(f"{dist.metadata['name']}==={dist.version}")

But with the current behaviour of pip I can't run it and then do something else, because when I get back to it usually things are broken already. To give you an example:

I build numpy and scipy such that they are linked against OpenBLAS:

OPENBLAS=$OPENBLAS_ROOT/lib/libopenblas.so pip install --no-binary :all: numpy scipy

This takes several minutes, so I would like not to have to repeat it all the time.

A future solution to this is also that pip won't need to build (or download) numpy to extract the metadata; PyPI will provide it in a way that pip can consume to resolve the dependencies before downloading and building the package (related to PEP 658, I think? But I haven't been following the discussion exactly).

@samfux84

As I understand it, you're asking for an option where pip only installs new packages, and fails when it finds a solution that needs to upgrade or downgrade any existing packages? I would personally not expect a general desire for this behavior, but perhaps make an issue/PR for this and see what the pip maintainers think.

A simple example: I just installed Python 3.10.4 and numpy 1.22.4. Now I want to install the newest numba version. But there is a conflict, as the most recent numba version does not support numpy 1.22.4 yet and needs an older version.

Then I would expect that when I run for instance

pip install numba

that pip detects the conflict with the already installed numpy version and gives me the choice to either confirm that I am OK with replacing my numpy installation, or to say no and terminate the installation. Then I still have the choice to not install numba at all, or to wait until a numba version is released that supports numpy 1.22.4. But with the current implementation I need to either press ctrl+c fast enough (before pip uninstalls numpy) or clean up afterwards, undoing the unwanted changes and redoing my previous installations to get back to the same state as before running "pip install numba".

@notatallshaw (Member) commented May 30, 2022

I understand that example, but I think there are a lot of subtleties to exactly how you implement such an option that impact its use cases. For example, should pip fail if numba requires a newer version of numpy? Should pip fail if numba requires a more specific version of numpy (e.g. 1.22.4-rev1 instead of 1.22.4)? Should pip fail if numba requires going from a prerelease version to a non-prerelease version that otherwise has the same version number? Etc.

Given your example, the constraints-file approach I explained should work fine: only a version of numba that is compatible with your existing environment will be installed. Pip would check all versions of numba, though, so I would recommend putting a lower bound on it.

@samfux84 commented May 30, 2022

Personally I would appreciate it if pip let me know whether a package can be installed without deleting anything, and then gave me the choice to say either yes or no:

If numba requires an older version of numpy -> ask me if I am ok with this and continue or stop depending on my answer
If numba requires a newer version of numpy -> ask me if I am ok with this and continue or stop depending on my answer
If numba can be installed with the numpy version I have -> install it

pip install numba

will try to install the most recent version of numba. If the most recent numba version needs a newer version of numpy, I can still try to install an older version of numba that may not conflict with the numpy installation I already have, by specifying an explicit version of numba to install.

pip already evaluates whether replacing a dependency package is required or not. So it is just about asking the user before deleting files, and giving them a chance to stop the process in case deleting files would be required for an installation.

@samfux84 commented May 30, 2022

If I uninstall a package (for example docopt), pip explicitly asks me if I am OK with deleting the files:

$ pip uninstall docopt
Found existing installation: docopt 0.6.2
Uninstalling docopt-0.6.2:
  Would remove:
    /cluster/apps/nss/gcc-8.2.0/python/3.10.4/x86_64/lib64/python3.10/site-packages/docopt-0.6.2.dist-info/*
    /cluster/apps/nss/gcc-8.2.0/python/3.10.4/x86_64/lib64/python3.10/site-packages/docopt.py
Proceed (Y/n)?

But if I run

pip install numba

then pip will delete my numpy installation without asking. So a possible solution would be that

pip install numba

when finding a conflict with my numpy installation would try to run

pip uninstall numpy

which would then ask me if I am OK with deleting it or not. Then I would have a choice to say no if I am not OK with my numpy installation being deleted.

@notatallshaw (Member) commented May 30, 2022

Uninstall removes the package altogether, which may break many dependents.

Install sometimes finds it needs to install a different version to maintain dependency compatibility; these are two very different operations.

Further, install may need to do this for an arbitrarily large number of packages, requiring hundreds of prompts. Adding a user prompt by default would also break many CI tools at this point.

@samfux84 commented May 30, 2022

If you have a package installed that needs numpy<=1.20.3 and then you install a package that requires numpy>=1.21.0, and pip replaces numpy with a version >=1.21.0, then this will also break compatibility with the first package that requires numpy<=1.20.3.

So the only way to not break compatibility at all would be to ask the user before making a change ;-)

This is basically why I press ctrl+c when pip tries to delete packages as I want to keep compatibility of the packages that are already installed.

I will now stop bothering you with my questions and not waste more of your time. Thank you very much for answering my questions and for the good work developing pip. This is appreciated a lot.

@samfux84

I cannot give an explicit example right now, but today and tomorrow I will install around 400-500 packages for my new Python 3.10.4 installation. I'll try to find a case where there is such a conflict and see how pip handles it.

Again thank you for replying to my questions.

@notatallshaw (Member)

Sorry I initially misunderstood your latest post and deleted my comment. I think you mean when you get the following error?

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behavior is the source of the following dependency conflicts.

This is a subtly different issue than pip installing a different version of a package and not producing a warning, as a warning here is produced. This has already been referred to in this thread: #10807 (comment)

@burnpanck commented May 30, 2022

I believe that the linked #9094 is related in that it is a case of pip breaking other packages during an update, and it seems a majority of the users agree that pip should not do that when it detects such a case. However, the request here is a slightly different one: give the user the opportunity to instruct pip not to change any existing installation, even if pip does not detect any compatibility issue (e.g. because the requirements specification isn't accurate to that detail).

I have observed similar behaviour to @samfux84's: my manually compiled numpy 1.22 gets downgraded by pip when I ask pip to install numba, because of an upper limit in numba's dependency specification. Pip does not detect any problem with that, because I might have nothing installed that requires numpy 1.22 in general. However, if I compiled something else in the meantime that linked against the numpy API it found when it was built, it still breaks during the downgrade.

Numpy is of course a rather complex package in that it is both a Python dependency and a binary dependency of many scientific packages. But this is in fact a common situation for scientific packages, and the reason why the Python community has diverged between pip and conda (and others) in the first place. It is great that the Python community is actively trying to improve compatibility between these different requirements. Nonetheless, a "safety switch" in the form of a configurable option preventing pip from changing anything already installed would still enable many options for unusual requirements, because it gives the maintainer of an environment the possibility to decide, rather than just silently breaking things.

@pfmoore (Member) commented May 30, 2022

This discussion has got way off topic, I believe. There seem to be three separate points being discussed here:

  1. @cruiseliu's original point about two issues (scripts, and sources that need recompiling when numpy is upgraded).
  2. @burnpanck's point about packages that are not managed by pip.
  3. @samfux84's point about pip downgrading numpy to satisfy constraints.

I'd strongly suggest these should be separate issues.

Having said that,

  1. The original issue doesn't appear to have much in the way of actionable suggestions, so I'm not sure what we should do here.
  2. The non-pip-managed packages issue is being addressed by PEP 668, as already noted. I'm not sure anything further needs to be said here.
  3. The numpy downgrade issue may suggest that a PR with a new --upgrade-strategy value of "never" (abort if any package that is already installed needs to change) might be useful. But regardless, it's a separate issue and should be raised as such.

@samfux84

@pfmoore Thank you for your reply. I am looking for exactly the same thing as @burnpanck: a safety switch to tell pip not to change any existing installation.

But I agree that this is going off topic, so I will stop posting.

@maparent

Sorry for a late comment; I recently got frustrated with this as well, and I thought I'd toss in a few ideas (absent a new issue for samfux84's request... should I create one?)

  1. I am trying to train myself to always install with --dry-run first, so I can see upgrades/downgrades before they happen. I'd like a way to make that the default in my pip.conf, and to cancel it explicitly on the CLI.
  2. I totally second using new variants of the --upgrade-strategy flag. I am less sure that never is the most useful new setting. I would see the following as useful cases:
    1. abort-if-downgrade (implicitly allows all upgrades)
    2. abort-if-major-upgrade (for packages that follow semantic versioning)
    3. abort-if-minor-upgrade (idem)
    4. abort-if-upgrade (which would be equivalent to never, but clearer imho)
    5. abort-if-incompatibility (which would actually scan all the other installed packages, and would be expected to be slow)

Ideally, it should be possible to combine these with other upgrade-strategy values, such as eager vs only-if-needed, but that may be asking for a lot. (Combinations would also allow someone to choose to abort on major upgrades, but not minor downgrades, etc.)
Otherwise, I think that most people would be happy with a default of abort-if-major-upgrade, which would also abort on downgrades, and they could locally set the strategy to only-if-needed after having checked the consequences.

@notatallshaw (Member) commented Jan 12, 2023

Sorry for a late comment; I recently got frustrated with this as well

Have you tried using a constraints file like I mentioned previously?

E.g. here is a simple version that does not affect existing installed packages:

pip freeze > existing-env.txt
pip install my_new_requirement -c existing-env.txt

And here is another version for when you want no existing package to be downgraded (shown for a Unix-like environment; adapt to whatever shell/utility you run pip in):

pip freeze | sed 's/==/>=/g' > existing-env-lower-bounds.txt
pip install my_new_requirement -c existing-env-lower-bounds.txt

Until someone actually makes a PR for pip that can be evaluated by the maintainers, I've found this solution very easy, robust, and customizable.

@maparent

Thank you for the answer. The lower-bound variant you added indeed answers my use case; but I note that it requires maintenance by hand after every update. I think this is a behaviour that a lot of people think of as desirable, and it should be an option that could be set in pip.conf and would apply to every invocation without extra work. That said, point taken about the PR.

@Arcitec commented Jan 21, 2024

@notatallshaw Thanks for a great solution to this super frustrating problem.

I have an AI environment where various modules have their own requirements.txt files. Some modules are outdated and contain frozen requirements which specify somepackage==some.ancient.version. And whenever pip install -r encounters those files, it happily downgrades to the old version.

Your 2nd solution is perfect. Freezing our currently installed packages, converting the == to >= to allow upgrades but never downgrades, and using that as the extra constraints completely solves the problem!

I agree with @maparent that it would be better if this was built into pip itself via the --upgrade-strategy ideas he proposed.

@notatallshaw (Member) commented Jan 21, 2024

I agree with @maparent that it would be better if this was built into pip itself via the --upgrade-strategy ideas he proposed.

Perhaps this is small enough to be added to Pip if someone provides a well-thought-out PR that is concise enough for Pip maintainers to review and merge.

But it's actually part of a larger problem, which is package and environment management; i.e., rather than just taking into account which packages you have installed, an even better workflow would consider which packages you chose to install. Then, for example, when you add something new, your previously requested packages are taken into account but some transitive dependency is not.

This kind of workflow is not likely to come to pip (https://discuss.python.org/t/why-doesnt-pip-write-installed-packages-to-pyproject-toml/43657/9), but there are other tools, such as Poetry, PDM, Hatch, or Pixi, which approach this problem (in various different ways), and if your need is great enough, with complex requirements, I would suggest you look at those.

That said, I personally continue to use constraints files and a couple of small bash scripts, as this is fairly simple and flexible for my needs.

@potiuk (Contributor) commented Jan 21, 2024

I would just add to what @notatallshaw wrote (the solution with constraints is really good) that you seem to be looking for a generic solution in pip, while possibly what you really need is a specific command for your environment that prevents selected packages from being downgraded or upgraded.

For example, in what you were explaining before about numba, @samfux84, I guess numpy is the one package that is really problematic and important for your installation.

What is important for you to keep might not be important for others, and you seem to already know that keeping numpy specifically at a given version is important for you. But it's a property of your specific environment, IMHO. It's not applicable to the generic space of "resolving dependencies"; it's a problem of your environment, and others might have different strategies there.

Maybe there are a few more packages that you know are problematic for you, but my guess is there are just a few of those and you know which ones you want to keep. So why don't you just explicitly add numpy==1.22.4 (and other important ones) as part of your pip install command?

It's largely the same as what @notatallshaw is using, but it only limits those packages that you really need to keep at specific versions. That feature already exists in pip: if you specify a package at a version that you already have, it will be kept, and it will limit the resolver to only use that version when looking for a set of non-conflicting packages.

So you can do for example (because you already know that you compiled numpy and want to keep it):

pip install --upgrade numba numpy==1.22.4

And add numpy==1.22.4 to all your pip install commands, precisely because this is what you care about in your env. With proper automation you can even set the version you need in an env variable and use it everywhere.
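
A sketch of that automation (the variable name is illustrative):

export NUMPY_PIN="numpy==1.22.4"          # the numpy you built and want to keep
pip install --upgrade numba "$NUMPY_PIN"  # resolves numba against the pinned numpy or fails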

This will find the right version of numba that does not conflict with your locally installed numpy, and it will either install it without touching numpy or fail if it cannot find the right resolution.

This also has the nice advantage that resolution might be faster (pinning dependencies limits the space of potential resolutions). And not constraining or '>='-ing everything (as in @notatallshaw's solution) allows pip to find more solutions, for example by downgrading another package that you do not care about keeping pinned.

This is what, for example, we advise our users to do when they want to upgrade some packages that are installed together with Airflow. While we recommend installing Airflow using constraints that we automatically prepare (which gives a reproducible installation and immunity to breaking changes introduced suddenly and without warning by third-party packages), this also allows our users to upgrade to newer versions of dependencies as they see fit, without accidentally downgrading or upgrading Airflow itself:

From: https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html

Typically, you can add other dependencies and providers as separate command after the reproducible installation - this way you can upgrade or downgrade the dependencies as you see fit, without limiting them to constraints. Good practice for those is to extend such pip install command with the apache-airflow pinned to the version you have already installed to make sure it is not accidentally upgraded or downgraded by pip.

pip install "apache-airflow==2.8.1" apache-airflow-providers-google==10.1.0

@Arcitec commented Jan 21, 2024

@potiuk Thanks for the extra insights and tips. Your suggestion of only "pinning" specific packages makes sense, since it's often just a few packages that are problematic. Allowing downgrades of less important packages could often help to satisfy the dependency resolution of what truly matters.

I'm curious about your method of specifying "pinned" versions directly on the command line. I wonder what pip would do with pip install -r requirements.txt numpy==1.22.4 if requirements.txt itself contains a line like numpy==2.*? Perhaps it's better to use the -c constraints.txt flag to let pip know that our numpy==1.22.4 is a constraint, not an installation request. Your suggestion of only listing specific packages would work well in the constraints file.
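
As a sketch of that constraints-file variant (filenames illustrative):

printf 'numpy==1.22.4\n' > constraints.txt          # list only the packages that must not move
pip install -r requirements.txt -c constraints.txt  # pip must honor the pin or fail with a conflict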

Anyway, I still think it would be nice if pip had a flag to "never downgrade installed packages", exiting with a warning if there's a downgrade. That is basically what I achieved via the "pip freeze and convert == to >=" trick, and it means that pip aborts if it cannot resolve the dependency. That has in turn alerted me to some ancient libraries that pin exact outdated versions even though they work fine with the latest version too. It was a nice sanity check for my dependency tree.

@pfmoore (Member) commented Jan 21, 2024

Anyway, I still think it would be nice if pip had a flag to "never downgrade installed packages", exiting with a warning if there's a downgrade.

That sounds like a reasonable suggestion. And as @notatallshaw said:

Perhaps this is small enough to be added to Pip if someone provides a well-thought-out PR that is concise enough for Pip maintainers to review and merge.

It's unlikely that this will be a priority for the maintainers in the foreseeable future, though, so any work on this will almost certainly have to come from a community member.
