Build dependencies don't use the correct pinned version; numpy is installed twice at build time #9542
Comments
Please use numpy's oldest-supported-numpy. |
I don't think you can simply point to oldest-supported-numpy here. It's pip that's installing numpy twice (1.20.0 for building, and 1.19.5 as the final version), so this can also happen with any other package combination in theory. It works fine if you install numpy FIRST and then the package depending on numpy, because pip then recognizes that a compatible version is available and doesn't install it again. If it weren't numpy but some other random package, you couldn't point to "oldest-supported-numpy" either. The build dependency is specified simply as numpy, without a version pin. In short, it's pip that should resolve the build dependency, detect that it's a dependency that's going to be installed anyway, and install numpy first (using that numpy installation for the build of the other package). |
pip builds in an isolated environment, so numpy isn't going "to be installed anyway" in the sense that you mean. You can use --no-build-isolation if you want the build to use the numpy that is already installed in your environment. |
Even with build isolation, it's at least downloaded twice (so it's a bug in pip), which wouldn't be necessary. Both 1.19.5 and 1.20.0 are perfectly valid numpy versions to satisfy the build dependencies, so if I instruct pip to download 1.19.5, why download 1.20.0 too (and, on top of that, cause a potential build-compatibility issue alongside it)?
|
Neither of those suggestions works super cleanly, and they are actually more difficult to understand and explain than "isolated builds are isolated". As of today, you have two options: carefully pin the build dependencies, or tell pip not to do build isolation (i.e. you manage the build dependencies in the environment yourself). Beyond that, I'm not excited by the idea of additional complexity in the dependency resolution process that makes isolated builds depend on existing environment details -- both of your suggestions require adding additional complexity to the already NP-complete problem of dependency resolution, and that code is already complex enough. And they're "solutions" operating with incomplete information, which will certainly miss certain use cases (e.g. a custom-compiled package that wasn't installed via a wheel). At the end of the day, pip isn't going to solve every use case perfectly, and this is one of those imperfect cases at the moment. For now, that means additional work on the user's side, and I'm fine with that, because we don't have a good way for the user to communicate the complete complexity of build dependencies to pip. |
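For reference, a minimal sketch of the two options, using the numpy/py_find_1st versions from this issue (the pyproject.toml fragment assumes a setuptools-based project and is illustrative, not the actual py_find_1st configuration):

```bash
# Option 1 (package author): pin the build dependency in pyproject.toml.
cat > pyproject.toml <<'EOF'
[build-system]
requires = ["setuptools", "wheel", "numpy==1.19.5"]
build-backend = "setuptools.build_meta"
EOF

# Option 2 (user): disable build isolation and manage the build
# dependencies in the environment yourself.
pip install numpy==1.19.5 setuptools wheel
pip install --no-build-isolation py_find_1st==1.1.4
```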
I had the same issue today. Since the release of numpy 1.20.0 yesterday, there is a new dimension to this problem. For instance, I (mostly my users and CI services) usually install the package dclab with pip install dclab.
dclab comes with a few Cython extensions that need to be built during installation, which is a perfectly normal use case. This is not one of those imperfect cases. Now, the problem is that during installation of dclab, pip downloads numpy 1.20.0 and builds the extensions against it. But in the environment an older numpy may already be installed or pinned, and importing the freshly built extensions then fails because of the binary incompatibility.
As far as I can see, I have only three choices:
The best solution to this problem, as far as I can see, would be for pip to be smart about choosing which version of the build dependency in pyproject.toml to install:
I know that pinning versions is not good, but tensorflow is doing it apparently, and many people use tensorflow. [EDIT: found out that oldest-supported-numpy works for me] |
Which version of numpy does that install? While it may not cause a problem in this particular combination, it might cause a problem once your environment updates to numpy 1.20.0 (which apparently changed the ndarray size, while prior versions didn't). |
For me it installs numpy 1.17.3; the oldest-supported-numpy package on PyPI states:
I just checked with a
|
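For package authors, the oldest-supported-numpy approach mentioned above boils down to a pyproject.toml along these lines (a hypothetical fragment for a setuptools/Cython project, not taken from any specific package here):

```bash
cat > pyproject.toml <<'EOF'
[build-system]
# oldest-supported-numpy resolves to the oldest numpy that ships wheels
# for the current Python version and platform, so extensions built
# against it stay ABI-compatible with that numpy and anything newer.
requires = ["setuptools", "wheel", "Cython", "oldest-supported-numpy"]
build-backend = "setuptools.build_meta"
EOF
```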
Otherwise, due to a pip bug, the latest numpy is installed at build time, which can break if the environment uses an older pinned version. Explanation: pypa/pip#9542 (comment)
Here's another test case to help clarify the issue. Steps to reproduce: This problem happens on Linux, in Docker for Mac, but not on Mac outside of Docker.
Result
Notes
Workaround 1: Update the project to |
Structurally this needs better tooling: either the build-time numpy needs to be pinned low, or the build process needs to generate wheels with updated requirements. However, none of that tooling belongs in pip; this is a topic for numpy, setuptools and the build backends, as far as I can tell. |
Interesting. There's a leaky abstraction or two in there somewhere. A Dockerfile aims to be a repeatable build, but these steps inside it:
now quietly build a broken Python environment just because numpy published a new release.
|
Yes, precisely that. There's no reason to assume that if a user runs pip install numpy==1.19.5 py_find_1st, the pinned numpy should also be used as the build dependency for py_find_1st.
Why should pip assume that's the right thing to do? It might be for numpy, but we don't want to special-case numpy here, and there's no reason why it would be true in the general case. You might need to build with a particular version of setuptools, but have a runtime dependency on any version, because all you need at runtime is some simple function from pkg_resources.
Honestly, I have no idea. (I don't know without checking the source.) It's quite possible that there are additional bits of metadata, or additional mechanisms, that would make it easier to specify cases like this. But designing something like that is hard, and most people who need that sort of thing are extremely focused on their particular use cases, and don't have a good feel for the more general case (nor should they; it's not relevant to them). So it's hard to find anyone with both the motivation and the knowledge to look at the problem. Which is also why non-pip, domain-specific solutions like the "oldest supported numpy" approach are probably a better option... |
I agree with pfmoore in that I don't think numpy should be special-cased. If you want to have reproducible builds with pip, it looks like you need to define two files: a pyproject.toml pinning the build dependencies, and a requirements file pinning the runtime dependencies.
If you're using numpy both to build extensions and at runtime, you'll want to specify the exact same version of numpy in both. I wasn't too familiar with pyproject.toml before being bitten by this bug, but the following blog post does a good job of explaining the rationale: https://snarky.ca/what-the-heck-is-pyproject-toml/ |
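A minimal sketch of that two-file setup, keeping the build-time and runtime numpy in lock-step (the file contents are illustrative):

```bash
# pyproject.toml: build-time dependencies, installed into pip's
# isolated build environment.
cat > pyproject.toml <<'EOF'
[build-system]
requires = ["setuptools", "wheel", "Cython", "numpy==1.19.5"]
build-backend = "setuptools.build_meta"
EOF

# requirements.txt: runtime environment, pinned to the same numpy.
cat > requirements.txt <<'EOF'
numpy==1.19.5
EOF
```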
Indeed, I was starting to wonder if
I'm not saying pip should special-case numpy, just that the combination of tools is now failing subtly: it is fragile to a new release of a single library even in builds that tried to freeze all library versions, and this is probably puzzling lots of developers after they rebuild Python environments.
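The elided Dockerfile steps presumably boil down to something like the following (a hypothetical reconstruction; requirements.txt is assumed to be a fully pinned lock file containing numpy==1.19.5 plus a source-only package that compiles against numpy):

```bash
# Create the environment and install the frozen requirements.  Before
# numpy 1.20.0 was released this produced a working image; afterwards
# the isolated build of the source-only package compiled against numpy
# 1.20.0 while 1.19.5 was installed at runtime, so imports break.
python -m venv /opt/venv
/opt/venv/bin/pip install --no-cache-dir -r requirements.txt
```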
Here's a similar, but slightly different case:
The above scenario produces the same results, where the pinned version of the build dependency is not the one used at build time.
So, as a follow-up to my above comment: is there a way, as a consumer of package B that I do not maintain but do depend on, to control what version of a build dependency gets used by pip when it builds that package? |
Fixes incompatible numpy versions between build and runtime, as NumPy v1.20 is binary-incompatible with older versions. See pypa/pip#9542
There hasn't been much discussion in this issue lately, but for future reference I want to add that this is not only an issue for numpy and its ecosystem of dependent packages, but also for other packages. In helmholtz-analytics/mpi4torch#7 we face a similar issue with pytorch, and I don't think that the purported solution of creating a meta package like oldest-supported-numpy is feasible for every affected ecosystem. To be fair, pip's behavior probably is fully PEP 517/518 compliant, since these PEPs only specify "minimal build dependencies" and how to proceed with building a single package. What we are asking for is more: we want pip to install "minimal build dependencies compatible with the to-be-installed set of other packages". This got me thinking: given that pip calls itself to install the build dependencies in build_env.py, couldn't one add something like "weak constraints" (weak in the sense that build dependencies according to PEP 517/518 always take precedence) that contain the selected, version-pinned set of the other to-be-installed packages? However, and that is probably where the snake bites its own tail, the build environments AFAIK already need to be prepared for potential candidates of to-be-installed packages. As such, we would not have the final set of packages available, and even for simply iterating over candidate sets, one can anticipate not only that this can become expensive, but also that there are probably some nasty corner cases. @pradyunsg Is this the issue you are referring to in your comment? If so, do you have an idea of how to fix this? |
I'm wondering if some sort of |
FWIW, it's already possible to use a constraints file for this via the PIP_CONSTRAINT environment variable. |
Precisely. When building a package, pip does not know what exact set of dependencies it will end up with, because it has not seen the entire set of dependencies yet. As for how to "fix" this on pip's end -- it's not something that pip has enough information to fix. We do have one mechanism to provide this additional information to pip, though, and that's the constraints file (passed via the PIP_CONSTRAINT environment variable so it also reaches the isolated build environment). To provide a concrete example, I'll use this use case:
This will install the pinned versions and use them for the build as well.
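A sketch of that mechanism, assuming (as this comment indicates) that a constraints file supplied through the PIP_CONSTRAINT environment variable is also honoured by the nested pip invocation that populates the isolated build environment:

```bash
# constraints.txt pins the versions for both the runtime install and
# the build environment.
cat > constraints.txt <<'EOF'
numpy==1.19.5
EOF

# Setting the environment variable (rather than only passing -c) is
# what lets the build-environment install see the constraint too.
export PIP_CONSTRAINT="$PWD/constraints.txt"
pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4
```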
Well, for the use case here, I reckon that passing through the |
There is a significant issue here with dependencies with ABI constraints, and with NumPy in particular because it's so widely used (and because its runtime deps will be wrong if you build against a too-new version of numpy). As the maintainer of most NumPy build & packaging tools and docs, let me try to give an answer. Regarding what to do right now if you run into this problem:
There will still be an issue here: it's possible to trigger build isolation and end up with a wheel built against a newer numpy than the one in the runtime environment. To work towards better solutions for this issue: for more context on depending on packages with an ABI (it uses NumPy and PyTorch as qualitatively different examples), see https://pypackaging-native.github.io/key-issues/abi/. The answer is not to do anything in pip itself. Once the build-backend-specific solutions have been established, it may make sense to look at standardizing that solution so it can be expressed in a single way in the pyproject.toml metadata. |
Thanks for the additional pointers. Having a resource like pypackaging-native that collects these issues is really helpful. However, I disagree with one of your conclusions:
setuptools is flexible enough to add this runtime dependency to the generated wheel (which I already use). Hence just uploading an sdist to PyPI and having the build system pin the runtime requirement of the produced wheel is not a solution IMHO. I agree that the
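A sketch of the setuptools flexibility referred to here: recording the numpy version that was present at build time as a lower bound on the wheel's runtime requirement (a hypothetical setup.py fragment, not the actual project's code):

```bash
cat > setup.py <<'EOF'
# Hypothetical fragment: extensions built against numpy X.Y keep working
# with newer numpy releases, but not with older ones, so require
# ">= the numpy used for the build" at runtime.  This turns a mismatch
# into a visible install-time conflict instead of a broken import.
import numpy
from setuptools import setup

setup(
    install_requires=[f"numpy>={numpy.__version__}"],
)
EOF
```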
Fair enough - when you run custom code in setup.py, you have more flexibility there than the declarative mechanisms provide. It's also fair to say, I think, that when the runtime dependencies of the built wheel are correct, the failure becomes a visible install-time conflict rather than a silently broken environment.
That's perhaps another way of looking at this, indeed - if
That's not quite the reason as I remember it. If it's unused, it also wouldn't do any harm. The kind of thing this issue is about - building from source on an end user machine - works better without build isolation. Build isolation was chosen as the default to make builds more repeatable, in particular for building wheels to upload to PyPI. Which was a valid choice - but it came at the cost of introducing issues like this one. When a user has
I'd strongly disagree. I don't want building a package (as part of a plain pip install) to require the build dependencies to already be installed in my environment. Maybe you meant "on a developer machine"? Or maybe you meant "works better with build isolation"? Or maybe you're assuming that no end user ever needs to install a package that's available only in sdist form? |
@pfmoore no typo, this really does work better without build isolation. Build isolation is a tradeoff: some things get better, some things get worse. Dealing with numpy-like ABI constraints is certainly worse (as this issue shows). There are other cases, for example when using
On the contrary - I do it all the time, and so do the many users whose bug reports on NumPy and SciPy I deal with. |
@rgommers OK, fair enough. But I still think it's right for build isolation to be the default, and where non-isolated builds are better, people should opt in. That's the comment I was uncomfortable with. I agree 100% that we need better protection for users who don't have the knowledge to set things up for non-isolated builds so that they don't get dumped with a complex build they weren't expecting. But I don't want anyone to have to install setuptools before they can install some simple pure-python package that the author just hasn't uploaded a wheel for. |
I disagree with that. I often do editable installs in production environments (in Dockerfiles, for instance), or in CI jobs, and I don't want the build dependencies in the runtime environment. So there are tradeoffs there too. |
This commit provides a simple test that demonstrates the issues that resolver-unaware build isolation imposes on packages with C/C++ ABI dependencies. Cf. pypa#9542 for the corresponding discussion.
Ok, now I understand what you meant. Sorry, I was getting ahead of myself there, and I fully agree with you that many people/packages facing this issue need to pin the version in the built wheel files. This is certainly something that is a non-issue for pip, but rather needs fixing in the build backends, as you suggested. However, and this is the point in my opinion: even when people have fixed their packages or build backends, the issue persists. It is no longer a runtime issue, as the original reporter in this thread experienced; it becomes an install-time issue. And this might very well be something pip could (and maybe even should) address. To highlight the install-time issue I created a draft PR #11778 that adds a (so far failing) test to the pip test collection. Maybe somebody has a good idea on how to proceed from there. Regarding your idea about the metadata, that is probably the big question in my opinion: is it possible to fix this issue, maybe by implementing a good-enough heuristic in pip that works for most cases, or does it need additional metadata to find feasible solutions? |
To be honest, I've lost the thread of what's going on here. And a PR including just a test that claims to demonstrate "the problem", without clearly explaining what the problem is in isolation (i.e., without expecting the reader to have followed this whole discussion), isn't of much help. If someone can add a comment to the PR describing a way to reproduce the issue it's trying to demonstrate, in terms of how to manually write a package that shows the problem, with step-by-step explanations, that would help a lot. I tried and failed to reverse engineer the logic of the test. |
No additional metadata is needed, I believe. Right now, this example from the issue description (pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4) should error out if the runtime dependencies declared in the built wheel are correct. There is no way to "fix" this in pip itself. |
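To spell out that install-time failure mode (a hypothetical sketch; somepkg stands in for a package whose wheel correctly declares the numpy it was built against):

```bash
# somepkg's wheel metadata requires "numpy>=1.20" because it was
# compiled against numpy 1.20.x.  Requesting an older numpy alongside
# it can then only end in a resolver conflict -- a visible error at
# install time rather than a silently broken environment at import time.
pip install numpy==1.19.5 somepkg
```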
Has anyone mentioned #4582? It already discussed the same topic in considerable depth, and many of the people who responded above are involved there. |
Environment
Description
Using pyproject.toml build dependencies installs the latest version of a library, even if the same pip command installs a pinned version of it.
In some cases (binary compilation), this can lead to errors like the one below when trying to import the dependency.
Expected behavior
The build process should use the pinned version of numpy (1.19.5) instead of the latest version (1.20.0 at the time of writing). This way the installation process is coherent, and problems like this cannot occur.
How to Reproduce
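The reproduction appears to be the single command that also produced the attached log (see below):

```bash
# Request a pinned numpy together with a package that compiles a C
# extension against numpy; numpy 1.19.5 lands in the environment, but
# the isolated build of py_find_1st uses the latest numpy instead.
pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4
```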
Output
In verbose mode the installation of numpy 1.20.0 can be observed; however, even with -v, the output is very verbose.
An attached log can be found below, created with pip install --no-cache numpy==1.19.5 py_find_1st==1.1.4 -v &> numpy_install.txt (attachment: numpy_install.txt).