Poetry package install time seems longer than installing with pip #338
Thanks for your interest in Poetry! This is expected: Poetry orders the packages to install so that the deepest packages in the dependency graph are installed first, to avoid errors at installation time. This requires installing the packages sequentially, which takes longer but is more "secure". Also, Poetry checks the hashes of installed packages for security reasons, and due to the way […]
Thanks for taking the time to explain :) Would it be possible for poetry to just download the target packages in parallel and install them in sequence to speed up the process?
I intend to improve the installation part of Poetry sometime in the future to speed things up, yes. I can't give you an ETA, though; the only thing I can tell you is that it will likely be after the […]. I'll be sure to keep you posted if anything changes on that front.
That sounds great @sdispater. Of course I fully understand your reasoning and agree that ensuring stability is a lot more important than performance. Thanks for the great work on Poetry!
Let me start by saying I recently discovered poetry. Now, is there any progress on this? I just migrated a couple of projects to poetry and my container builds went from 2s to 5 minutes each. The reason why it's so slow is twofold: […] so I could cache that layer. With poetry, […], which means I need to install the dependencies over and over even though I haven't changed a single dependency. I am trying to do something equivalent with […], but I am not sure this is the right thing to do, even though it does the job. It feels extremely hacky, so I'd appreciate some advice :P
Re this previous comment, it would be ideal if there were an interface to install when only the lock file is available, so that we can continue to make good use of Docker layer caching; I haven't been able to get an install to succeed when the source isn't also available.
Just did a comparison for our 190-package-requirement project (according to poetry), and the time difference between pipenv and poetry was the following: […]
Now yes, pipenv probably doesn't take installation order into account and just installs them concurrently, and yes, I have had the problem where a package needed its dependencies to install correctly (which shouldn't be the case, but sometimes you don't have control). Still, I think if you look at the dependency tree, you can run a lot of leaves and branches in parallel without this being an issue, because they don't meet up until the end. Either way, pip with a requirements file with hashes is a lot faster...
It feels like you are comparing apples to oranges here. Pip doesn't resolve dependencies. Poetry install does.
However, those dependencies are only resolved during the lock phase; the install phase (which is performed more often than locking) just installs the packages. Could we consider doing some time-expensive ordering/parallelization calculations during the lock phase?
Yeah, the dependency resolution taking long is fine, then doing a […]. Like I said: that misses the ordering during install, which only seems to be needed if skipping it actually causes issues. Maybe the solution would be to give the install command a flag (and environment variable) to install concurrently and speed it up when ordering is not important. Then the default would be the slower, safe behaviour, but if you know ordering isn't needed, it would speed up CI build times significantly.
Also, I just saw this in the pip documentation, stating that they also install packages in order... Now I'm really confused about what poetry is doing to make it go so much slower.
@hvdklauw Pip documentation just says that they are installing in topological order. That would still allow parallelism for packages that are not forced into a specific order by a dependency relationship. (But looking at the discussions around pip, they don't do parallel installs either.)
I've dug into this a bit, and it seems that the slowness originates from the fact that […]. As I understand @sdispater, this is done to ensure ordering of packages. However, […]. Another reason is hash-checking, which is also supported in pip […]. Hence, I don't see any benefit from running separate […].
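For readers unfamiliar with what the hash check involves, here is a minimal sketch in plain Python — not Poetry's or pip's actual code — of verifying a downloaded artifact against a digest pinned in a lock file. The file name and digest in the usage comment are placeholders.

```python
# Hypothetical sketch of lock-file hash verification: compare the sha256 of a
# downloaded wheel/sdist against the digest recorded in the lock file.
import hashlib
from pathlib import Path

def verify_sha256(artifact: Path, expected_hex: str) -> None:
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    if digest != expected_hex:
        raise ValueError(f"hash mismatch for {artifact.name}: got sha256:{digest}")

# Usage (placeholder values):
# verify_sha256(Path("somepkg-1.0-py3-none-any.whl"), "<sha256 from the lock file>")
```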
Another idea worth investigating is calling […] directly. This would avoid the repeated Python startup time. There are some issues with this, though: […].
This came out of more of a curious investigation into […]. If those don't pan out, I think it's still absolutely acceptable to call […].
We are currently using the following script in CI to make installation faster (2x in one of our projects):
poetry export -f requirements.txt --dev | poetry run -- pip install -r /dev/stdin
I realize this is probably naive, but I have had some experience building my own tooling around […]. Since poetry resolves dependencies upfront and then, at least for the […]. From my own testing, changing this for loop in […] takes me from ~40s to freshly install about 30 dependencies (including new venv creation) to about 10s. For comparison, a serial […].
I don't know if the lack of parallelization here is because removing or updating things might be more sensitive to some kind of cross-dependency race, or if there are other reasons why a parallel per-dependency install isn't a good standard practice. But in my limited testing, that very simple change makes […].
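The exact loop being referred to didn't survive extraction, so below is only a generic sketch of the kind of change described: pushing the per-package installs through a thread pool instead of a serial loop. The helper names and the pip-subprocess-per-package shape are assumptions for illustration, not Poetry's real executor code.

```python
# Hypothetical sketch: parallelise per-package installs with a thread pool.
# Assumes every requirement can be installed independently (no ordering),
# which is exactly the "naive" caveat discussed above.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def install_one(requirement: str) -> None:
    # One pip invocation per package, as the serial loop would do.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--no-deps", requirement],
        check=True,
    )

def install_all(requirements: list[str], max_workers: int = 8) -> None:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces iteration so exceptions from workers propagate here.
        list(pool.map(install_one, requirements))
```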
Ok, sorry, long post. I had the opportunity to discuss the issue with @sdispater recently, and we came to the conclusion that one of the ways to dramatically speed up the download time would be to download fewer bytes, and especially to download only the ones we need. Let's add a few things: […]
I've done a POC for just this (not trying to integrate it into poetry, just seeing if I could achieve anything noteworthy by playing around). My first conclusions: […]
Given the ZIP file format, at first I thought it wouldn't be possible to improve on that, but then I looked again, and now I think it's feasible to do much, much better. More details on my analysis of the problems with the ZIP format follow (technical details, not needed to understand the whole problem, but if you're interested, here you are). First things first: I'm not an expert on the zip format; most of this is what I discovered today by reading Google and Wikipedia. If you think there's a mistake, you're most probably right, so please say so. A zip file is composed of a sequence of local file headers, each followed by that file's (usually compressed) data, then a central directory listing every entry with its offset in the archive, and finally an end-of-central-directory (EOCD) record at the very end.
If you want to read a single file, here are the minimal steps one needs to take: read the EOCD record at the end of the archive to find the offset and size of the central directory, read the central directory to find the entry's local header offset and compressed size, then read that byte range and decompress it.
Sources: […]. That being said, we know that the wheels are most likely created by the wheel tool, and we can make assumptions, plenty of them. This helps because: […]
This method would allow getting the metadata with only 2 requests and about 5 kB (or we could make it 3 requests and probably 1 kB), so for Django we're talking about dividing the download time by about 10,000. That should speed up poetry. And the best part is that if there's no wheel, or the wheel isn't in the expected format, we can just detect that and fall back to today's strategy. I'm going to try to upgrade the POC to showcase reading the WHEEL file given a wheel URL.
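To make the idea concrete, here is a minimal sketch (not the linked POC) of reading just a wheel's .dist-info/METADATA over HTTP Range requests. It assumes the server honours Range, that the total file size is already known (e.g. from a HEAD request's Content-Length), and that entries are stored or deflate-compressed; ZIP64 archives are not handled.

```python
# Hypothetical sketch: fetch only a wheel's *.dist-info/METADATA entry using
# HTTP Range requests instead of downloading the whole archive.
import struct
import urllib.request
import zlib

def fetch_range(url: str, start: int, end: int) -> bytes:
    # Inclusive byte range, per RFC 7233.
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def fetch_metadata(url: str, size: int) -> bytes:
    # 1. Read the tail and locate the end-of-central-directory record
    #    (signature PK\x05\x06); 65 557 bytes covers the largest ZIP comment.
    tail = fetch_range(url, max(0, size - 65_557), size - 1)
    eocd = tail.rindex(b"PK\x05\x06")
    cd_size, cd_offset = struct.unpack("<II", tail[eocd + 12 : eocd + 20])

    # 2. Read the central directory and walk its entries.
    cd = fetch_range(url, cd_offset, cd_offset + cd_size - 1)
    pos = 0
    while cd[pos : pos + 4] == b"PK\x01\x02":
        method, = struct.unpack("<H", cd[pos + 10 : pos + 12])
        comp_size, = struct.unpack("<I", cd[pos + 20 : pos + 24])
        name_len, extra_len, comment_len = struct.unpack("<HHH", cd[pos + 28 : pos + 34])
        header_offset, = struct.unpack("<I", cd[pos + 42 : pos + 46])
        name = cd[pos + 46 : pos + 46 + name_len].decode("utf-8")
        if name.endswith(".dist-info/METADATA"):
            # 3. Read the 30-byte local file header to find where the data
            #    starts, then fetch and decompress only this entry.
            local = fetch_range(url, header_offset, header_offset + 29)
            lname_len, lextra_len = struct.unpack("<HH", local[26:30])
            start = header_offset + 30 + lname_len + lextra_len
            data = fetch_range(url, start, start + comp_size - 1)
            return data if method == 0 else zlib.decompress(data, -15)
        pos += 46 + name_len + extra_len + comment_len
    raise LookupError("no METADATA entry found in wheel")
```

Any failure along the way (no Range support, unexpected archive layout) can simply trigger the full-download fallback described above.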
Another possible, simpler solution would be to read the last, say, 3 kB of the zip and look for […]. Reading efficiently through the bytes can be done with […]. EDIT: yeah, but... we have no idea how long the central directory may be; anywhere between 1 kB and 1 MB, so by default it's hard. We're probably better off reading the central directory to know where to search.
Ok, POC updated at https://github.com/ewjoachim/quickread/blob/master/quickread/wheel.py (usage at https://github.com/ewjoachim/quickread/blob/master/script2.py). We can get the full requirements for a wheel with 2 requests, each downloading 0.5 to 2 kB. I'll try to see if there's a way to integrate this into poetry.
Hm :/ I implemented that on a branch and the results are not as impressive as expected. Running on poetry itself, in the dev env:
$ # On master
$ poetry run poetry update -vvv
...
1: Version solving took 3.156 seconds.
1: Tried 1 solutions.
0: Complete version solving took 52.984 seconds for 4 branches
0: Resolved for branches: (>=2.7,<2.8 || >=3.4,<3.5), (>=2.7,<2.8), (>=3.4,<3.5), (>=3.5,<4.0)
$ # On my branch
$ poetry run poetry update -vvv
...
1: Version solving took 2.874 seconds.
1: Tried 1 solutions.
0: Complete version solving took 45.825 seconds for 4 branches
0: Resolved for branches: (>=2.7,<2.8 || >=3.4,<3.5), (>=2.7,<2.8), (>=3.4,<3.5), (>=3.5,<4.0)
Full output: https://gist.github.com/ewjoachim/6fed55fe84da9c90d6452b73ed64cdbd
Additional ideas for speeding up: […]
I'm posting my branch (#1803), but I'm probably not going to spend a lot more time on this unless someone has a very smart idea :)
@ewjoachim Definitely not being dismissive of the work, as I think anything that speeds up solving time is great (during development I definitely spend more time solving than installing); I just don't think it's relevant to this particular thread. I suspect the other posts calling out the per-module subprocess pip call are the primary cause of the difference between pip and poetry on install. I think in some cases the pure parallelism approach would break, due mostly to modules doing custom things in their setup.py (which implicitly requires dependencies to be installed). Something that parallelized each layer of depth n up the dependency graph, starting with the deepest, would still probably always work; I haven't looked through the lockfile structure to see if the information needed to do this is present there.
Hm, you're right... This stems from a discussion I had with @sdispater, and I mistook this ticket for the one regarding the problem we discussed :( Sorry for the noise. I'll try to find the proper ticket or create one.
From reading this thread, would someone be able to clarify why poetry needs to install each package individually? If you've already computed the dependency closure, what is the additional bookkeeping vs. generating multiple requirements.txt files and installing in parallel, similar to pipenv? We've really enjoyed the usability boost of poetry commands, but in our CI system, builds can take up to 20 minutes to install even when everything is already downloaded (I assume because of subprocessing out to pip). I'm thinking of setting up our CI to do […].
Yeah, I tested it again yesterday in our CI environment; poetry install takes minutes longer than doing the export and then using pip, both with and without the packages cached in a folder.
Installing packages in parallel would be really nice. Pipenv does this in a really bad way: it ignores package dependencies completely, so sometimes installation fails and those packages are simply retried at the end. Not a good approach. But since we have the complete dependency graph, we could find the packages that are safe to install in parallel and do that.
This safe way is definitely ideal, BUT the dumb way probably works in 90% of cases (and is significantly simpler). I wonder if you could get a tradeoff by topologically sorting, then doing parallel installs from deps up to the top-level requirements? (Possibly weighting shared dependencies higher.)
This is a quick hack using the dependency depths for parallel installation: #2374. The idea is that it should be safe. EDIT: it is about four times faster on my computer, while still being safe in that it respects the dependencies of the packages.
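For illustration only (the actual change is in #2374, which is not reproduced here), a minimal sketch of the depth-layered idea: compute each package's depth from the resolved dependency graph, then install one layer at a time, deepest first, with the installs inside a layer running in parallel. The function names and the toy graph are made up.

```python
# Hypothetical sketch: install packages level by level, deepest first, with each
# level installed in parallel.  "deps" maps a package to the packages it depends
# on; in a real implementation this would come from the resolved lock data
# (which is acyclic, so the recursion below terminates).
from concurrent.futures import ThreadPoolExecutor

def depth_of(pkg: str, deps: dict[str, set[str]], cache: dict[str, int]) -> int:
    # Depth 0 = leaf; a package sits one level above its deepest dependency.
    if pkg not in cache:
        cache[pkg] = 1 + max(
            (depth_of(d, deps, cache) for d in deps.get(pkg, ())), default=-1
        )
    return cache[pkg]

def install_in_layers(deps: dict[str, set[str]], install) -> None:
    cache: dict[str, int] = {}
    layers: dict[int, list[str]] = {}
    for pkg in deps:
        layers.setdefault(depth_of(pkg, deps, cache), []).append(pkg)
    for level in sorted(layers):  # leaves first, top-level packages last
        with ThreadPoolExecutor() as pool:
            list(pool.map(install, layers[level]))

# Toy example: B and C both depend only on D, so they install together once D is in.
install_in_layers({"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}, install=print)
```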
For anyone else stumbling on this thread, and since it hasn't been mentioned so far, it looks like #2595 has been merged and should help with this issue: #2595 (comment)
Just curious: does the new installer pre-build for the local env, cache the built dists, and soft-link to them rather than copying them (like pip-accel did)?
@finswimmer why is this closed? I still see the problems on my side.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Original issue description:
If an exception occurs when executing a command, I executed it again in debug mode (-vvv option). (Does not apply)
[…]: python:3-7-stretch (Docker image) running on CircleCI

Issue
I am reasonably sure that installing packages using poetry install instead of pip install takes significantly longer. To compare, here is a pip install taking 35 seconds: https://circleci.com/gh/MichaelAquilina/S4/464#action-103. Here is another build with poetry install and the same requirements taking 1:27: https://circleci.com/gh/MichaelAquilina/S4/521#action-104. In both cases, both dev and non-dev requirements were installed.