Collecting from cache is surprisingly slow #10467
Comments
That sounds about right to me TBH. If you do …
I was hoping that
I'm wondering if there is some low-hanging fruit to reduce that overhead? Perhaps my feature request should rather have been: introduce an optimized/fast mode for fully pip-compiled requirement lists. Does that make more sense?
@bluenote10 thanks for the clear reproduction instructions, it makes it so much easier to look at something like this. I took a profile on my machine and found that it was spending 25s on 92 calls of …

On the one hand this is a good opportunity to thread the http calls, because they're all independent. But right now criteria resolution happens on a sort of incremental basis, and that will need to be moved a little.

I applied approximately this diff and tried again. Total time for …

--- a/src/pip/_vendor/resolvelib/resolvers.py
+++ b/src/pip/_vendor/resolvelib/resolvers.py
@@ -1,5 +1,6 @@
 import collections
 import operator
+import multiprocessing.dummy
 
 from .providers import AbstractResolver
 from .structs import DirectedGraph, IteratorMapping, build_iter_view
@@ -336,11 +337,16 @@ class Resolution(object):
         # Initialize the root state.
         self._states = [State(mapping=collections.OrderedDict(), criteria={})]
-        for r in requirements:
+
+        def inner(r):
             try:
                 self._add_to_criteria(self.state.criteria, r, parent=None)
             except RequirementsConflicted as e:
                 raise ResolutionImpossible(e.criterion.information)
+            return r
+        with multiprocessing.dummy.Pool() as tp:
+            for r_out in tp.imap_unordered(inner, requirements):
+                pass
 
         # The root state is saved as a sentinel so the first ever pin can have
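(A side note on the diff above: despite its name, multiprocessing.dummy.Pool is a thread pool, so the independent HTTP fetches simply overlap their network latency, with no extra processes involved. As a standalone illustration of the same pattern, where the URLs and the fetch helper are made up for the example and are not pip code:)

```python
# Illustrative sketch of the thread-pool pattern used in the diff above.
# The URLs and fetch() helper are hypothetical; they are not part of pip.
import multiprocessing.dummy
import urllib.request

urls = [
    "https://pypi.org/simple/numpy/",
    "https://pypi.org/simple/pandas/",
    "https://pypi.org/simple/scipy/",
]

def fetch(url):
    # Each request is independent, so running them on a thread pool mostly
    # overlaps network latency rather than adding CPU work.
    with urllib.request.urlopen(url) as resp:
        return url, len(resp.read())

with multiprocessing.dummy.Pool(8) as pool:
    for url, size in pool.imap_unordered(fetch, urls):
        print(url, size)
```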
So in the process of taking a profile I think I have a bit of an idea of where that time is going (and it happens that timings on my machine are close enough to yours to be comparable: ~90s in total on main, ~5-6s to copy files...). Of that 90s, ~40s were spent fetching pages from pypi; that can be parallelized, which decreases the wall clock time, but it still takes ~10s. ~50s are spent on "install_wheel", of which only ~5-6s is spent in shutil.copyfileobj. A huge fraction of the remainder under install_wheel is spent in py_compile.
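If bytecode compilation really dominates install_wheel, that should be easy to check directly: pip already has a --no-compile flag that skips writing .pyc files at install time, and the standard library's compileall can force-recompile an existing environment to show how much time py_compile accounts for. A rough sketch, where the site-packages path is a placeholder for whatever venv is being tested:

```python
# Rough check of how expensive bytecode compilation is for an already-built
# environment: force-recompile everything under site-packages and time it.
# The path below is a placeholder; point it at the venv from the benchmark.
import compileall
import time

SITE_PACKAGES = "/tmp/bench-venv/lib/python3.9/site-packages"  # placeholder

start = time.perf_counter()
compileall.compile_dir(SITE_PACKAGES, quiet=1, force=True)
print(f"recompiling took {time.perf_counter() - start:.1f}s")
```

Comparing a plain install against pip install --no-compile on the same cached requirements should show roughly the same delta.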
Please share the requirements.txt that you're experimenting with to reproduce this.
Should be in the first comment in this issue. |
Great findings, thanks a lot for looking into it!
That indeed sounds like low-hanging fruit, with a very welcome speedup 👍 I'm wondering if it would be feasible to store the http request results themselves in the cache as well for an extra speed-up. For …
Does that refer to the compilation of …? If I do the math correctly, the parallel requests combined with skipping …
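(Back-of-envelope from the profile numbers above, purely as an estimate: of the ~90s total, the ~40s of page fetches shrink to ~10s when parallelized; if most of the ~50s of install_wheel is py_compile, skipping it leaves roughly the ~5-6s of file copying plus some overhead, so the whole run might land somewhere around 15-25s, i.e. a 4-5x speedup.)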
using …
Case: pip install Plone. On Python 3.10: first time 1m26, second time 0m12.
@mauritsvanrees found that: …
I haven't profiled what you describe yet, but it sounds like a separate issue with a similar symptom. If others agree, I'd propose renaming this issue to be about parallelizing network requests, and we could make a separate issue for what sounds like a metadata-parsing-related problem?
I agree. |
What's the problem this feature will solve?
Workflows that require frequent re-creation of virtual envs from scratch are very slow, and a large amount of time is spent collecting wheels despite having all the artefacts in the local cache already.
To demonstrate, let's consider installing from this (pip-compiled) requirements.txt:
requirements.txt
Benchmark script for reproduction:
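The script attachment itself is not reproduced above; a minimal sketch of that kind of benchmark, assuming requirements.txt sits in the working directory and POSIX venv paths, would look something like:

```python
# Minimal benchmark sketch: rebuild a venv from scratch and time the install.
# Run it repeatedly; from the second run onwards pip can serve everything
# from its local cache. ENV_DIR is a placeholder path.
import shutil
import subprocess
import time
import venv

ENV_DIR = "/tmp/bench-venv"  # placeholder

shutil.rmtree(ENV_DIR, ignore_errors=True)
venv.create(ENV_DIR, with_pip=True)

start = time.perf_counter()
subprocess.run([f"{ENV_DIR}/bin/pip", "install", "-r", "requirements.txt"], check=True)
print(f"pip install took {time.perf_counter() - start:.1f}s")
```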
Starting with the second execution of the script, pip can fully rely on its local cache. But even with 100% cache hits, running all the "Collecting ... Using cached ..." operations takes ~42 seconds. The total run time is ~92 seconds, so 45% of the time is spent just collecting from the cache. This seems excessive: the disk is a fast SSD and the total amount of data to be collected should be < 1GB, so in terms of raw I/O, loading the wheels from an SSD-based cache should take on the order of a few seconds. Thus, bringing down the collection time could speed up venv creation in many cases by almost a factor of 2. This could e.g. significantly speed up CI pipelines that require creation of multiple similar venvs (in fact, venv creation is becoming an increasing bottleneck in complex CI pipelines for us).
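One rough way to substantiate the raw-I/O claim is to time how long it takes just to read every file in pip's local cache (located via the pip cache dir command, available in recent pip versions) and compare that with the ~42s collection phase:

```python
# Rough raw-I/O baseline: read every file under pip's cache directory and
# time it, for comparison with the "Collecting ... Using cached ..." phase.
import pathlib
import subprocess
import time

cache_dir = pathlib.Path(
    subprocess.run(["pip", "cache", "dir"], check=True,
                   capture_output=True, text=True).stdout.strip()
)

start = time.perf_counter()
total_bytes = 0
for path in cache_dir.rglob("*"):
    if path.is_file():
        total_bytes += len(path.read_bytes())
print(f"read {total_bytes / 1e9:.2f} GB in {time.perf_counter() - start:.1f}s")
```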
Used pip version: 21.2.4
Describe the solution you'd like
Perhaps it is possible to revisit why collecting artefacts from the cache is so slow.
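A first step could be to profile the cached install and see where the collection time actually goes. A sketch of how one might capture such a profile (Python 3.7+ for cProfile's -m option; the output file name and the --target directory are placeholders, and cProfile will not see work pip does in subprocesses):

```python
# Profile a fully cached install and print the most expensive calls.
# "pip-install.prof" and the --target directory are placeholder names.
import pstats
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "cProfile", "-o", "pip-install.prof",
     "-m", "pip", "install", "-r", "requirements.txt",
     "--target", "/tmp/pip-profile-target"],
    check=True,
)
pstats.Stats("pip-install.prof").sort_stats("cumulative").print_stats(25)
```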
Alternative Solutions
Alternative solutions probably don't apply in the case of performance improvements.
Additional context
All information is given above.