-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build_zipmanifest results should be memoized #154
Comments
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Not a bad idea. Should be easy enough. It looks like code that I had worked on due to a issue with pypy. So just adding some additional information for anyone that might beat me to the punch. Originally distribute used zipimports._zip_directory_cache Which cached this, except the private interface is inconsistent between pypy/cpython/win32. https://bitbucket.org/tarek/distribute/issue/27 |
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Here is a file you can test, if interested, I believe it should be caching now: https://bitbucket.org/philip_thiem/setuptools/downloads#download-348570 Alternatively, I'd like a good test file if possible. :) |
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Pending Pull Request #58 |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): This work has been merged, but I have one concern about performance. We're making a speed vs. memory tradeoff here. This change will necessarily increase the memory usage of setuptools, potentially substantially if a lot of zip packages are used, and that memory won't be reclaimed for the life of the process. I'm now starting to question whether this functionality should be enabled by default. As a result, I've bookmarked the proposed commits with a 'issue154' bookmark, leaving 'master' in a separate ancestry. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): The more I think about it, the speed vs. memory tradeoff is a hard one to make at a library like this, but I think the default position should be to favor memory usage to speed. What if instead each process was able to opt-in to enable caching? |
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Looks like the response that I sent didn't get posted. I think opt in is fine. I can update my branch there to make the cache optional. I could certainly pull that information from a command line or env, I will have to refresh my memory where configuration is stored in memory. |
Original comment by orsenthil (Bitbucket: orsenthil, GitHub: orsenthil): I could not find this in master/default/tip. (Plus no reference to this change in NEWS file). Will this change be available part of setuptools release?
|
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): I just ran an experiment on my Python 3.4 Windows 64-bit environment where I have 54 zip eggs and found that enabling this feature adds 1.4Mb (+6.6%) to the heap usage versus the status quo, but I will see almost no speed benefit from the feature as my environment has an approximately one-to-one mapping of module to zip egg. That's a pretty substantial cost to impose on nearly all processes, especially when the benefits will only be had is specially-crafted scenarios, such as the OP presents. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): This feature has been released in Setuptools 5.4. Please try it out and let me know what you think. As you can see above, it's an opt-in only feature, at least for now. A pretty compelling argument would be needed to justify enabling this behavior by default. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):
I may have been mistaken here. Analyzing the code, I would have expected 5.3 to also be storing the zip manifests for every zip file (in the zipinfo property). I suspect when I said "status quo" here, I was simply disabling the caching, not actually measuring against 5.3 (the actual status quo). Since I didn't actually log my experiment, I can't say for sure at this point, but I'm going to assume that was the case. |
Originally reported by: wickman (Bitbucket: wickman, GitHub: wickman)
as far as I can tell, build_zipmanifest is not cached.
from a recent profile:
This is the profile for starting up a PEX file (zipped python environment, see https://mail.python.org/pipermail/distutils-sig/2014-January/023727.html ) with a number of exploded eggs inside. build_zipmanifest is called with the same archive every time we construct a Distribution via EggMetadata:
It's not an unreasonable assumption that each time you construct a new Distribution, it will either be on disk or part of its own zip archive, meaning these would not be duplicated calls.
In our case, all eggs are in a single zip. This means 57 50ms calls instead of 1 50ms call in order to run this Python application which has 57 egg dependencies.
The proposal is to cache calls to zipmanifest (perhaps invalidating should os.stat/mtime change.)
The text was updated successfully, but these errors were encountered: