-
-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lower noop wheel install overhead. #2315
Lower noop wheel install overhead. #2315
Conversation
Previously, installing a wheel that was already installed incurred the cost of hashing the installed wheel chroot every time. The overhead of this wasted work for a warm cache was egregious for large distributions like PyTorch, with gigabytes of files to hash taking seconds. Work towards pex-tool#2312.
This drops one of the two ~3s timings in #2312 ("Installing 22 distributions ...") to ~13ms for a warm cache:
|
# We support legacy chroots below by calculating the chroot fingerprint just in time. | ||
pass | ||
else: | ||
cached_fingerprint = installed_wheel.fingerprint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zmanji so this represents the avoidance of ~3s to hash PyTorch 1 of 2 times. It only works for new caches though (rm -rf ~/.pex/installed_wheels
). I wanted to keep the status quo here with the chroot symlinking and just get this easy win for now. I do have some notes drawn up revisiting using a separate chroot hash, so this may all go away later, but with the same net effect of skipping the expensive chroot hashing.
The new installed wheel chroot `.layout.json` `fingerprint` field was missing.
@@ -732,7 +732,7 @@ def iter_map_parallel( | |||
# least two slots to ensure we process input items in parallel. | |||
pool_size = max(2, min(len(input_items) // min_average_load, _sanitize_max_jobs(max_jobs))) | |||
|
|||
perform_install = functools.partial(_apply_function, function) | |||
apply_function = functools.partial(_apply_function, function) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was, in the end, an unrelated re-name. I was poking around in the resolve / wheel install machinery looking at timings and narrowing in on the double-hashing going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, nice win!
I note that that there's often other hashes related to wheels flying around, e.g. in a lockfile, or as values of the distributions
map in PEX-INFO
. Is there any scope for using those directly?
except LoadError: | ||
# We support legacy chroots below by calculating the chroot fingerprint just in time. | ||
pass | ||
else: | ||
cached_fingerprint = installed_wheel.fingerprint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally stylistic (no need to change), but is it possible to calculate wheel_dir_hash
in these two branches directly, potentially saving some slight fiddle:
except LoadError: | |
# We support legacy chroots below by calculating the chroot fingerprint just in time. | |
pass | |
else: | |
cached_fingerprint = installed_wheel.fingerprint | |
except LoadError: | |
# We support legacy chroots below by calculating the chroot fingerprint just in time. | |
wheel_dir_hash = fingerprint_path(self.install_chroot) | |
else: | |
wheel_dir_hash = installed_wheel.fingerprint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't work out nice like you think since installed_wheel.fingerprint
can be None, necessitating repeating the except clause calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, of course; thanks.
Yes. That's the thrust of the last sentence of my comment to @zmanji here: #2315 (comment) |
Previously, installing a wheel that was already installed incurred the
cost of hashing the installed wheel chroot every time. The overhead of
this wasted work for a warm cache was egregious for large distributions
like PyTorch, with gigabytes of files to hash taking seconds.
Work towards #2312.