-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce hash_utils.hash_dir
.
#8327
Conversation
Implement a simple stable hash for the recursive contents of a directory for use in v1 Tasks. In v2 we have the `fs/store` crate and intrinsics that expose its types to @rules; this just tides v1 Tasks over for cases where we just need a hash and not a Snapshot we'll never materialize (via `self.context._scheduler.<adhoc sync apis>`). Performance is ~identical to the `self.context._scheduler.capture_snapshots` API (a few percent faster on average). This change supports pantsbuild#8263.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thank you for the type hints and pathlib.Path usage!
It's not clear to me that "a few percent faster" is worth the extra API (which won't support ignore patterns, etc). But shrug. |
I didn't set out for that reason - the existing "API" was unkown to me. Its also "private" with self.context._scheduler. After finding it after I went to use it, found out it was slower - which was a bit shocking, and then I tried to fully use the api instead of bastardize it to get a fingeprint as a side effect never using the snapshot. Materializing from the snapshot to ensure I had what I fingerprinted, found a 4 minute materialize on a django example project. At that point I returned to what I had not wanting to sink more time into the project. |
Hmm. That's pretty unexpected. While capturing snapshots has some known inefficiencies, materialization should be snappy. |
All the examples I found snapshotted 1-handful of files. This was snapshotting a loose pex like we build in pants for tools which had 5970 files. I suspected this may have had something to do with it but did not dig. |
Implement a simple stable hash for the recursive contents of a directory
for use in v1 Tasks. In v2 we have the
fs/store
crate and intrinsicsthat expose its types to @rules; this just tides v1 Tasks over for cases
where we just need a hash and not a Snapshot we'll never materialize
(via
self.context._scheduler.<adhoc sync apis>
). Performance is~identical to the
self.context._scheduler.capture_snapshots
API (afew percent faster on average).
This change supports #8263.