Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce hash_utils.hash_dir. #8327

Merged
merged 2 commits into from
Sep 25, 2019
Merged

Conversation

jsirois
Copy link
Contributor

@jsirois jsirois commented Sep 24, 2019

Implement a simple stable hash for the recursive contents of a directory
for use in v1 Tasks. In v2 we have the fs/store crate and intrinsics
that expose its types to @rules; this just tides v1 Tasks over for cases
where we just need a hash and not a Snapshot we'll never materialize
(via self.context._scheduler.<adhoc sync apis>). Performance is
~identical to the self.context._scheduler.capture_snapshots API (a
few percent faster on average).

This change supports #8263.

Implement a simple stable hash for the recursive contents of a directory
for use in v1 Tasks. In v2 we have the `fs/store` crate and intrinsics
that expose its types to @rules; this just tides v1 Tasks over for cases
where we just need a hash and not a Snapshot we'll never materialize
(via `self.context._scheduler.<adhoc sync apis>`). Performance is
~identical to the `self.context._scheduler.capture_snapshots` API (a
few percent faster on average).

This change supports pantsbuild#8263.
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thank you for the type hints and pathlib.Path usage!

src/python/pants/base/hash_utils.py Show resolved Hide resolved
src/python/pants/base/hash_utils.py Show resolved Hide resolved
tests/python/pants_test/base/test_hash_utils.py Outdated Show resolved Hide resolved
@jsirois jsirois merged commit be19a17 into pantsbuild:master Sep 25, 2019
@jsirois jsirois deleted the issues/8263/pre-work branch September 25, 2019 06:00
@stuhood
Copy link
Member

stuhood commented Sep 26, 2019

It's not clear to me that "a few percent faster" is worth the extra API (which won't support ignore patterns, etc). But shrug.

@jsirois
Copy link
Contributor Author

jsirois commented Sep 26, 2019

It's not clear to me that "a few percent faster" is worth the extra API

I didn't set out for that reason - the existing "API" was unkown to me. Its also "private" with self.context._scheduler. After finding it after I went to use it, found out it was slower - which was a bit shocking, and then I tried to fully use the api instead of bastardize it to get a fingeprint as a side effect never using the snapshot. Materializing from the snapshot to ensure I had what I fingerprinted, found a 4 minute materialize on a django example project. At that point I returned to what I had not wanting to sink more time into the project.

@stuhood
Copy link
Member

stuhood commented Sep 26, 2019

Materializing from the snapshot to ensure I had what I fingerprinted, found a 4 minute materialize on a django example project.

Hmm. That's pretty unexpected. While capturing snapshots has some known inefficiencies, materialization should be snappy.

@jsirois
Copy link
Contributor Author

jsirois commented Sep 26, 2019

All the examples I found snapshotted 1-handful of files. This was snapshotting a loose pex like we build in pants for tools which had 5970 files. I suspected this may have had something to do with it but did not dig.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants