Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark digest subsetting #14569

Closed
wants to merge 1 commit into from
Closed

Conversation

benjyw
Copy link
Contributor

@benjyw benjyw commented Feb 23, 2022

Probably not for merging.

[ci skip-rust]
[ci skip-build-wheels]

DigestSubset(
request.sources.source_files.snapshot.digest, PathGlobs(sorted(remaining_sources))
),
DigestSubset(request.sources.source_files.snapshot.digest, PathGlobs(remaining_sources)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PathGlobs ctor sorts for us, so no need to do it here ourselves

@stuhood
Copy link
Member

stuhood commented Mar 2, 2022

The results, for posterity:

  • generating the "full subset" by passing all paths in the digest in as the requested subset) takes ~1 second for 40k files and this time grows linearly with the number of files in the digest.
  • Creating a digest from file content in memory (via CreateDigest) takes ~8.3 seconds for 40k files, and this time grows roughly quadratically with the number of files in the digest.

So it is possible that full subset generation times of a few hundred ms add up across many consumers. But this benchmarking effort incidentally reveals something worse and not directly related to subsetting, i.e., superlinear behavior of CreateDigest.

https://user-images.githubusercontent.com/512764/155292114-d587c31c-af17-49e9-84a7-098c94a02e2e.png

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano
Copy link
Contributor

Stale?

@benjyw
Copy link
Contributor Author

benjyw commented Aug 18, 2022

I don't know that the underlying issue has been addressed?

@stuhood
Copy link
Member

stuhood commented Aug 18, 2022

I don't know that the underlying issue has been addressed?

Opening a ticket for the CreateDigest non-linearity is probably the next step rather than leaving this open.

@stuhood
Copy link
Member

stuhood commented Aug 18, 2022

Opened #16570.

@stuhood stuhood closed this Aug 18, 2022
stuhood pushed a commit that referenced this pull request Aug 24, 2022
Closes #16570

* Use a `DigestTrie` to create all snapshots at once, instead of creating them individually
* Store all in-memory file contents in a single (batched) call, instead of storing them individually

[ci skip-build-wheels]

---

I restored the benchmark script from #14569 to test this.

| size | create_digest_before | create_digest_after |
| --- | --- | --- |
| 20000 | 608 | 130 |
| 40000 | 1164 | 268 |
| 60000 | 2260 | 475 |
| 80000 | 3582 | 674 |
| 100000 | 5085 | 862 |
| 120000 | 6765 | 1057 |
| 140000 | 8818 | 1067 |
| 160000 | 10752 | 1361 |
| 180000 | 12619 | 1604 |
Eric-Arellano pushed a commit to Eric-Arellano/pants that referenced this pull request Aug 24, 2022
Closes pantsbuild#16570

* Use a `DigestTrie` to create all snapshots at once, instead of creating them individually
* Store all in-memory file contents in a single (batched) call, instead of storing them individually

[ci skip-build-wheels]

---

I restored the benchmark script from pantsbuild#14569 to test this.

| size | create_digest_before | create_digest_after |
| --- | --- | --- |
| 20000 | 608 | 130 |
| 40000 | 1164 | 268 |
| 60000 | 2260 | 475 |
| 80000 | 3582 | 674 |
| 100000 | 5085 | 862 |
| 120000 | 6765 | 1057 |
| 140000 | 8818 | 1067 |
| 160000 | 10752 | 1361 |
| 180000 | 12619 | 1604 |
Eric-Arellano added a commit that referenced this pull request Aug 29, 2022
Closes #16570

* Use a `DigestTrie` to create all snapshots at once, instead of creating them individually
* Store all in-memory file contents in a single (batched) call, instead of storing them individually

[ci skip-build-wheels]

---

I restored the benchmark script from #14569 to test this.

| size | create_digest_before | create_digest_after |
| --- | --- | --- |
| 20000 | 608 | 130 |
| 40000 | 1164 | 268 |
| 60000 | 2260 | 475 |
| 80000 | 3582 | 674 |
| 100000 | 5085 | 862 |
| 120000 | 6765 | 1057 |
| 140000 | 8818 | 1067 |
| 160000 | 10752 | 1361 |
| 180000 | 12619 | 1604 |
cczona pushed a commit to cczona/pants that referenced this pull request Sep 1, 2022
Closes pantsbuild#16570

* Use a `DigestTrie` to create all snapshots at once, instead of creating them individually
* Store all in-memory file contents in a single (batched) call, instead of storing them individually

[ci skip-build-wheels]

---

I restored the benchmark script from pantsbuild#14569 to test this.

| size | create_digest_before | create_digest_after |
| --- | --- | --- |
| 20000 | 608 | 130 |
| 40000 | 1164 | 268 |
| 60000 | 2260 | 475 |
| 80000 | 3582 | 674 |
| 100000 | 5085 | 862 |
| 120000 | 6765 | 1057 |
| 140000 | 8818 | 1067 |
| 160000 | 10752 | 1361 |
| 180000 | 12619 | 1604 |
@benjyw benjyw deleted the benchmark_digests branch May 24, 2023 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants