Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache_computed_file_digests is a useful performance option that should be documented #14401

Open
c0d3h4x0r opened this issue Dec 9, 2021 · 12 comments
Labels
not stale Issues or PRs that are inactive but not considered stale P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Documentation Documentation improvements that cannot be directly linked to other team labels team-Performance Issues for Performance teams type: documentation (cleanup)

Comments

@c0d3h4x0r
Copy link

Description of the problem / feature request:

cache_computed_file_digests is a useful performance option that should be documented.

Feature requests: what underlying problem are you trying to solve with this feature?

This option is useful for improving incremental build speed on large codebases.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Immediately following a full build, do a no-op incremental build on our codebase containing ~400,000 input files. It takes about 15 minutes.

Immediately following a full build, do a no-op incremental build with --cache_computed_file_digests=400000. It takes about 15 seconds.

What operating system are you running Bazel on?

Windows 10

What's the output of bazel info release?

Starting local Bazel server and connecting to it...
release 4.2.1

If bazel info release returns "development version" or "(@Non-Git)", tell us how you built Bazel.

n/a

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

fatal: not a git repository (or any of the parent directories): .git

Our codebase is stored in a company-internal Perforce server, not in Git.

Have you found anything relevant by searching the web?

No.

Any other information, logs, or outputs that you want to share?

n/a

@meisterT
Copy link
Member

@janakdr since you reviewed the change that introduced the flag: can you comment on potential correctness concerns of the flag?

@janakdr
Copy link
Contributor

janakdr commented Dec 20, 2021

I think the correctness is pretty good, since it uses inodes. However, does Windows have inodes? Also, I'm confused by this report, since the flag is not intended to help the case of a "no-op incremental build": in that case, Bazel should have all metadata in memory. The lower latency is supposed to happen after a flag flip, when we discard our in-memory actions, and it's supposed to apply to the output tree: source nodes don't get cleared after a flag flip. So I think a more detailed description of the Bazel commands run would be helpful (not targets, just flags).

@meisterT
Copy link
Member

Bazel should have all metadata in memory

That's what I thought at first as well but then remembered that windows doesn't have watchfs (see #1931) and assumed that this is the reason why we don't trust the metadata. Is that assumption correct?

@janakdr
Copy link
Contributor

janakdr commented Dec 20, 2021 via email

@meisterT meisterT added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Jan 20, 2022
@guw
Copy link
Contributor

guw commented Apr 6, 2022

Just learned here about --watchfs. It's disabled by default. I think there is also a problem on Mac. I see lots of scanning just for simple "no-op" commands. It doesn't look like anything is cached by default.

This might be what I am seeing as well:
https://groups.google.com/g/bazel-discuss/c/E_0F1t4FITE

@c0d3h4x0r
Copy link
Author

c0d3h4x0r commented Apr 22, 2022

@guw fyi:

The Windows filesystem watcher is still experimental as of Bazel 5.1.1 (the version we're now on). We encountered other problems when we tried to enable it, so it is not a viable substitute for using this undocumented performance option.

@c0d3h4x0r
Copy link
Author

So what's the right path forward here? Right now, our business is taking a necessary dependency on this undocumented option, which isn't a position we want to remain in.

@meisterT
Copy link
Member

I think it would be helpful to know the exact sequence of Bazel commands (feel free to leave out targets) and also what modifications you did in between. And perhaps the Bazel output on the second command.

@c0d3h4x0r
Copy link
Author

c0d3h4x0r commented Apr 25, 2022 via email

@ShreeM01 ShreeM01 added the team-Documentation Documentation improvements that cannot be directly linked to other team labels label Jan 10, 2023
@brentleyjones
Copy link
Contributor

Before this gets flagged as stale, this is still a valid issue. This flag should be documented, and potentially the default increased as well.

@meisterT
Copy link
Member

cc @coeuvre

@coeuvre coeuvre added the not stale Issues or PRs that are inactive but not considered stale label Feb 16, 2023
copybara-service bot pushed a commit that referenced this issue Feb 13, 2024
It seems pretty important to document the existence of this flag, as it has implications for both performance and correctness.

See #14401 for prior discussion.

PiperOrigin-RevId: 606545000
Change-Id: I3aa9d311c805ebf9fb7401b79f320e1ba11d2179
tjgq added a commit to tjgq/bazel that referenced this issue Feb 13, 2024
It seems pretty important to document the existence of this flag, as it has implications for both performance and correctness.

See bazelbuild#14401 for prior discussion.

PiperOrigin-RevId: 606545000
Change-Id: I3aa9d311c805ebf9fb7401b79f320e1ba11d2179
@tjgq
Copy link
Contributor

tjgq commented Feb 13, 2024

Doing some back-of-the-napkin math, a digest cache with 50k entries (current default) should consume around 11MB, assuming compressed oops, average pathname length of 100 bytes, and hash length of 32 bytes. So increasing the default is unlikely to cause problems. @meisterT, WDYT about making it, say, 100k?

It would also be nice to provide some sort of indication that the cache size is too small to be effective (e.g., if can't fit the inputs+outputs of the largest action in a build, or perhaps even an entire build's worth of them).

github-merge-queue bot pushed a commit that referenced this issue Feb 13, 2024
It seems pretty important to document the existence of this flag, as it
has implications for both performance and correctness.

See #14401 for prior
discussion.

PiperOrigin-RevId: 606545000
Change-Id: I3aa9d311c805ebf9fb7401b79f320e1ba11d2179
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not stale Issues or PRs that are inactive but not considered stale P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Documentation Documentation improvements that cannot be directly linked to other team labels team-Performance Issues for Performance teams type: documentation (cleanup)
Projects
None yet
Development

No branches or pull requests

9 participants