I think it makes sense to allow remote execution for non-scrubbed actions, but I have some concerns with this implementation:
Can we get away with simply `scrubber.forSpawn(spawn) == null` here? `Scrubber#forSpawn` is defined to return non-null iff at least one of the configuration rules matches the action. Strictly speaking, this is not a guarantee that scrubbing will occur (applying the respective transform might result in zero changes), but that ought to be the case with a carefully written configuration.
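A minimal sketch of the cheap check being proposed, assuming simplified stand-in types: `Spawn`, `SpawnScrubber`, `Rule` and this `Scrubber` class are hypothetical models, not Bazel's real classes, and `mayBeExecutedRemotely` is written here only to illustrate the contract. The point it shows is that the decision depends solely on whether any rule matches, not on whether the transform would actually change anything (Java 17+).

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical, simplified model of the check; these are NOT Bazel's real Spawn/Scrubber classes.
 */
public class RemoteScrubbingCheck {
  /** Minimal stand-in for an action about to be executed. */
  record Spawn(String mnemonic, String label, List<String> inputs) {}

  /** Stand-in for the per-spawn scrubber returned when a rule matches. */
  record SpawnScrubber(String ruleName) {}

  /** A scrubbing rule reduced to its matcher; the transform is left out of this sketch. */
  record Rule(String name, Predicate<Spawn> matcher) {}

  static final class Scrubber {
    private final List<Rule> rules;

    Scrubber(List<Rule> rules) {
      this.rules = List.copyOf(rules);
    }

    /** Non-null iff at least one rule matches, mirroring the contract described above. */
    SpawnScrubber forSpawn(Spawn spawn) {
      for (Rule rule : rules) {
        if (rule.matcher().test(spawn)) {
          return new SpawnScrubber(rule.name());
        }
      }
      return null;
    }
  }

  /** Cheap check: a spawn may run remotely only if no scrubbing rule matches it. */
  static boolean mayBeExecutedRemotely(Scrubber scrubber, Spawn spawn) {
    return scrubber == null || scrubber.forSpawn(spawn) == null;
  }

  public static void main(String[] args) {
    Scrubber scrubber =
        new Scrubber(List.of(new Rule("stamping", s -> s.mnemonic().equals("Genrule"))));
    Spawn compile = new Spawn("CppCompile", "//app:lib", List.of("app/lib.cc"));
    Spawn stamp = new Spawn("Genrule", "//app:version", List.of("bazel-out/volatile-status.txt"));
    System.out.println(mayBeExecutedRemotely(scrubber, compile)); // true
    System.out.println(mayBeExecutedRemotely(scrubber, stamp)); // false
  }
}
```

The trade-off is that a spawn matched by a rule whose transform happens to be a no-op still stays local; with a carefully written configuration that case should be rare.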
No, it looks like I get a non-null `spawnScrubber` for every target that has file inputs, with a configuration like:
It looks like I could create rules that filter by a set of labels or by a mnemonic pattern. But collecting these rules would be a very difficult task:
In the use case for stamping that I'm most familiar with (build practices at Google), there's a relatively small number of targets that perform stamping - typically only one per language/ruleset. Each of these targets creates a single language-specific library embedding the workspace status, which may then get linked into every binary (or other top-level deployable artifact) for that language. This way, only a few, easy to identify targets depend on the workspace status files.
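To make this pattern concrete, here is a hypothetical sketch of what the single stamping action for Java might do: parse the workspace status (one `KEY value` pair per line, split at the first space) and emit a small class that binaries link against. `StampLibraryGenerator`, `BuildInfo` and the sample values are made up for illustration, and the sketch assumes every status key is a valid Java identifier. A real action would read the file (e.g. with `Files.readAllLines`) instead of taking a list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the one stamping action per language: it turns workspace status lines
 * ("KEY value", one per line) into a tiny Java library. Not an actual Bazel or rules_java tool.
 */
public class StampLibraryGenerator {
  /** Parses workspace-status lines: the key is everything before the first space. */
  static Map<String, String> parseStatus(List<String> lines) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String line : lines) {
      if (line.isBlank()) {
        continue;
      }
      int space = line.indexOf(' ');
      if (space < 0) {
        result.put(line, "");
      } else {
        result.put(line.substring(0, space), line.substring(space + 1));
      }
    }
    return result;
  }

  /** Escapes backslashes and quotes so a value can be embedded in a Java string literal. */
  static String quote(String value) {
    return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Generates the wrapper class that binaries depend on instead of the status file itself. */
  static String generateJavaSource(String className, Map<String, String> status) {
    StringBuilder sb = new StringBuilder();
    sb.append("public final class ").append(className).append(" {\n");
    sb.append("  private ").append(className).append("() {}\n");
    for (Map.Entry<String, String> e : status.entrySet()) {
      sb.append("  public static final String ")
          .append(e.getKey())
          .append(" = ")
          .append(quote(e.getValue()))
          .append(";\n");
    }
    sb.append("}\n");
    return sb.toString();
  }

  public static void main(String[] args) {
    // Made-up sample contents of volatile-status.txt.
    List<String> volatileStatus =
        List.of("BUILD_TIMESTAMP 1700000000", "BUILD_SCM_REVISION abc123");
    System.out.print(generateJavaSource("BuildInfo", parseStatus(volatileStatus)));
  }
}
```

Because only this generator consumes the status file, it is the only action a scrubbing rule would need to name.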
It sounds like your use of stamping is much more widespread, and can affect an unbounded number of targets, as opposed to a relatively small number of relatively "well-known" ones. Could you provide some more information on how your setup uses stamping?
Also - are you looking for a solution specifically for `volatile-status.txt`, or does this have to generalize to any input file? For example, suppose that action A consumes `volatile-status.txt` and produces an arbitrary file `a.out` (incorporating information from `volatile-status.txt`). Then a later action B consumes `a.out`. Would you then also need to scrub `a.out`?
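The distinction in this question can be shown with a toy model of action keys, assuming (purely for illustration, not Bazel's actual key format) that a key is a SHA-256 over the command line plus the digests of all inputs not omitted by scrubbing. Omitting `volatile-status.txt` stabilizes A's own key, but whenever A actually executes (e.g. there is no cache entry for it yet), `a.out` embeds the new status, so B's key still changes unless `a.out` is scrubbed as well. `ScrubbingPropagation` and `actionKey` are hypothetical names (Java 17+).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy model, NOT Bazel's real key computation: a key hashes the command plus the digests of all
 * inputs, skipping inputs that scrubbing omits.
 */
public class ScrubbingPropagation {
  static String sha256(String data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      return HexFormat.of().formatHex(md.digest(data.getBytes(StandardCharsets.UTF_8)));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 must be available on every JVM", e);
    }
  }

  /** Hashes the command plus (path, content digest) pairs in sorted order, skipping omitted paths. */
  static String actionKey(String command, Map<String, String> inputs, Set<String> omitted) {
    StringBuilder sb = new StringBuilder(command);
    for (Map.Entry<String, String> e : new TreeMap<>(inputs).entrySet()) {
      if (!omitted.contains(e.getKey())) {
        sb.append('\n').append(e.getKey()).append('=').append(sha256(e.getValue()));
      }
    }
    return sha256(sb.toString());
  }

  public static void main(String[] args) {
    Set<String> omitted = Set.of("volatile-status.txt");
    for (String status : new String[] {"BUILD_TIMESTAMP 1", "BUILD_TIMESTAMP 2"}) {
      // Action A is scrubbed, so its key ignores volatile-status.txt.
      String keyA = actionKey("gen a.out", Map.of("volatile-status.txt", status), omitted);
      // Assume A actually executes here: its output embeds the current status.
      String aOut = "generated from " + status;
      // Action B consumes a.out, which no rule scrubs.
      String keyB = actionKey("use a.out", Map.of("a.out", aOut), omitted);
      System.out.println("A=" + keyA + "\nB=" + keyB);
    }
  }
}
```

Running `main` prints the same key for A on both iterations and different keys for B.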
Nope. Only `volatile-status.txt`.
We are trying to use stamping only for the top-level artifacts:
There are very few such targets.
The purpose of stamping is to record in each of these artifacts which revision it was built from. To do this, the revision number is passed in via `volatile-status.txt`.
The problem is that stamping is enabled not for specific targets, but globally via the `--stamp` flag. In some cases it can affect, for example, the `bison` binary used to generate `yacc` files in one of the third-party libraries we use. If `volatile-status.txt` becomes significant for this target, then we get a cache miss for it and for all dependent targets on every build. This puts parasitic pressure on the cache that is quite difficult to detect.
Maybe, before implementing scrubbing support with remote builds, it makes sense to do the following:
- use the cheap `scrubber.forSpawn(spawn) == null` check for `mayBeExecutedRemotely` by default;
- add `--experimental_scrubbing-remote-build-expensive-check` for the expensive check (or create an enum flag like `--experimental_scrubbing-with-remote-build` with values `deny`, `unmatched-local`, `unchanged-local`).
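One way the proposed enum flag could be modeled, as a sketch only. The proposal doesn't spell out what each value means, so this assumes `deny` rejects combining scrubbing with remote execution outright, `unmatched-local` runs any spawn matched by a rule locally (the cheap `forSpawn` check), and `unchanged-local` runs a matched spawn locally only if scrubbing actually changes its key (the expensive check). `Mode`, `Placement` and `decide` are hypothetical names, not existing Bazel code.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical model of the proposed --experimental_scrubbing-with-remote-build values.
 * The semantics are one possible reading of the proposal, not an agreed design.
 */
public class ScrubbingRemoteMode {
  enum Mode {
    DENY, // assumed: reject combining a scrubbing config with remote execution
    UNMATCHED_LOCAL, // assumed: cheap check, spawns matched by any rule run locally
    UNCHANGED_LOCAL; // assumed: expensive check, run locally only if scrubbing changes the key

    static Mode fromFlagValue(String value) {
      return switch (value) {
        case "deny" -> DENY;
        case "unmatched-local" -> UNMATCHED_LOCAL;
        case "unchanged-local" -> UNCHANGED_LOCAL;
        default -> throw new IllegalArgumentException("unknown value: " + value);
      };
    }
  }

  enum Placement { REMOTE, LOCAL, REJECTED }

  /**
   * @param matchedByRule result of the cheap check (scrubber.forSpawn(spawn) != null)
   * @param scrubbingChangesKey expensive check, evaluated only when the mode needs it
   */
  static Placement decide(Mode mode, boolean matchedByRule, BooleanSupplier scrubbingChangesKey) {
    return switch (mode) {
      case DENY -> Placement.REJECTED;
      case UNMATCHED_LOCAL -> matchedByRule ? Placement.LOCAL : Placement.REMOTE;
      case UNCHANGED_LOCAL ->
          matchedByRule && scrubbingChangesKey.getAsBoolean() ? Placement.LOCAL : Placement.REMOTE;
    };
  }

  public static void main(String[] args) {
    BooleanSupplier expensive =
        () -> {
          System.out.println("  (running expensive check)");
          return false;
        };
    for (String flag : new String[] {"deny", "unmatched-local", "unchanged-local"}) {
      System.out.println(flag + ": " + decide(Mode.fromFlagValue(flag), true, expensive));
    }
  }
}
```

Passing the expensive check as a `BooleanSupplier` keeps the default path cheap: it is never evaluated in `deny` or `unmatched-local` mode, nor for unmatched spawns.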
Alright, let's go through the problems one by one:
Bazel 6 stamping vs remote cache/remote build
I think this problem is easily solvable. If we implement the solution I outlined above, it becomes possible to write a single rule of the form:
and it will apply to every action that has `volatile-status.txt` among its inputs, without having to list those actions explicitly in the configuration. (There's a little bit of redundancy between `has_volatile_input` and `omitted_inputs`, but it's intentional: separating the matcher from the transform brings clarity and is a property I'd like to keep.)
However, the configuration above will not apply to an action that produces an output derived from `volatile-status.txt` (that's what the next problem is about).
Bazel 6 stamping is broken
It's unclear to me whether this is a bug in how stamping works, or an intended limitation. I don't mean to be dismissive; all I'm saying is that I didn't design the stamping feature and haven't used it all that much, so I can't authoritatively declare that there is a bug, and much less that cache key scrubbing is the correct way to solve it.
My suggestion here is to open a separate issue describing the limitations you see in stamping as it's currently implemented, so that people more knowledgeable about that part of Bazel can help brainstorm potential solutions. If we determine that scrubbing is the appropriate way to solve it, I'm happy to resume the search for a reasonable implementation; but I'd first like to be sure that there aren't simpler solutions.
Bazel 7 scrubbing configuration limitations
I completely agree with your assessment: scrubbing configurations are both hard to write and hard to debug.
The "hard to debug" problem is somewhat easier to solve: we need good tools to investigate the root cause of cache misses. We're currently trying to make some improvements in that direction in #18643, but there's still work to do.
The "hard to write" problem, as you correctly surmise, doesn't have a clear solution in sight. I share your sentiment that the better design entails having the rules themselves provide the scrubbing configuration to Bazel, so that users aren't forced to understand implementation details just so they can write the configuration. But we don't want rules to unilaterally decide that scrubbing should be used, either: scrubbing trades off correctness for performance, and Bazel should strive for correctness by default; some amount of user input would still be required in that world.
In conclusion
I don't want this discussion to drag on forever, so I offer the following: if we agree that the first of these problems (scrubbing triggered by the presence of `volatile-status.txt`) is worth solving on its own, I volunteer to (re-)write the PR myself and get it submitted. I will tentatively say that there's still time to do this before 7.1.0, but it might slip to 7.2.0 since I have other things on my plate at the moment. At the same time, I encourage you to start a separate issue/discussion regarding stamping of non-top-level targets. Does that sound good to you?
If you add the `has_volatile_input` matcher and allow remote execution for all actions that are not subject to scrubbing, then this should completely cover the problem of using stamping as in Bazel 6:
@tjgq, I have moved the changes for executing scrubbed actions locally to PR #21288 (checking only `scrubber.forSpawn(spawn) != null`, without checking the affected input files).
I tried adding `has_volatile_input` but failed: it requires significant refactoring.
Thanks, I will import #21288 today.
The `has_volatile_input` implementation is still on me, as promised above; I'm hoping to get to it next week.
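As an illustration of keeping the matcher and the transform separate, here is a hypothetical Java model of a `has_volatile_input`-style matcher paired with an `omitted_inputs`-style transform. It is not Bazel's configuration schema or implementation: `VolatileInputRule`, `HAS_VOLATILE_INPUT`, `applyOmittedInputs` and `keyInputs` are made-up names, and the regex is an assumed example. Both pieces mention `volatile-status.txt`, which reproduces the intentional redundancy described earlier in the thread.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: a matcher decides WHETHER a spawn is scrubbed, a separate transform
 * decides WHAT is dropped from its cache key. Not Bazel's real scrubbing code or config format.
 */
public class VolatileInputRule {
  record Spawn(String mnemonic, List<String> inputPaths) {}

  /** Matcher: true iff the spawn consumes the volatile workspace status file. */
  static final Predicate<Spawn> HAS_VOLATILE_INPUT =
      spawn ->
          spawn.inputPaths().stream()
              .anyMatch(p -> p.equals("volatile-status.txt") || p.endsWith("/volatile-status.txt"));

  /** Transform: drops inputs whose full path matches the omitted-inputs pattern. */
  static List<String> applyOmittedInputs(Pattern omittedInputs, List<String> inputPaths) {
    return inputPaths.stream().filter(p -> !omittedInputs.matcher(p).matches()).toList();
  }

  /** Inputs that contribute to the cache key: transformed only when the matcher fires. */
  static List<String> keyInputs(Spawn spawn, Pattern omittedInputs) {
    return HAS_VOLATILE_INPUT.test(spawn)
        ? applyOmittedInputs(omittedInputs, spawn.inputPaths())
        : spawn.inputPaths();
  }

  public static void main(String[] args) {
    Pattern omitted = Pattern.compile("(.*/)?volatile-status\\.txt");
    Spawn stamp = new Spawn("Genrule", List.of("bazel-out/volatile-status.txt", "src/version.tmpl"));
    Spawn compile = new Spawn("Javac", List.of("src/Main.java"));
    System.out.println(keyInputs(stamp, omitted)); // [src/version.tmpl]
    System.out.println(keyInputs(compile, omitted)); // [src/Main.java]
  }
}
```

A single rule of this shape covers every action consuming the status file without enumerating labels; as noted in the thread, it does nothing for outputs derived from that file.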
I had some time this week to chat with other folks on the team about stamping, and to think about how this feature might be implemented. Unfortunately, the conclusion that I came to is that I don't know how to implement this in a satisfactory way, and I really don't want to rush adding a feature we might later regret. So I'm going to put this on hold at least until 7.2.0. Sorry.
The internal discussions I had also reinforced a point that I made earlier in this thread: if you have so many targets consuming `volatile-status.txt` that it's difficult to explicitly enumerate all of them in the scrubbing config, you're likely doing something wrong (at least according to the way we originally designed stamping to be used). I'd expect you to have at most one target per language that consumes `volatile-status.txt` and produces a wrapper library around its contents suitable for that language; your top-level binary targets should access the volatile information through that library, and shouldn't themselves depend on `volatile-status.txt` directly. Thus, you only have to configure scrubbing for the small number of targets that generate the per-language wrapper libraries, which I don't think is an unreasonable thing to ask users to do.
I still haven't read a convincing explanation for why your build graph can't be organized in this way.