Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow use scrubbing with remote execution (build actions with scrubbed inputs locally) #20070

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -341,7 +341,8 @@ public CachePolicy getWriteCachePolicy(Spawn spawn) {
public boolean mayBeExecutedRemotely(Spawn spawn) {
return remoteCache instanceof RemoteExecutionCache
&& remoteExecutor != null
&& Spawns.mayBeExecutedRemotely(spawn);
&& Spawns.mayBeExecutedRemotely(spawn)
&& !hasScrubbedInput(spawn, scrubber);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to allow remote execution for non-scrubbed actions, but I have some concerns with this implementation:

  1. In addition to inputs, we might also scrub command line arguments and set a scrubbing salt, so this check is incomplete.
  2. Expanding/iterating the inputs is a fairly expensive operation, and with this change we're going to do it twice for every matching action (first time here, second time when computing the actual cache key). I'd prefer to not make the decision to run remotely conditional on the input expansion.

Can we get away with simply scrubber.forSpawn(spawn) == null here? Scrubber#forSpawn is defined to return non-null iff at least one of the configuration rules matches the action; this is, strictly speaking, not a guarantee that scrubbing will occur (applying the respective transform might result in zero changes) but that ought to be the case with a carefully written configuration.

Copy link
Contributor Author

@bozaro bozaro Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get away with simply scrubber.forSpawn(spawn) == null here?

No, looks like I have non null spawnScrubber for every target with files input with configuration like:

rules {
  transform {
    omitted_inputs: "^bazel-out/volatile-status\\.txt$"
  }
}

Looks like I can create rules filter by set label or mnemonic pattern. But collecting these rules will be a very difficult task:

  • most stamping actions are declared in third-party code;
  • in case of an error, I get a quiet cache miss.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the use case for stamping that I'm most familiar with (build practices at Google), there's a relatively small number of targets that perform stamping - typically only one per language/ruleset. Each of these targets creates a single language-specific library embedding the workspace status, which may then get linked into every binary (or other top-level deployable artifact) for that language. This way, only a few, easy to identify targets depend on the workspace status files.

It sounds like your use of stamping is much more widespread, and can affect an unbounded number of targets, as opposed to a relatively small number of relatively "well-known" ones. Could you provide some more information on how your setup uses stamping?

Also - are you looking for a solution specifically for volatile-status.txt, or does this have to generalize to any input file? For example, suppose that action A consumes volatile-status.txt and produces an arbitrary file a.out (incorporating information from volatile-status.txt). Then a later action B consumes a.out. Would you then also need to scrub a.out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then a later action B consumes a.out. Would you then also need to scrub a.out?

Nope. Only volatile-status.txt.

We are trying to use stamping only for the top-level artifacts:

  • for executable files;
  • for docker images.

There are really few such goals.

The purpose of stamping is to save information in this file about which revision the file was compiled from. To do this, the revision number is transmitted via volatile-status.txt.

The problem is that stamping is enabled not for concrete targets, but globally via the --stamp flag. In some cases, it can affect, for example, on bison binary for generate yacc files in one of the third-party libraries we use. If for this target volatile-status.txt becomes significant, then we will get a cache miss for this and all dependent targets on every build. In this case, we get parasitic pressure on the cache, which is quite difficult to detect.

Copy link
Contributor Author

@bozaro bozaro Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, before implementing scrubbing support with remote build, it makes sense to do the following:

  • use scrubber.forSpawn(spawn) == null cheap check for `mayBeExecutedRemotely' by default;
  • use an expensive check with iteration over files enabling by the experimental flag like --experimental_scrubbing-remote-build-expensive-check (or create enum like --experimental_scrubbing-with-remote-build with values deny, unmatched-local, unchanged-local).

Copy link
Contributor

@tjgq tjgq Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's go through the problems one by one:

Bazel 6 stamping vs remote cache/remote build

I think this problem is easily solvable. If we implement the solution I outlined above, it becomes possible to write a single rule of the form:

rule {
  matcher {
    has_volatile_input: true
  }
  transform {
    omitted_inputs: "^bazel-out/volatile-status\\.txt$"
  }
}

and it will apply to every action that has volatile-status.txt in its inputs, without having to list those actions explicitly in the configuration. (There's a little bit of redundancy between has_volatile_input and omitted_inputs, but it's intentional: separating the matcher and the transform brings clarity and is a property I'd like to keep.)

However, the configuration above will not apply to an action that produces an output derived from volatile-status.txt (that's what the next problem is about).

Bazel 6 stamping is broken

It's unclear to me whether this is a bug in how stamping works, or an intended limitation. I don't mean to be dismissive; all I'm saying is that I didn't design the stamping feature and haven't used it all that much, so I can't authoritatively declare that there is a bug, and much less that cache key scrubbing is the correct way to solve it.

My suggestion here is to open a separate issue describing the limitations you see in stamping as it's currently implemented, so that people more knowledgeable about that part of Bazel can help brainstorm potential solutions. If we determine that scrubbing is the appropriate way to solve it, I'm happy to resume the search for a reasonable implementation; but I'd first like to be sure that there aren't simpler solutions.

Bazel 7 scrubbing configuration limitations

I completely agree with your assessment: scrubbing configurations are both hard to write and hard to debug.

The "hard to debug" is somewhat easier to solve: we need good tools to investigate the root cause of cache misses. We're currently trying to make some improvements in that direction in #18643, but there's still work to do.

The "hard to write" problem, as you correctly surmise, doesn't have a clear solution in sight. I share your sentiment that the better design entails having the rules themselves provide the scrubbing configuration to Bazel, so that users aren't forced to understand implementation details just so they can write the configuration. But we don't want rules to unilaterally decide that scrubbing should be used, either: scrubbing trades off correctness for performance, and Bazel should strive for correctness by default; some amount of user input would still be required in that world.

In conclusion

I don't want this discussion to drag on forever. I offer the following: if we agree that the first of these problems (scrubbing triggered by the presence of volatile-status.txt) is worth solving on its own, I volunteer to (re-)write the PR myself and get it submitted. I will tentatively say that there's still time to do this before 7.1.0, but it might slip to 7.2.0 since I have other things on my plate at the moment. At the same time, I encourage you to start a separate issue/discussion regarding stamping of non-toplevel targets. Does that sound good to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add the matcher has_volatile_input and allow remote assembly for all rules that do not use scrubbing, then this should completely cover the problem of using stamping from Bazel 6:

  • there will be one rule that allows local assembly of stamping action;
  • this rule will behave the same as the local Bazel cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjgq, I have moved the changes about executing scrubbed actions locally to PR #21288 (check only by scrubber.forSpawn(spawn) != null without checking the affected input files).

I tried adding `has_volatile_input' but failed: this requires strong refactoring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will import #21288 today.

The has_volatile_input implementation is still on me, as promised above; I'm hoping to get to it next week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some time this week to chat with other folks on the team about stamping, and to think about how this feature might be implemented. Unfortunately, the conclusion that I came to is that I don't know how to implement this in a satisfactory way, and I really don't want to rush adding a feature we might later regret. So I'm going to put this on hold at least until 7.2.0. Sorry.

The internal discussions I had also reinforced a point that I made earlier in this thread: if you have so many targets consuming volatile-info.txt that it's difficult to explicitly enumerate all of them in the scrubbing config, you're likely doing something wrong (at least according to the way we originally designed stamping to be used). I'd expect you to have at most one target per language that consumes volatile-info.txt and produces a wrapper library around its contents suitable for that language; your top-level binary targets should access the volatile information through that library, and shouldn't themselves depend on volatile-info.txt directly. Thus, you only have to configure scrubbing for the small number of targets that generate the per-language wrapper libraries, which I don't think is an unreasonable thing to ask users to do.

I still haven't read a convincing explanation for why your build graph can't be organized in this way.

}

@VisibleForTesting
Expand Down Expand Up @@ -1595,6 +1596,31 @@ void report(Event evt) {
}
}

private static boolean hasScrubbedInput(Spawn spawn, @Nullable Scrubber scrubber) {
if (scrubber == null) {
return false;
}
SpawnScrubber spawnScrubber = scrubber.forSpawn(spawn);
if (spawnScrubber == null) {
return false;
}
if (!spawnScrubber.getSalt().isEmpty()) {
return true;
}
for (String arg : spawn.getArguments()) {
if (!arg.equals(spawnScrubber.transformArgument(arg))) {
return true;
}
}
var inputFiles = spawn.getInputFiles();
for (ActionInput inputFile : inputFiles.toList()) {
if (spawnScrubber.shouldOmitInput(inputFile)) {
return true;
}
}
return false;
}

/**
* A simple value class combining a hash of the tool inputs (and their digests) as well as a set
* of the relative paths of all tool inputs.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -313,10 +313,7 @@ public void beforeCommand(CommandEnvironment env) throws AbruptExitException {

boolean enableScrubbing = remoteOptions.scrubber != null;
if (enableScrubbing && enableRemoteExecution) {

throw createOptionsExitException(
"Cannot combine remote cache key scrubbing with remote execution",
FailureDetails.RemoteOptions.Code.EXECUTION_WITH_SCRUBBING);
env.getReporter().handle(Event.warn("Cannot combine remote cache key scrubbing with remote execution. All actions with cache key scrubbing will be executed locally"));
}

// TODO(bazel-team): Consider adding a warning or more validation if the remoteDownloadRegex is
Expand Down