`bazel run` when using `--experimental_remote_spawn_cache` is broken #4041

tybook · 2017-11-07T20:27:04Z

Description of the problem / feature request / question:

bazel run fails for at least sh_binary and java_binary targets when the target's outputs are fetched from the remote cache: the runfiles directory for the target is not recreated. I get output like:

$ bazel run //:script
INFO: Analysed target //:script (0 packages loaded).
INFO: Found 1 target...
Target //:script up-to-date:
  bazel-bin/script
INFO: Elapsed time: 0.262s, Critical Path: 0.01s
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/script
ERROR: Error running program: java.io.IOException: Cannot run program "/home/ty/.cache/bazel/_bazel_ty/0662a860837afcfc3e6969c278958249/execroot/bazel_run_bug/_bin/process-wrapper" (in directory "/home/ty/.cache/bazel/_bazel_ty/0662a860837afcfc3e6969c278958249/execroot/bazel_run_bug/bazel-out/k8-fastbuild/bin/script.runfiles/bazel_run_bug"): error=2, No such file or directory

This problem does not seem to exist for executable genrules. This problem occurs both with and without using the --script_path flag. We use bazel run commands for significant portions of our deployment process, so this is blocking our adoption of remote caching.

If possible, provide a minimal example to reproduce the problem:

==> WORKSPACE <==
workspace(name='bazel_run_bug')

==> .bazelrc <==
# Remote caching
startup --host_jvm_args=-Dbazel.DigestFunction=SHA1
build --experimental_remote_spawn_cache
build --remote_rest_cache=http://REMOTE.CACHE.URL
build --experimental_strict_action_env

==> BUILD <==
sh_binary(
    name="script",
    srcs=["script.sh"],
)

==> script.sh <==
#!/bin/bash
echo "Success"

Start with cold local and remote caches.
Run bazel run //:script and notice that it succeeds, i.e. its output is:

$ bazel run //:script
INFO: Analysed target //:script (12 packages loaded).
INFO: Found 1 target...
Target //:script up-to-date:
  bazel-bin/script
INFO: Elapsed time: 0.934s, Critical Path: 0.11s
INFO: Build completed successfully, 4 total actions

INFO: Running command line: bazel-bin/script
Success

Clean the local cache with bazel clean, but leave the remote cache warmed.
Run bazel run //:script again and notice that it fails, i.e. its output is:

$ bazel run //:script
INFO: Analysed target //:script (0 packages loaded).
INFO: Found 1 target...
Target //:script up-to-date:
  bazel-bin/script
INFO: Elapsed time: 0.262s, Critical Path: 0.01s
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/script
ERROR: Error running program: java.io.IOException: Cannot run program "/home/ty/.cache/bazel/_bazel_ty/0662a860837afcfc3e6969c278958249/execroot/bazel_run_bug/_bin/process-wrapper" (in directory "/home/ty/.cache/bazel/_bazel_ty/0662a860837afcfc3e6969c278958249/execroot/bazel_run_bug/bazel-out/k8-fastbuild/bin/script.runfiles/bazel_run_bug"): error=2, No such file or directory
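
The reproduction above can be scripted roughly as follows. This is a minimal sketch, not part of the original report: it writes the four files shown earlier into a scratch directory and then prints the Bazel commands to run by hand. `REPRO_DIR` is a hypothetical variable introduced here, and `http://REMOTE.CACHE.URL` is the same placeholder used in the report, so it has to be replaced with a real cache endpoint before the commands will do anything useful.

```shell
#!/bin/sh
# Sketch: recreate the minimal workspace from the report in a scratch directory.
set -e

# REPRO_DIR is a hypothetical name; defaults to a fresh temporary directory.
REPRO_DIR="${REPRO_DIR:-$(mktemp -d)}"
mkdir -p "$REPRO_DIR"

cat > "$REPRO_DIR/WORKSPACE" <<'EOF'
workspace(name='bazel_run_bug')
EOF

# Same flags as the report; the cache URL is a placeholder to be replaced.
cat > "$REPRO_DIR/.bazelrc" <<'EOF'
# Remote caching
startup --host_jvm_args=-Dbazel.DigestFunction=SHA1
build --experimental_remote_spawn_cache
build --remote_rest_cache=http://REMOTE.CACHE.URL
build --experimental_strict_action_env
EOF

cat > "$REPRO_DIR/BUILD" <<'EOF'
sh_binary(
    name="script",
    srcs=["script.sh"],
)
EOF

cat > "$REPRO_DIR/script.sh" <<'EOF'
#!/bin/bash
echo "Success"
EOF
chmod +x "$REPRO_DIR/script.sh"

# Print the manual steps instead of running Bazel, since the cache URL
# above is only a placeholder.
echo "Workspace written to $REPRO_DIR. With cold local and remote caches, run:"
echo "  cd $REPRO_DIR"
echo "  bazel run //:script   # first run, expected to succeed"
echo "  bazel clean           # clears local outputs, remote cache stays warm"
echo "  bazel run //:script   # second run, fails as described in this report"
```

The script deliberately stops short of invoking Bazel, so it is safe to run on a machine without a configured remote cache.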

Environment info

Operating System:
Ubuntu 16.04
Bazel version (output of bazel info release):
Release 0.7.0 and built from HEAD (commit 67c84b1036ad02ba2384fa75fb28e779a488f3d4 on Nov 6, 2017)

Have you found anything relevant by searching the web?

#3934 looked possibly related, but this problem still occurs with Bazel built from HEAD, so apparently it is not the same issue. It might be the same problem as this: https://groups.google.com/forum/#!topic/bazel-discuss/O-pockbTV8M

Anything else, information or logs or outputs that would be helpful?

(If they are large, please upload as attachment or provide link).


tybook · 2017-11-07T20:27:33Z

@hhclam We talked about this briefly at Bazel conf

damienmg · 2017-11-08T19:29:45Z

/cc @ulfjack @buchgr

hhclam · 2017-11-14T00:33:09Z

Saw this problem too. Will look.

hhclam · 2017-11-14T02:10:27Z

I verified that master still has this issue.

Also I verified that --spawn_strategy=remote --remote_rest_cache works with both 0.7.0 and master. This seems to be a problem with --experimental_remote_spawn_cache only.
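
Based on the observation above, one possible stopgap is to swap `--experimental_remote_spawn_cache` for the flag pair reported to work. The snippet below is only a sketch under that assumption: `CACHE_URL` and `WORKAROUND_FLAGS` are hypothetical names, the URL is the placeholder from the original report, and the command is printed rather than executed.

```shell
# CACHE_URL is a hypothetical variable; the default is the placeholder
# URL from the report and must be replaced with a real cache endpoint.
CACHE_URL="${CACHE_URL:-http://REMOTE.CACHE.URL}"

# Flags reported to work in the comment above, used in place of
# --experimental_remote_spawn_cache.
WORKAROUND_FLAGS="--spawn_strategy=remote --remote_rest_cache=${CACHE_URL}"

# Print the resulting command line for inspection instead of running it.
echo "bazel run ${WORKAROUND_FLAGS} //:script"
```

Whether this is acceptable depends on the setup, since it changes the spawn strategy and not only the caching behavior.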

ulfjack · 2017-11-28T11:49:13Z

I have a fix.

- remove BaseSpawn.Local; instead, all callers pass in the full set of execution requirements they want to set
- disable caching and sandboxing for the symlink tree action - it does not declare outputs, so it can't be cached or sandboxed (fixes bazelbuild#4041)
- centralize the existing execution requirements in the ExecutionRequirements class
- centralize checking for execution requirements in the Spawn class (it's possible that we may need a more decentralized, extensible design in the future, but for now having them in a single place is simple and effective)
- update the documentation
- forward the relevant tags to execution requirements in TargetUtils (progress on bazelbuild#3960)
- this also contributes to bazelbuild#4153

PiperOrigin-RevId: 177288598

robfig · 2017-11-29T20:17:29Z

We ran into this when rolling out remote cache, and it's causing some havoc. Any chance of a 0.8.1 release in the near term including this fix, or should I build my own to hold us over to 0.9.0?

hhclam · 2017-11-29T20:24:37Z

If you feel brave, you can cherry-pick the following changes:
7967f33
72cbef7
4d7f8f7

Instead use SimpleSpawn. Also set the execution requirements properly - in particular, we need to disable caching and sandboxing for these spawns. Fixes bazelbuild#4041.

PiperOrigin-RevId: 177132445
Change-Id: Iaaa2f2b8ff75d14c70a70de47616671e0bf5d697
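
For anyone attempting the cherry-pick route, the steps might look roughly like the sketch below. Several things here are assumptions and not taken from the thread: the base ref `0.8.0`, the branch name `remote-cache-fix`, the `run` helper, the `DRY_RUN` switch, and the idea that the three commits apply cleanly in the order listed. The script defaults to printing the commands so they can be reviewed first; it is meant to be run from inside a clone of the Bazel repository.

```shell
# Commit hashes quoted in the comment above, in the order listed there.
FIX_COMMITS="7967f33 72cbef7 4d7f8f7"

# Assumptions: the release to patch and whether to only print commands.
BASE_REF="${BASE_REF:-0.8.0}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Branch off the chosen release, apply the fixes, then build Bazel itself.
run git checkout -b remote-cache-fix "$BASE_REF"
for commit in $FIX_COMMITS; do
  run git cherry-pick "$commit"
done
run bazel build //src:bazel
```

Setting `DRY_RUN=0` executes the commands for real; conflicts during `git cherry-pick` would still need to be resolved by hand.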

robfig · 2017-11-29T21:35:48Z

Gulp, ok I'll give it a shot. Thank you

zz-pony · 2017-12-01T18:29:12Z

Will this fix be included in the next release? If so, when will the next release be? Thanks!

robfig · 2017-12-01T18:44:03Z

FYI, here's the version I built.
https://github.com/yext/bazel/releases/tag/0.8.1

zz-pony · 2017-12-01T18:51:31Z

Cool. Let me try it out. Thanks again!

zz-pony · 2017-12-09T17:17:41Z

Looks like the newest release 0.8.1 didn't include the fix? So sad...

mmorearty · 2017-12-15T07:51:02Z

For others who are tracking this, it looks like the above-mentioned commits (well, cherry-picks of them) are in the release-0.9.0 branch, so I guess 0.9.0 will have this fix.

damienmg added the labels "category: service APIs", "P1 I'll work on this now. (Assignee required)", and "type: bug" Nov 8, 2017

ulfjack self-assigned this Nov 28, 2017

bazel-io closed this as completed in 4d7f8f7 Nov 29, 2017

mmorearty mentioned this issue Dec 15, 2017

When using --experimental_remote_spawn_cache, runfiles are not restored correctly #4305

Closed
