bazel run when using --experimental_remote_spawn_cache is broken #4041

Closed
tybook opened this issue Nov 7, 2017 · 13 comments
Labels: P1 I'll work on this now. (Assignee required), type: bug

Comments

tybook commented Nov 7, 2017

Description of the problem / feature request / question:

bazel run, of at least sh_binary and java_binary targets, does not work when the target is fetched from the remote cache. The runfiles directory for the target is not recreated. I get output like:

$ bazel run //:script
INFO: Analysed target //:script (0 packages loaded).
INFO: Found 1 target...
Target //:script up-to-date:
  bazel-bin/script
INFO: Elapsed time: 0.262s, Critical Path: 0.01s
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/script
ERROR: Error running program: java.io.IOException: Cannot run program "/home/ty/.cache/bazel/_bazel_ty/0662a860837afcfc3e6969c278958249/execroot/bazel_run_bug/_bin/process-wrapper" (in directory "/home/ty/.cache/bazel/_bazel_ty/0662a860837afcfc3e6969c278958249/execroot/bazel_run_bug/bazel-out/k8-fastbuild/bin/script.runfiles/bazel_run_bug"): error=2, No such file or directory

This problem does not seem to exist for executable genrules. This problem occurs both with and without using the --script_path flag. We use bazel run commands for significant portions of our deployment process, so this is blocking our adoption of remote caching.

If possible, provide a minimal example to reproduce the problem:

==> WORKSPACE <==
workspace(name='bazel_run_bug')

==> .bazelrc <==
# Remote caching
startup --host_jvm_args=-Dbazel.DigestFunction=SHA1
build --experimental_remote_spawn_cache
build --remote_rest_cache=http://REMOTE.CACHE.URL
build --experimental_strict_action_env

==> BUILD <==
sh_binary(
    name="script",
    srcs=["script.sh"],
)

==> script.sh <==
#!/bin/bash
echo "Success"

1. Start with cold local and remote caches.
2. Run bazel run //:script and notice that it succeeds, i.e. its output is:

$ bazel run //:script
INFO: Analysed target //:script (12 packages loaded).
INFO: Found 1 target...
Target //:script up-to-date:
  bazel-bin/script
INFO: Elapsed time: 0.934s, Critical Path: 0.11s
INFO: Build completed successfully, 4 total actions

INFO: Running command line: bazel-bin/script
Success

3. Clean the local cache with bazel clean, but leave the remote cache warmed.
4. Run bazel run //:script again and notice that it fails, i.e. its output is:

$ bazel run //:script
INFO: Analysed target //:script (0 packages loaded).
INFO: Found 1 target...
Target //:script up-to-date:
  bazel-bin/script
INFO: Elapsed time: 0.262s, Critical Path: 0.01s
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/script
ERROR: Error running program: java.io.IOException: Cannot run program "/home/ty/.cache/bazel/_bazel_ty/0662a860837afcfc3e6969c278958249/execroot/bazel_run_bug/_bin/process-wrapper" (in directory "/home/ty/.cache/bazel/_bazel_ty/0662a860837afcfc3e6969c278958249/execroot/bazel_run_bug/bazel-out/k8-fastbuild/bin/script.runfiles/bazel_run_bug"): error=2, No such file or directory
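
The reproduction above can be scripted. The following is a minimal sketch, assuming the Bazel 0.7.0-era flags quoted in the .bazelrc above and a REST cache URL supplied by the caller. The names write_repro_workspace, REPRO_DIR, REMOTE_CACHE_URL, and RUN_BAZEL are hypothetical and introduced here only for illustration; the file contents are copied from the report.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: write the four files from the report into a directory.
write_repro_workspace() {
  local dir="$1" cache_url="$2"
  mkdir -p "$dir"
  cat > "$dir/WORKSPACE" <<'EOF'
workspace(name='bazel_run_bug')
EOF
  cat > "$dir/.bazelrc" <<EOF
# Remote caching
startup --host_jvm_args=-Dbazel.DigestFunction=SHA1
build --experimental_remote_spawn_cache
build --remote_rest_cache=${cache_url}
build --experimental_strict_action_env
EOF
  cat > "$dir/BUILD" <<'EOF'
sh_binary(
    name="script",
    srcs=["script.sh"],
)
EOF
  cat > "$dir/script.sh" <<'EOF'
#!/bin/bash
echo "Success"
EOF
  chmod +x "$dir/script.sh"
}

# Assumed defaults: a fresh temp directory and the placeholder URL from the report.
REPRO_DIR="${REPRO_DIR:-$(mktemp -d)}"
REMOTE_CACHE_URL="${REMOTE_CACHE_URL:-http://REMOTE.CACHE.URL}"
write_repro_workspace "$REPRO_DIR" "$REMOTE_CACHE_URL"

# Bazel is only invoked on request, since it needs a real cache and an
# affected Bazel version (0.7.0 or HEAD as of Nov 6, 2017, per the report).
if [[ "${RUN_BAZEL:-0}" == "1" ]]; then
  cd "$REPRO_DIR"
  bazel run //:script    # cold caches: the report shows "Success"
  bazel clean            # drop local outputs, keep the remote cache warm
  bazel run //:script || echo "second run failed, matching the report"
fi
```

Setting RUN_BAZEL=1 and REMOTE_CACHE_URL to a reachable cache runs all four steps; without them the script only writes the workspace files.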

Environment info

  • Operating System:
    Ubuntu 16.04

  • Bazel version (output of bazel info release):
    Release 0.7.0 and built from HEAD (commit 67c84b1036ad02ba2384fa75fb28e779a488f3d4 on Nov 6, 2017)

Have you found anything relevant by searching the web?

#3934 looked possibly related, but this problem still occurs with Bazel built from HEAD, so apparently it is not. It might be the same problem as this: https://groups.google.com/forum/#!topic/bazel-discuss/O-pockbTV8M


Author

tybook commented Nov 7, 2017

@hhclam We talked about this briefly at BazelCon.

damienmg added the labels category: service APIs, P1 I'll work on this now. (Assignee required), and type: bug on Nov 8, 2017
Contributor

damienmg commented Nov 8, 2017

/cc @ulfjack @buchgr

Contributor

hhclam commented Nov 14, 2017

Saw this problem too. Will look.

Contributor

hhclam commented Nov 14, 2017

I verified that master still has this issue.

I also verified that --spawn_strategy=remote combined with --remote_rest_cache works with both 0.7.0 and master. This seems to be a problem with --experimental_remote_spawn_cache only.
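
For readers blocked by this, the observation above suggests a possible interim invocation, although the thread does not confirm it as a recommended workaround. The sketch below only prints the command line; it assumes the flag spellings quoted in this thread (Bazel 0.7.0 era) and that the build --experimental_remote_spawn_cache line has been removed from .bazelrc. The names workaround_cmd and CACHE_URL are hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print a bazel invocation that uses the flag pair
# reported to work, instead of --experimental_remote_spawn_cache.
workaround_cmd() {
  local cache_url="$1" target="$2"
  printf 'bazel run --spawn_strategy=remote --remote_rest_cache=%s %s\n' \
    "$cache_url" "$target"
}

# Placeholder URL taken from the report; replace with a real cache endpoint.
CACHE_URL="${CACHE_URL:-http://REMOTE.CACHE.URL}"
workaround_cmd "$CACHE_URL" //:script
```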

Contributor

ulfjack commented Nov 28, 2017

I have a fix.

ulfjack self-assigned this on Nov 28, 2017
ochafik pushed a commit to ochafik/bazel that referenced this issue Nov 29, 2017
- remove BaseSpawn.Local; instead, all callers pass in the full set of
  execution requirements they want to set
- disable caching and sandboxing for the symlink tree action - it does not
  declare outputs, so it can't be cached or sandboxed (fixes bazelbuild#4041)
- centralize the existing execution requirements in the ExecutionRequirements
  class
- centralize checking for execution requirements in the Spawn class
  (it's possible that we may need a more decentralized, extensible design in
  the future, but for now having them in a single place is simple and
  effective)
- update the documentation
- forward the relevant tags to execution requirements in TargetUtils (progress
  on bazelbuild#3960)
- this also contributes to bazelbuild#4153

PiperOrigin-RevId: 177288598
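
The commit message says the symlink tree action declares no outputs and therefore cannot be cached. One plausible reading of the failure is that a cache hit skips the action, so its side effect (the runfiles tree) is never recreated after a clean. The toy model below illustrates that reading only. It is not Bazel's implementation, and every name in it (run_action, symlink_tree_action, the cache and no-cache modes, the directory layout) is hypothetical.

```shell
#!/bin/bash
set -euo pipefail

WORK="$(mktemp -d)"
CACHE="$WORK/cache"   # stands in for the remote cache
OUT="$WORK/out"       # stands in for the local output tree
mkdir -p "$CACHE"

# Toy "symlink tree" action: its only effect is a side effect (a symlink).
# It declares no output files, so nothing can be stored in or restored
# from the cache on its behalf.
symlink_tree_action() {
  mkdir -p "$OUT/script.runfiles"
  ln -sf /bin/sh "$OUT/script.runfiles/script"
}

# Toy executor. $1 = action key, $2 = "cache" or "no-cache".
run_action() {
  local key="$1" mode="$2"
  if [[ "$mode" == "cache" && -e "$CACHE/$key" ]]; then
    echo "cache hit: $key (action not executed)"
    return 0
  fi
  symlink_tree_action
  if [[ "$mode" == "cache" ]]; then
    touch "$CACHE/$key"   # record an (empty) cached result
  fi
  echo "executed: $key"
}

run_action symlink_tree cache      # cold cache: executes, creates the tree
rm -rf "$OUT"                      # analogous to bazel clean
run_action symlink_tree cache      # warm cache: hit, tree is not recreated
if [[ ! -L "$OUT/script.runfiles/script" ]]; then
  echo "runfiles missing after cache hit"
fi
run_action symlink_tree no-cache   # caching disabled: always executes
if [[ -L "$OUT/script.runfiles/script" ]]; then
  echo "runfiles present"
fi
```

In this model, marking the action as uncacheable is what restores the tree, which mirrors the intent described in the commit message.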

robfig commented Nov 29, 2017

We ran into this when rolling out remote cache, and it's causing some havoc. Any chance of a 0.8.1 release in the near term including this fix, or should I build my own to hold us over to 0.9.0?

Contributor

hhclam commented Nov 29, 2017

If you feel brave, you can cherry-pick the following changes:
7967f33
72cbef7
4d7f8f7
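
A dry-run sketch of what such a cherry-pick could look like follows. It only prints git commands. The base ref 0.8.0 and branch name fix-4041 are assumptions made for illustration (the thread does not say which revision to start from), the commits are applied in the order listed above, which may not be the required order, and print_cherry_pick_plan is a hypothetical helper.

```shell
#!/bin/bash
set -euo pipefail

# Abbreviated commit hashes quoted in the comment above, in the listed order.
COMMITS=(7967f33 72cbef7 4d7f8f7)

# Hypothetical helper: print the git commands instead of running them.
# $1 = base ref to branch from (assumption), $2 = new branch name (assumption).
print_cherry_pick_plan() {
  local base_ref="$1" branch="$2" c
  echo "git checkout -b ${branch} ${base_ref}"
  for c in "${COMMITS[@]}"; do
    # -x records the original commit hash in the new commit message.
    echo "git cherry-pick -x ${c}"
  done
}

print_cherry_pick_plan "${BASE_REF:-0.8.0}" "${BRANCH:-fix-4041}"
```

Review the printed commands, then run them inside a Bazel source checkout and resolve any conflicts by hand.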

ianoc-stripe pushed a commit to ianoc-stripe/bazel that referenced this issue Nov 29, 2017
Instead use SimpleSpawn. Also set the execution requirements properly - in
particular, we need to disable caching and sandboxing for these spawns.

Fixes bazelbuild#4041.

PiperOrigin-RevId: 177132445
Change-Id: Iaaa2f2b8ff75d14c70a70de47616671e0bf5d697

robfig commented Nov 29, 2017

Gulp, ok I'll give it a shot. Thank you


zz-pony commented Dec 1, 2017

Will this fix be included in the next release? If so, when will the next release be? Thanks!


robfig commented Dec 1, 2017

FYI, here's the version I built.
https://github.com/yext/bazel/releases/tag/0.8.1


zz-pony commented Dec 1, 2017

Cool. Let me try it out. Thanks again!


zz-pony commented Dec 9, 2017

Looks like the newest release 0.8.1 didn't include the fix? So sad...

@mmorearty
Contributor

For others who are tracking this, it looks like the above-mentioned commits (well, cherry-picks of them) are in the release-0.9.0 branch, so I guess 0.9.0 will have this fix.
