Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel symlink prefetching can crash - triggered by Builds without the Bytes #18772

Closed
alexofortune opened this issue Jun 26, 2023 · 4 comments
Closed
Assignees
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@alexofortune
Copy link

alexofortune commented Jun 26, 2023

Description of the bug:

We're trying to adopt Builds without the Bytes at our codebase, but we see build failures when we enable it related to strange symlink treatment. Here's the error and stack trace:

Exec failed due to IOException: 136 errors during bulk transfer:
java.io.IOException: /mnt/cache/bazel/output/652855c435821699db0f4352be3bd5fd/execroot/__main__/bazel-out/k8-fastbuild/bin/external/com_git_scm_git/git-add/git (Not a directory)
		at com.google.devtools.build.lib.unix.NativePosixFiles.openWrite(Native Method)
		at com.google.devtools.build.lib.unix.UnixFileSystem.createFileOutputStream(UnixFileSystem.java:519)
		at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:174)
		at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:188)
		at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:408)
		at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:396)
		at com.google.devtools.build.lib.vfs.FileSystemUtils.moveFile(FileSystemUtils.java:457)
		at com.google.devtools.build.lib.remote.AbstractActionInputPrefetcher.finalizeDownload(AbstractActionInputPrefetcher.java:547)
		at com.google.devtools.build.lib.remote.AbstractActionInputPrefetcher.lambda$downloadFileNoCheckRx$14(AbstractActionInputPrefetcher.java:475)
		at io.reactivex.rxjava3.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:107)
		... 85 more

Upon further investigation, we found this culprit block of code at https://github.com/bazelbuild/bazel/blob/d097b5d6cd3bc9fdb725b379b6cf3ef247126008/src/main/java/com/google/devtools/build/lib/remote/AbstractActionInputPrefetcher.java , line 437 :

    if (path.isSymbolicLink()) {
      try {
        path = path.getRelative(path.readSymbolicLink());
      } catch (IOException e) {
        return Completable.error(e);
      }
    }

Instrumenting this with logs yielded following results:

230626 10:35:11.852:I 556 [com.google.devtools.build.lib.remote.AbstractActionInputPrefetcher.downloadFileNoCheckRx] [2] prefetching a file - original path /mnt/cache/bazel/output/652855c435821699db0f4352be3bd5fd/execroot/__main__/bazel-out/k8-fastbuild/bin/external/com_git_scm_git/git-maintenance
230626 10:35:11.852:I 556 [com.google.devtools.build.lib.remote.AbstractActionInputPrefetcher.downloadFileNoCheckRx] [2] prefetching a file - transformed path /mnt/cache/bazel/output/652855c435821699db0f4352be3bd5fd/execroot/__main__/bazel-out/k8-fastbuild/bin/external/com_git_scm_git/git-maintenance/git

To be precise:

  • git is standard git binary
  • git-maintenance is a symlink to git, generated via a bash script via genrule, and declared via outs. Note that this genrule is marked as local = True

This results in resolving the download path to a directory that doesn't exist and IOException when Bazel tries to attempt moving.

At least three questions come to my mind:

  • Why bazel decides to prefetch something that is already on local filesystem? ( At the time of the crash, I can see both git binary and the symlinks )
  • Why does bazel try to even check if the path is a symbolic link before it attempts to pre-fetch the file?
  • Why does bazel think its a good idea to interpret that path as a directory and append relative path to it, resulting in this bogus path?

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Something akin to this should work:

a) Create a genrule:

SYMLINKS = [
   "binary-symlink",
   "binary-symlink2",
]
genrule(
    name = "symlinks",
    srcs = [":binary"],
    outs = SYMLINKS,
    cmd = ";".join([
               "ln -s binary $(location %s)" % f
               for f in SYMLINKS
     ]),
     local = True,
)

b) Add an action depending on the symlinks. Try to build it with BWOB.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

6.1.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

Nothing

Any other information, logs, or outputs that you want to share?

No response

@meisterT
Copy link
Member

Can you try with Bazel from head? We have made plenty of fixes in the meantime. @coeuvre

@iancha1992 iancha1992 added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Jun 26, 2023
@coeuvre
Copy link
Member

coeuvre commented Jun 29, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jun 29, 2023
@coeuvre
Copy link
Member

coeuvre commented Jun 29, 2023

@bazel-io fork 6.3.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jun 29, 2023
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Jun 29, 2023
When prefetching non-declared symlink for local actions, we want to download the target artifact if they haven't been downloaded.

Currently, the code use `path.getRelativePath(path.readSymbolicLink())` to mimic resolving relative symlink which is not correct. Replacing it with `path.readSymbolicLink()`.

Fixes bazelbuild#18772.

PiperOrigin-RevId: 544331900
Change-Id: Ie2a6bac298ab9f81e44d5f505f1b3d83519ba3ca
iancha1992 added a commit that referenced this issue Jun 30, 2023
When prefetching non-declared symlink for local actions, we want to download the target artifact if they haven't been downloaded.

Currently, the code use `path.getRelativePath(path.readSymbolicLink())` to mimic resolving relative symlink which is not correct. Replacing it with `path.readSymbolicLink()`.

Fixes #18772.

PiperOrigin-RevId: 544331900
Change-Id: Ie2a6bac298ab9f81e44d5f505f1b3d83519ba3ca

Co-authored-by: Googler <chiwang@google.com>
@alexofortune
Copy link
Author

Wow that was quick, thank you @coeuvre and @iancha1992 !

euroelessar pushed a commit to dropbox/dbx_build_tools that referenced this issue Aug 11, 2023
Summary: Create symlinks in git repo the proper way. Previously we used a genrule and a ln -s command, and Bazel has not seen the resulting symlinks as, well, symlinks. This could in particular confuse Bazel's prefetcher used by Builds without the Bytes and cause build failures due to buggy Bazel behaviour ( see bazelbuild/bazel#18772
GitOrigin-RevId: e7fbe989366c7dbf9f31a7219a10990e326600f0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug
Projects
None yet
Development

No branches or pull requests

8 participants