Constant "remote cache eviction error" message #22220

Closed
BalestraPatrick opened this issue May 2, 2024 · 6 comments
Labels: P2, team-Remote-Exec, type: bug

Comments

@BalestraPatrick
Member

Description of the bug:

Hello!

We've been using the various flags related to addressing remote cache evictions in multiple projects successfully. I noticed an interesting behavior in one of our repos lately after taking a closer look at our CI logs. We consistently see the same action/invocation failing with a remote cache eviction error and then recovering by itself.

We run the bazel-diff tool in a bazel run invocation, which always fails with the same missing digest for bazel-out/k8-fastbuild/bin/external/rules_java/toolchains/platformclasspath.jar. When I query the remote cache for that particular digest using tools_remote, I get the correct blob returned. This means that the blob is in our remote cache, but somehow Bazel thinks it's not? I have an execution log for the specific situation as well if needed. I attached the part of the log where I could find two instances of this particular digest.

Bazel invocation log:

INFO: Invocation ID: 47418a6a-f67a-4bcb-8543-cc4d356708ce
Computing main repo mapping: 
Loading: 
Loading: 0 packages loaded
Analyzing: target //tools/bazel_diff:bazel_diff (0 packages loaded, 0 targets configured)
Analyzing: target //tools/bazel_diff:bazel_diff (0 packages loaded, 0 targets configured)
[0 / 1] [Prepa] BazelWorkspaceStatusAction stable-status.txt
INFO: Analyzed target //tools/bazel_diff:bazel_diff (1 packages loaded, 2276 targets configured).
ERROR: /var/.../workspace/tools/bazel_diff/BUILD.bazel:3:12: Building tools/bazel_diff/bazel_diff.jar () failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: REDACTED_DIGEST/138847674 for bazel-out/k8-fastbuild/bin/external/rules_java/toolchains/platformclasspath.jar
INFO: Found 1 target...
Target //tools/bazel_diff:bazel_diff failed to build
INFO: Elapsed time: 1.627s, Critical Path: 0.02s
INFO: 2 processes: 2 internal.
ERROR: Build did NOT complete successfully
ERROR: Build failed. Not running target
Found remote cache eviction error, retrying the build...
INFO: Invocation ID: 8581020e-267f-43af-bf35-1089fd042001
Computing main repo mapping: 
Loading: 
Loading: 0 packages loaded
Analyzing: target //tools/bazel_diff:bazel_diff (0 packages loaded, 0 targets configured)
Analyzing: target //tools/bazel_diff:bazel_diff (0 packages loaded, 0 targets configured)
[0 / 1] [Prepa] BazelWorkspaceStatusAction stable-status.txt
INFO: Analyzed target //tools/bazel_diff:bazel_diff (0 packages loaded, 0 targets configured).
[2 / 4] JavaToolchainCompileBootClasspath external/rules_java/toolchains/platformclasspath.jar; 1s disk-cache, processwrapper-sandbox
INFO: Found 1 target...
Target //tools/bazel_diff:bazel_diff up-to-date:
    /tmp/ci_bazel_output_base/execroot/__main__/bazel-out/k8-fastbuild/bin/tools/bazel_diff/bazel_diff
    /tmp/ci_bazel_output_base/execroot/__main__/bazel-out/k8-fastbuild/bin/tools/bazel_diff/bazel_diff.jar
INFO: Elapsed time: 5.827s, Critical Path: 2.78s
INFO: 4 processes: 1 internal, 2 processwrapper-sandbox, 1 worker.
INFO: Build completed successfully, 4 total actions
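
For anyone wanting to check how often this happens across CI runs, here is a minimal Python sketch that scans captured Bazel output for the two messages quoted in the log above. The patterns are copied from this log only, so other Bazel versions may word them differently, and the function name `summarize_evictions` is purely illustrative.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted in this issue (assumption: the
# wording is stable for the Bazel version in use).
MISSING_DIGEST = re.compile(
    r"Missing digest: (?P<hash>\S+)/(?P<size>\d+) for (?P<path>\S+)"
)
RETRY_MARKER = "Found remote cache eviction error, retrying the build..."


def summarize_evictions(log_text):
    """Return (retry_count, Counter of output paths reported as missing)."""
    retries = 0
    missing = Counter()
    for line in log_text.splitlines():
        if RETRY_MARKER in line:
            retries += 1
        match = MISSING_DIGEST.search(line)
        if match:
            missing[match.group("path")] += 1
    return retries, missing
```

If the same path dominates the counter across many invocations, that matches the pattern described here: one specific output is repeatedly reported missing and then rebuilt on retry.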

Execution log:

{
  "commandArgs": ["external/my_remotejdk_21_linux_x64/bin/java", "-XX:+IgnoreUnrecognizedVMOptions", "--add-exports\u003djdk.compiler/com.sun.tools.javac.api\u003dALL-UNNAMED", "--add-exports\u003djdk.compiler/com.sun.tools.javac.platform\u003dALL-UNNAMED", "--add-exports\u003djdk.compiler/com.sun.tools.javac.util\u003dALL-UNNAMED", "-cp", "bazel-out/k8-fastbuild/bin/external/rules_java/toolchains/platformclasspath_classes", "DumpPlatformClassPath", "bazel-out/k8-fastbuild/bin/external/rules_java/toolchains/platformclasspath.jar", "external/my_remotejdk_21_linux_x64"],
  "environmentVariables": [],
  "platform": {
    "properties": []
  },
  "inputs": [
  ...
  ],
  "listedOutputs": ["bazel-out/k8-fastbuild/bin/external/rules_java/toolchains/platformclasspath.jar"],
  "remotable": true,
  "cacheable": true,
  "timeoutMillis": "0",
  "mnemonic": "JavaToolchainCompileBootClasspath",
  "actualOutputs": [{
    "path": "bazel-out/k8-fastbuild/bin/external/rules_java/toolchains/platformclasspath.jar",
    "digest": {
      "hash": "REDACTED_DIGEST",
      "sizeBytes": "138847674",
      "hashFunctionName": "SHA-256"
    },
    "isTool": false,
    "symlinkTargetPath": ""
  }],
  "runner": "processwrapper-sandbox",
  "cacheHit": false,
  "status": "",
  "exitCode": 0,
  "remoteCacheable": true,
  "targetLabel": "@@rules_java//toolchains:platformclasspath",
  "digest": {
    "hash": "ANOTHER_HASH",
    "sizeBytes": "148",
    "hashFunctionName": "SHA256"
  }
}{
  "commandArgs": ["external/my_remotejdk_21_linux_x64/bin/java", "--add-exports\u003djdk.compiler/com.sun.tools.javac.api\u003dALL-UNNAMED", "--add-exports\u003djdk.compiler/com.sun.tools.javac.main\u003dALL-UNNAMED", "--add-exports\u003djdk.compiler/com.sun.tools.javac.model\u003dALL-UNNAMED", "--add-exports\u003djdk.compiler/com.sun.tools.javac.processing\u003dALL-UNNAMED", "--add-exports\u003djdk.compiler/com.sun.tools.javac.resources\u003dALL-UNNAMED", "--add-exports\u003djdk.compiler/com.sun.tools.javac.tree\u003dALL-UNNAMED", "--add-exports\u003djdk.compiler/com.sun.tools.javac.util\u003dALL-UNNAMED", "--add-opens\u003djdk.compiler/com.sun.tools.javac.code\u003dALL-UNNAMED", "--add-opens\u003djdk.compiler/com.sun.tools.javac.comp\u003dALL-UNNAMED", "--add-opens\u003djdk.compiler/com.sun.tools.javac.file\u003dALL-UNNAMED", "--add-opens\u003djdk.compiler/com.sun.tools.javac.parser\u003dALL-UNNAMED", "--add-opens\u003djava.base/java.nio\u003dALL-UNNAMED", "--add-opens\u003djava.base/java.lang\u003dALL-UNNAMED", "-Dsun.io.useCanonCaches\u003dfalse", "-XX:-CompactStrings", "-Xlog:disable", "-Xlog:all\u003dwarning:stderr:uptime,level,tags", "-jar", "external/remote_java_tools/java_tools/JavaBuilder_deploy.jar", "@bazel-out/k8-fastbuild/bin/tools/bazel_diff/bazel_diff.jar-0.params", "@bazel-out/k8-fastbuild/bin/tools/bazel_diff/bazel_diff.jar-1.params"],
  "environmentVariables": [{
    "name": "PATH",
    "value": "/opt/homebrew/bin:/home/linuxbrew/.linuxbrew/bin:/usr/local/bin:/usr/bin:/bin"
  }],
  "platform": {
    "properties": []
  },
  "inputs": [{
    "path": "bazel-out/k8-fastbuild/bin/external/rules_java/toolchains/platformclasspath.jar",
    "digest": {
      "hash": "REDACTED_DIGEST",
      "sizeBytes": "138847674",
      "hashFunctionName": "SHA-256"
    },
    "isTool": false,
    "symlinkTargetPath": ""
  }, 
  ...
}
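
To locate every action that produces or consumes a given output in a log like this, something along the following lines could work. It is a sketch only: it assumes the complete log (not the elided excerpt above) is a sequence of JSON objects written back to back, as the `}{` boundary suggests, and that entries use the field names visible in the excerpt (`mnemonic`, `inputs`, `actualOutputs`, `path`). The function names are made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def actions_touching(text, output_path):
    """Return (mnemonic, role) pairs for entries that reference output_path."""
    hits = []
    for entry in iter_json_objects(text):
        mnemonic = entry.get("mnemonic", "")
        if any(o.get("path") == output_path for o in entry.get("actualOutputs", [])):
            hits.append((mnemonic, "output"))
        if any(i.get("path") == output_path for i in entry.get("inputs", [])):
            hits.append((mnemonic, "input"))
    return hits
```

Applied to the two entries above, this would be expected to report the jar once as an output of the first action and once as an input of the second.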

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I wasn't able to reproduce this easily just yet, but it definitely requires a remote cache with the following flags enabled:

common --experimental_remote_cache_compression
common --experimental_remote_cache_eviction_retries=1
common --experimental_remote_cache_lease_extension

The rule that always fails is also pretty straightforward:

load("@rules_java//java:defs.bzl", "java_binary")

java_binary(
    name = "bazel_diff",
    main_class = "com.bazel_diff.Main",
    runtime_deps = ["@bazel_diff//jar"],
)

And the invocation looks like this:

bazel_diff=$(mktemp)
bazel run //tools/bazel_diff --script_path="$bazel_diff"

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

7.1.1

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@acecilia

acecilia commented May 2, 2024

👋 Is this different than #18696?

@BalestraPatrick
Member Author

@acecilia In this specific case, Bazel is able to recover from the failure, while in #18696 it isn't, so I thought they might be different. It's possible that the culprit is the same though, and for some reason Bazel is able to recover only in some situations.

coeuvre added the P2 label and removed the untriaged label on May 7, 2024
coeuvre self-assigned this on May 7, 2024
@coeuvre
Member

coeuvre commented May 13, 2024

Can you share the gRPC log, especially the ByteStreamRead calls for the platformclasspath.jar? (I realize it's hard because of #18695, but since you have the invocation ID, maybe it's possible to get it from the server side?)

It's also worth checking whether this is fixed by the upcoming 7.2 release, which includes eda0fe4.

@coeuvre
Member

coeuvre commented Sep 9, 2024

@bazel-io fork 7.4.0.

@coeuvre
Member

coeuvre commented Sep 9, 2024

@bazel-io fork 7.4.0

bazel-io pushed a commit to bazel-io/bazel that referenced this issue Sep 9, 2024
At construction time, the action cache is not loaded so it's always `null`. Change it to lazily get the action cache when cleaning up it.

Fixes bazelbuild#22220.

PiperOrigin-RevId: 672522144
Change-Id: I2de8b33ab78c04a690b17cd261d18d17f8b292ab
coeuvre added a commit to coeuvre/bazel that referenced this issue Sep 9, 2024
At construction time, the action cache is not loaded so it's always `null`. Change it to lazily get the action cache when cleaning up it.

Fixes bazelbuild#22220.

PiperOrigin-RevId: 672522144
Change-Id: I2de8b33ab78c04a690b17cd261d18d17f8b292ab
github-merge-queue bot pushed a commit that referenced this issue Sep 9, 2024
At construction time, the action cache is not loaded so it's always
`null`. Change it to lazily get the action cache when cleaning up it.

Fixes #22220.

PiperOrigin-RevId: 672522144
Change-Id: I2de8b33ab78c04a690b17cd261d18d17f8b292ab

Commit
9187a7e

Co-authored-by: Googler <chiwang@google.com>
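
The commit message describes a general pitfall: reading a dependency at construction time, before it has been loaded, versus looking it up lazily at the point of use. The Python sketch below illustrates only that pattern. It is not Bazel's code, and every name in it (`Module`, `EagerCleaner`, `LazyCleaner`) is hypothetical.

```python
class Module:
    """Stand-in for an owner whose cache is loaded some time after construction."""

    def __init__(self):
        self.action_cache = None  # not loaded yet

    def load(self):
        self.action_cache = {"stale-entry": "metadata"}


class EagerCleaner:
    def __init__(self, module):
        # Captures the value at construction time, which is still None.
        self._cache = module.action_cache

    def clean(self, key):
        if self._cache is None:
            return False  # nothing is ever cleaned
        return self._cache.pop(key, None) is not None


class LazyCleaner:
    def __init__(self, get_cache):
        # Stores a callable and defers the lookup until clean() runs.
        self._get_cache = get_cache

    def clean(self, key):
        cache = self._get_cache()
        if cache is None:
            return False
        return cache.pop(key, None) is not None
```

With both cleaners constructed before `module.load()` is called, the eager one keeps its `None` reference and never removes anything, while the lazy one sees the loaded cache and removes the entry.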
@iancha1992
Member

A fix for this issue has been included in Bazel 7.4.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.4.0rc1. Thanks!
