Spurious breakage in Gerrit CI after upgrading from 7.0.0rc2 to 7.0.0rc3 #20161

Open
davido opened this issue Nov 13, 2023 · 30 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: documentation (cleanup)

Comments

@davido
Contributor

davido commented Nov 13, 2023

Description of the bug:

Gerrit Code Review is in the process of upgrading to Bazel 7.0.0.

All was fine after the upgrade to 7.0.0rc2, see: [1].

However, after upgrading to 7.0.0rc3, we started to see this breakage on our CI:

https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-chrome-latest/40214/console

INFO: Invocation ID: 93ef2f32-774e-40ce-b58d-24dd7a30b758
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /home/jenkins/workspace/Gerrit-verifier-chrome-latest/gerrit/.bazelrc:
  'build' options: --noenable_bzlmod --workspace_status_command=python3 ./tools/workspace_status.py --repository_cache=~/.gerritcodereview/bazel-cache/repository --action_env=PATH --disk_cache=~/.gerritcodereview/bazel-cache/cas --java_language_version=17 --java_runtime_version=remotejdk_17 --tool_java_language_version=17 --tool_java_runtime_version=remotejdk_17 --incompatible_strict_action_env --announce_rc
Computing main repo mapping: 
Loading: 
Loading: 0 packages loaded
Analyzing: target //tools/maven:gen_api_install (1 packages loaded, 0 targets configured)
Analyzing: target //tools/maven:gen_api_install (1 packages loaded, 0 targets configured)
[0 / 1] [Prepa] BazelWorkspaceStatusAction stable-status.txt
INFO: Analyzed target //tools/maven:gen_api_install (1 packages loaded, 1 target configured).
[368 / 527] Executing genrule @jgit//org.eclipse.jgit:jgit; 1s remote-cache, linux-sandbox
[369 / 527] [Prepa] Compiling Java headers external/jgit/org.eclipse.jgit.ssh.apache/libssh-apache-hjar.jar (53 source files)
ERROR: /home/jenkins/workspace/Gerrit-verifier-chrome-latest/gerrit/java/com/google/gerrit/jgit/BUILD:3:13: Compiling Java headers java/com/google/gerrit/jgit/libjgit-hjar.jar (1 source file) failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: cf3b2439c36619f2b6aaadddc55f15ddfd0c96566d22c1c507823ca74ac09732/127311204 for bazel-out/k8-fastbuild/bin/external/rules_java_builtin/toolchains/platformclasspath.jar
ERROR: /home/jenkins/workspace/Gerrit-verifier-chrome-latest/gerrit/java/com/google/gerrit/jgit/BUILD:3:13: Building java/com/google/gerrit/jgit/libjgit.jar (1 source file) failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: cf3b2439c36619f2b6aaadddc55f15ddfd0c96566d22c1c507823ca74ac09732/127311204 for bazel-out/k8-fastbuild/bin/external/rules_java_builtin/toolchains/platformclasspath.jar
Target //tools/maven:gen_api_install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 6.088s, Critical Path: 4.95s
INFO: 24 processes: 23 internal, 1 linux-sandbox.
ERROR: Build did NOT complete successfully
bazelisk failed to build gen_api_install. Use VERBOSE=1 for more info
Build step 'Execute shell' marked build as failure
Finished: FAILURE

If I downgrade to 7.0.0rc2, the build is successful again: [1]

[1] https://gerrit-review.googlesource.com/c/gerrit/+/391534

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I cannot currently reproduce the problem locally ;-(

This command is invoked on the CI:

  tools/maven/api.sh install

It creates a shell script and invokes it to publish the Plugin API artifacts to the local Maven repository.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

7.0.0rc3

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

All is fine on Bazel 7.0.0rc2. I am unable to reproduce the problem locally and thus cannot bisect.

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@fmeum
Collaborator

fmeum commented Nov 13, 2023

Could you test with --noreuse_sandbox_directories? That's my best guess without a bisect.

@davido
Contributor Author

davido commented Nov 13, 2023

Could you test with --noreuse_sandbox_directories? That's my best guess without a bisect.

Unfortunately, with this option the error is still present. Also, after I downgraded from 7.0.0rc3 to 7.0.0rc2, the build is still failing.

@davido
Contributor Author

davido commented Nov 13, 2023

I also added the "-s" option, which produced this verbose output:

https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-chrome-latest/40258/console

[...]
# Configuration: f5d72005e5d4b70683fdbd12ff2cbfb779fc730d4f37f289f17efea5d0e4d042
# Execution platform: @local_config_platform//:host
SUBCOMMAND: # //java/com/google/gerrit/git/testing:testing [action 'Building java/com/google/gerrit/git/testing/libtesting.jar (3 source files)', configuration: f5d72005e5d4b70683fdbd12ff2cbfb779fc730d4f37f289f17efea5d0e4d042, execution platform: @local_config_platform//:host, mnemonic: Javac]
(cd /home/jenkins/.cache/bazel/_bazel_jenkins/67bba20af71044f1eb598ecb44098f26/execroot/gerrit && \
  exec env - \
    LC_CTYPE=en_US.UTF-8 \
    PATH=/home/jenkins/.cache/bazelisk/downloads/bazelbuild/bazel-7.0.0rc2-linux-x86_64/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/usr/lib/jvm/java-11-openjdk-amd64/jre/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  external/remotejdk21_linux/bin/java '--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.resources=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED' '--add-opens=java.base/java.nio=ALL-UNNAMED' '--add-opens=java.base/java.lang=ALL-UNNAMED' '-Dsun.io.useCanonCaches=false' -XX:-CompactStrings -Xlog:disable '-Xlog:all=warning:stderr:uptime,level,tags' -jar external/remote_java_tools/java_tools/JavaBuilder_deploy.jar @bazel-out/k8-fastbuild/bin/java/com/google/gerrit/git/testing/libtesting.jar-0.params @bazel-out/k8-fastbuild/bin/java/com/google/gerrit/git/testing/libtesting.jar-1.params)
# Configuration: f5d72005e5d4b70683fdbd12ff2cbfb779fc730d4f37f289f17efea5d0e4d042
# Execution platform: @local_config_platform//:host
SUBCOMMAND: # //java/com/google/gerrit/jgit:jgit [action 'Building java/com/google/gerrit/jgit/libjgit.jar (1 source file)', configuration: f5d72005e5d4b70683fdbd12ff2cbfb779fc730d4f37f289f17efea5d0e4d042, execution platform: @local_config_platform//:host, mnemonic: Javac]
(cd /home/jenkins/.cache/bazel/_bazel_jenkins/67bba20af71044f1eb598ecb44098f26/execroot/gerrit && \
  exec env - \
    LC_CTYPE=en_US.UTF-8 \
    PATH=/home/jenkins/.cache/bazelisk/downloads/bazelbuild/bazel-7.0.0rc2-linux-x86_64/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/usr/lib/jvm/java-11-openjdk-amd64/jre/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  external/remotejdk21_linux/bin/java '--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.resources=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED' '--add-opens=java.base/java.nio=ALL-UNNAMED' '--add-opens=java.base/java.lang=ALL-UNNAMED' '-Dsun.io.useCanonCaches=false' -XX:-CompactStrings -Xlog:disable '-Xlog:all=warning:stderr:uptime,level,tags' -jar external/remote_java_tools/java_tools/JavaBuilder_deploy.jar @bazel-out/k8-fastbuild/bin/java/com/google/gerrit/jgit/libjgit.jar-0.params @bazel-out/k8-fastbuild/bin/java/com/google/gerrit/jgit/libjgit.jar-1.params)
# Configuration: f5d72005e5d4b70683fdbd12ff2cbfb779fc730d4f37f289f17efea5d0e4d042
# Execution platform: @local_config_platform//:host
SUBCOMMAND: # //java/com/google/gerrit/acceptance/config:config [action 'Building java/com/google/gerrit/acceptance/config/libconfig.jar (7 source files) and running annotation processors (AutoAnnotationProcessor, AutoValueProcessor, AutoOneOfProcessor)', configuration: f5d72005e5d4b70683fdbd12ff2cbfb779fc730d4f37f289f17efea5d0e4d042, execution platform: @local_config_platform//:host, mnemonic: Javac]
(cd /home/jenkins/.cache/bazel/_bazel_jenkins/67bba20af71044f1eb598ecb44098f26/execroot/gerrit && \
  exec env - \
    LC_CTYPE=en_US.UTF-8 \
    PATH=/home/jenkins/.cache/bazelisk/downloads/bazelbuild/bazel-7.0.0rc2-linux-x86_64/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/usr/lib/jvm/java-11-openjdk-amd64/jre/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  external/remotejdk21_linux/bin/java '--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.resources=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED' '--add-opens=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED' '--add-opens=java.base/java.nio=ALL-UNNAMED' '--add-opens=java.base/java.lang=ALL-UNNAMED' '-Dsun.io.useCanonCaches=false' -XX:-CompactStrings -Xlog:disable '-Xlog:all=warning:stderr:uptime,level,tags' -jar external/remote_java_tools/java_tools/JavaBuilder_deploy.jar @bazel-out/k8-fastbuild/bin/java/com/google/gerrit/acceptance/config/libconfig.jar-0.params @bazel-out/k8-fastbuild/bin/java/com/google/gerrit/acceptance/config/libconfig.jar-1.params)
# Configuration: f5d72005e5d4b70683fdbd12ff2cbfb779fc730d4f37f289f17efea5d0e4d042
# Execution platform: @local_config_platform//:host
ERROR: /home/jenkins/workspace/Gerrit-verifier-chrome-latest/gerrit/java/com/google/gerrit/jgit/BUILD:3:13: Compiling Java headers java/com/google/gerrit/jgit/libjgit-hjar.jar (1 source file) failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: 5cb087fa259562b09dfdb79380f82501849de07f77ea3eb52941303af7532e7e/138756716 for bazel-out/k8-fastbuild/bin/external/rules_java_builtin/toolchains/platformclasspath.jar
ERROR: /home/jenkins/.cache/bazel/_bazel_jenkins/67bba20af71044f1eb598ecb44098f26/external/jgit/org.eclipse.jgit.http.server/BUILD:5:13: Building external/jgit/org.eclipse.jgit.http.server/libjgit-servlet-class.jar (35 source files) failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: 5cb087fa259562b09dfdb79380f82501849de07f77ea3eb52941303af7532e7e/138756716 for bazel-out/k8-fastbuild/bin/external/rules_java_builtin/toolchains/platformclasspath.jar
Target //tools/maven:gen_api_install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 8.404s, Critical Path: 7.11s
INFO: 24 processes: 23 internal, 1 linux-sandbox.
ERROR: Build did NOT complete successfully
bazelisk failed to build gen_api_install. Use VERBOSE=1 for more info
Build step 'Execute shell' marked build as failure
Finished: FAILURE

@keertk keertk added the team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website label Nov 13, 2023
@fmeum
Collaborator

fmeum commented Nov 13, 2023

@tjgq Do you have an idea?

@fmeum
Collaborator

fmeum commented Nov 13, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Nov 13, 2023
@tjgq
Contributor

tjgq commented Nov 13, 2023

@davido Do I understand it correctly that you're building with a disk cache, but not with a remote cache? Is this build clean or incremental? Do you have any sort of process that removes entries from the disk cache between builds?

@keertk
Member

keertk commented Nov 13, 2023

@bazel-io fork 7.0.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Nov 13, 2023
@coeuvre
Member

coeuvre commented Nov 13, 2023

From the CI log, it seems like you are using remote cache and these errors were caused by remote cache eviction. Can you check whether adding flag --experimental_remote_cache_eviction_retries=5 resolves the issue?

@meisterT
Member

@coeuvre how can this happen just with the local disk cache? Race between multiple workers?

@coeuvre
Member

coeuvre commented Nov 13, 2023

I think they are using remote cache. The flag was passed with env:

[EnvInject] - Injecting as environment variables the properties content 
BAZEL_OPTS=--remote_cache=https://gerrit-ci.gerritforge.com/cache

Also, xxx remote cache hit indicates remote cache. For disk cache it would be xxx disk cache hit.
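
As a rough way to check which kind of cache a build actually hit, the process summary in a saved console log can be filtered for these phrases. The sketch below is only an illustration: `cache_hit_kinds` is a hypothetical helper name, and the sample log line in the usage note is made up in the style of the summaries quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Bazel console log on stdin and print every
# "<count> remote cache hit" / "<count> disk cache hit" fragment it contains.
cache_hit_kinds() {
  # -o prints only the matching part, -E enables the (remote|disk) alternation.
  # "|| true" keeps the exit status at 0 when the log has no cache hits at all.
  grep -oE '[0-9,]+ (remote|disk) cache hit' || true
}
```

Usage, with a made-up summary line: `echo 'INFO: 24 processes: 10 remote cache hit, 14 internal.' | cache_hit_kinds` prints `10 remote cache hit`. An empty result means the summary reported no cache hits of either kind.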

@davido
Contributor Author

davido commented Nov 13, 2023

First of all, we are using a combination of RBE and local builds.

Some stuff we can only test locally. The failing part is built locally on GCP-machines.

We have both options, disk cache and remote cache, see, e.g.

BAZEL_OPTS=--remote_cache=https://gerrit-ci.gerritforge.com/cache

However, we have this hidden logic on the CI side to take remote cache out of the picture,
if .bazelversion file was changed:

if git show --diff-filter=AM --name-only --pretty="" HEAD | grep -q .bazelversion
then
  export BAZEL_OPTS=""
fi

This is the part of the CI that was failing:

https://gerrit.googlesource.com/gerrit-ci-scripts/+/refs/heads/master/jenkins/gerrit-bazel-build.sh#35

bazelisk build $BAZEL_OPTS plugins:core release api

@lucamilanesio Are you aware of any cache evictions on the remote cache side recently?

@davido
Contributor Author

davido commented Nov 13, 2023

To verify that the remote cache contributes to the problem, I upgraded (again) the Bazel version from 7.0.0rc2 to 7.0.0rc3 and uploaded a new patch set (22). As explained in my previous comment, this skips remote cache usage, and the verification was successful: [1].

I'm going to remove the changes in .bazelversion and add the option --experimental_remote_cache_eviction_retries=5, as suggested by @coeuvre .

[1] https://gerrit-review.googlesource.com/c/gerrit/+/387837/22

@davido
Contributor Author

davido commented Nov 13, 2023

@coeuvre, adding the --experimental_remote_cache_eviction_retries option fixed the build.

@meteorcloudy meteorcloudy added team-Remote-Exec Issues and PRs for the Execution (Remote) team and removed team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website labels Nov 14, 2023
@tjgq
Contributor

tjgq commented Nov 14, 2023

@davido Can you confirm whether entries can spuriously disappear from your disk and/or remote cache in between builds? If they can, then you must use --experimental_remote_cache_eviction_retries, possibly in conjunction with --experimental_remote_cache_lease_extension. Otherwise, there might be a bug in Bazel.
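
For a setup where cache entries can be evicted between builds, the two flags named above could be assembled once and passed to every invocation. This is a minimal sketch, not a recommendation from the Bazel team: `eviction_tolerant_flags` is a hypothetical helper, the retry count of 5 is simply the value suggested earlier in this thread, and whether lease extension is effective depends on the cache backend in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the flags discussed above, one per line.
# $1 is the number of eviction retries (defaults to 5, the value used in this thread).
eviction_tolerant_flags() {
  local retries="${1:-5}"
  printf '%s\n' \
    "--experimental_remote_cache_eviction_retries=${retries}" \
    "--experimental_remote_cache_lease_extension"
}

# Example invocation (not executed here, since it needs Bazel and the Gerrit workspace):
#   bazelisk build $(eviction_tolerant_flags 5) //tools/maven:gen_api_install
```

The command substitution is left unquoted on purpose so that each printed flag becomes its own argument.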

@davido
Contributor Author

davido commented Nov 14, 2023

@lucamilanesio Can you help answer @tjgq's question?

@meteorcloudy
Member

Since it's still unclear if this is a Bazel bug, I'll remove this bug as a release blocker for 7.0. Closing #20175.

@davido
Contributor Author

davido commented Nov 17, 2023

@meteorcloudy Agreed. Let's close this then as not an issue.

@davido davido closed this as completed Nov 17, 2023
@lucamilanesio

@davido Can you confirm whether entries can spuriously disappear from your disk and/or remote cache in between builds?

They cannot disappear from the local disk. However, once a day, during the remote cache cleanups, they can be removed remotely. The step that is failing, though, did not use any remote cache: how is it possible that Bazel would assume that the cache is remote if there isn't a remote cache configured?

It looks like the local cache "remembers" that it was fed by a remote cache, because the previous step actually used a remote cache for the initial build.

If they can, then you must use --experimental_remote_cache_eviction_retries, possibly in conjunction with --experimental_remote_cache_lease_extension. Otherwise, there might be a bug in Bazel.

Well, but that isn't the case, as mentioned above.

If I add the remote cache URL to the .bazelrc to make sure that it is always used in all invocations, the problem disappears. Has something changed in the remote cache management between Bazel 7.0.0-rc2 and 7.0.0-rc3?

@davido
Contributor Author

davido commented Dec 18, 2023

Reopening the issue, as we are seeing this on Gerrit CI again and this downstream issue with priority 0 was filed: 1.

Excerpt from downstream issue:

The build steps that are executed for the validation are:

#0
export BAZEL_OPTS=--remote_cache=https://gerrit-ci.gerritforge.com/cache
#1
bazelisk build $BAZEL_OPTS plugins:core release api
#2
tools/maven/api.sh install
#3
tools/eclipse/project.py --bazel bazelisk

Only the first build command above uses the remote cache. The subsequent commands don't use it, and they started to fail consistently on Gerrit CI after the Bazel version bump from 7.0.0-rc2 to 7.0.0-rc3.

The second command: tools/maven/api.sh is here: 2, and is actually running this build command (without remote-cache usage):

bazelisk build //tools/maven:gen_api_install

Which is failing with this error now:

com.google.devtools.build.lib.remote.common.CacheNotFoundException: Missing digest: 892c651b04360ae932e9843f7d2233e4476e5f60dd835a865fb49bf7a48f6e66/230925 for bazel-out/k8-fastbuild/bin/external/sshd-sftp/jar/_ijar/jar/sshd-sftp/jar/sshd-sftp-2.10.0-ijar.jar
Target //tools/maven:gen_api_install failed to build

@coeuvre @tjgq @meteorcloudy @fmeum
Any clue what is going on here and how can we further track it down?

In fact, passing --experimental_remote_cache_eviction_retries=5 helps, but isn't that the wrong workaround for a build command that shouldn't use the remote cache in the first place?

Also note that if we pass the remote cache option to all three build commands above, they all succeed.

So, in both cases (with and without remote cache): we are using repository cache and disk cache, as part of the .bazelrc:

--repository_cache=~/.gerritcodereview/bazel-cache/repository --disk_cache=~/.gerritcodereview/bazel-cache/cas

^^^ Can it be somehow related?

@davido davido reopened this Dec 18, 2023
@davido
Contributor Author

davido commented Dec 19, 2023

I can reproduce the issue locally now. As assumed, the problem is related to the disk cache.

Here are the steps:

  1. Install a remote cache: https://github.com/buchgr/bazel-remote
  2. I used the Docker image with these commands:
$ docker pull buchgr/bazel-remote-cache
$ docker run -u 1000:1000 -v /path/to/cache/dir:/data \
	-p 9090:8080 -p 9092:9092 buchgr/bazel-remote-cache \
	--max_size 5
  3. Build gerrit@HEAD, currently on Bazel release 7.0.0, using the remote cache. Note that the disk cache is used as well:
$ bazelisk build --remote_cache=http://server:9090 plugins:core release api
  4. Wipe out the disk cache. Note that the disk cache specified in the gerrit/.bazelrc file is located in ~/.gerritcodereview/bazel-cache/cas:
$ rm -rf ~/.gerritcodereview/bazel-cache/cas/
  5. Build Gerrit without using the remote cache:
davido@localhost:~/projects/gerrit (master %>)$ tools/eclipse/project.py --bazel bazelisk
INFO: Invocation ID: 6084a97c-1b8d-4850-bcb1-f37c2f84fa37
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'info' from /home/davido/projects/gerrit/.bazelrc:
  Inherited 'common' options: --noenable_bzlmod
INFO: Reading rc options for 'info' from /home/davido/projects/gerrit/.bazelrc:
  Inherited 'build' options: --workspace_status_command=python3 ./tools/workspace_status.py --repository_cache=~/.gerritcodereview/bazel-cache/repository --action_env=PATH --disk_cache=~/.gerritcodereview/bazel-cache/cas --java_language_version=17 --java_runtime_version=remotejdk_17 --tool_java_language_version=17 --tool_java_runtime_version=remotejdk_17 --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --incompatible_strict_action_env --announce_rc
INFO: Invocation ID: 3e774f39-267e-4841-8a37-b1e2890edb39
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=147
INFO: Reading rc options for 'build' from /home/davido/projects/gerrit/.bazelrc:
  Inherited 'common' options: --noenable_bzlmod
INFO: Reading rc options for 'build' from /home/davido/projects/gerrit/.bazelrc:
  'build' options: --workspace_status_command=python3 ./tools/workspace_status.py --repository_cache=~/.gerritcodereview/bazel-cache/repository --action_env=PATH --disk_cache=~/.gerritcodereview/bazel-cache/cas --java_language_version=17 --java_runtime_version=remotejdk_17 --tool_java_language_version=17 --tool_java_runtime_version=remotejdk_17 --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --incompatible_strict_action_env --announce_rc
INFO: Analyzed target //tools/eclipse:main_classpath_collect (10 packages loaded, 182 targets configured).
INFO: Found 1 target...
Target //tools/eclipse:main_classpath_collect up-to-date:
  bazel-bin/tools/eclipse/main_classpath_collect.runtime_classpath
INFO: Elapsed time: 1.093s, Critical Path: 0.81s
INFO: 2 processes: 2 internal.
INFO: Build completed successfully, 2 total actions
INFO: Invocation ID: 578b1a90-ad9f-478b-98f4-20818be06888
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=147
INFO: Reading rc options for 'build' from /home/davido/projects/gerrit/.bazelrc:
  Inherited 'common' options: --noenable_bzlmod
INFO: Reading rc options for 'build' from /home/davido/projects/gerrit/.bazelrc:
  'build' options: --workspace_status_command=python3 ./tools/workspace_status.py --repository_cache=~/.gerritcodereview/bazel-cache/repository --action_env=PATH --disk_cache=~/.gerritcodereview/bazel-cache/cas --java_language_version=17 --java_runtime_version=remotejdk_17 --tool_java_language_version=17 --tool_java_runtime_version=remotejdk_17 --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --incompatible_strict_action_env --announce_rc
INFO: Analyzed target //tools/eclipse:autovalue_classpath_collect (0 packages loaded, 7 targets configured).
INFO: Found 1 target...
Target //tools/eclipse:autovalue_classpath_collect up-to-date:
  bazel-bin/tools/eclipse/autovalue_classpath_collect.runtime_classpath
INFO: Elapsed time: 1.111s, Critical Path: 0.69s
INFO: 2 processes: 2 internal.
INFO: Build completed successfully, 2 total actions
INFO: Invocation ID: 0c23b3fe-303d-4076-97c6-488fbf009f94
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=147
INFO: Reading rc options for 'build' from /home/davido/projects/gerrit/.bazelrc:
  Inherited 'common' options: --noenable_bzlmod
INFO: Reading rc options for 'build' from /home/davido/projects/gerrit/.bazelrc:
  'build' options: --workspace_status_command=python3 ./tools/workspace_status.py --repository_cache=~/.gerritcodereview/bazel-cache/repository --action_env=PATH --disk_cache=~/.gerritcodereview/bazel-cache/cas --java_language_version=17 --java_runtime_version=remotejdk_17 --tool_java_language_version=17 --tool_java_runtime_version=remotejdk_17 --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --incompatible_strict_action_env --announce_rc
INFO: Analyzed target //tools/eclipse:classpath (0 packages loaded, 1 target configured).
ERROR: /home/davido/projects/gerrit/proto/testing/BUILD:4:14: Generating proto_library //proto/testing:test_proto failed: Failed to fetch blobs because they do not exist remotely.: 3 errors during bulk transfer:
com.google.devtools.build.lib.remote.common.CacheNotFoundException: Missing digest: 74c97c32ccbc58b7d77ca61e6ec0d576d9f47173b3360c4f31e73a265162cd1f/4388096 for bazel-out/k8-opt-exec-ST-13d3ddad9198/bin/external/com_google_protobuf/protoc
com.google.devtools.build.lib.remote.common.CacheNotFoundException: Missing digest: 74c97c32ccbc58b7d77ca61e6ec0d576d9f47173b3360c4f31e73a265162cd1f/4388096 for bazel-out/k8-opt-exec-ST-13d3ddad9198/bin/external/com_google_protobuf/protoc
com.google.devtools.build.lib.remote.common.CacheNotFoundException: Missing digest: 74c97c32ccbc58b7d77ca61e6ec0d576d9f47173b3360c4f31e73a265162cd1f/4388096 for bazel-out/k8-opt-exec-ST-13d3ddad9198/bin/external/com_google_protobuf/protoc
Target //tools/eclipse:classpath failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.845s, Critical Path: 0.28s
INFO: 9 processes: 4 internal, 2 linux-sandbox, 3 worker.
ERROR: Build did NOT complete successfully

@lucamilanesio

Good catch @davido, I truly believe that Bazel keeps some local reference on the disk cache that it was populated from a remote source. When you do not specify the remote source anymore in the subsequent commands, Bazel blows up with the error you've shown, which is misleading because it isn't really a network transfer problem at all.

I wrongly assumed that we had issues with our remote cache storage, but that wasn't the case.

@coeuvre
Member

coeuvre commented Dec 19, 2023

Thanks for the repro! I am looking into the issue now.

@coeuvre coeuvre self-assigned this Dec 19, 2023
@coeuvre coeuvre added P0 This is an emergency and more important than other current work. (Assignee required) and removed untriaged labels Dec 19, 2023
@coeuvre
Member

coeuvre commented Dec 19, 2023

I understand the issue now. Since 7.0.0, Bazel uses --remote_download_toplevel by default, which means intermediate outputs are not downloaded during the build.

Looking at the error builds in the CI, the scenario might be:

  1. In the first build with both disk and remote cache, Bazel hit the remote cache but didn't download, e.g., bazel-out/k8-fastbuild/bin/external/sshd-sftp/jar/_ijar/jar/sshd-sftp/jar/sshd-sftp-2.10.0-ijar.jar due to --remote_download_toplevel. Both disk cache and Bazel's output tree are not populated with this file. However, the action result is downloaded and stored in the disk cache.
  2. In a following build with disk cache only, Bazel can hit the local disk cache for the action result. But when Bazel needs to download the output file (because it's an input to downstream actions), it cannot download it from the disk cache. Hence the CacheNotFoundException.

For the repro, wiping out the disk cache could also trigger the error for the same reason: Bazel didn't download the outputs during the last build, so when it needs an output now and fails to "download" it from the disk cache, it reports CacheNotFoundException.
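
The two-step scenario can be illustrated with a toy model in which plain directories stand in for the caches. This is only a sketch of the explanation above, not of Bazel's real storage layout: the directory names, the action key `compile_ijar`, the digest `digest123`, and the functions `first_build` and `second_build` are all made up for illustration.

```shell
#!/usr/bin/env bash
# Toy model only: "ac" holds action results (metadata pointing at a digest),
# "cas" holds the actual output blobs. None of this mirrors Bazel's on-disk format.
work="$(mktemp -d)"
remote_ac="$work/remote/ac"; remote_cas="$work/remote/cas"
disk_ac="$work/disk/ac";     disk_cas="$work/disk/cas"
mkdir -p "$remote_ac" "$remote_cas" "$disk_ac" "$disk_cas"

# The remote cache has both the action result and the blob it refers to.
echo "jar-bytes" > "$remote_cas/digest123"
echo "digest123" > "$remote_ac/compile_ijar"

# First build, with remote and disk cache. The action result always lands in
# the disk cache; the blob is copied only when every output is downloaded.
first_build() {
  local mode="$1"   # "toplevel" or "all"
  cp "$remote_ac/compile_ijar" "$disk_ac/compile_ijar"
  if [ "$mode" = "all" ]; then
    cp "$remote_cas/digest123" "$disk_cas/digest123"
  fi
}

# Second build, disk cache only: the action result is found, then the blob is needed.
second_build() {
  local digest
  digest="$(cat "$disk_ac/compile_ijar")" || return 2
  if [ -f "$disk_cas/$digest" ]; then
    echo "ok: fetched $digest"
  else
    echo "Missing digest: $digest"
    return 1
  fi
}
```

In this model, `first_build toplevel` followed by `second_build` reports the missing digest, while `first_build all` followed by `second_build` succeeds.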

Internally, Bazel indeed keeps some references to the disk or remote cache, because when building with --remote_download_[toplevel|minimal], Bazel won't download some of the outputs. It only remembers the metadata so that the outputs can be re-downloaded later.

From the CI setup, it seems that you want to populate the disk cache using remote cache during the first build. If so, I would suggest setting --remote_download_all for the first build. Otherwise, --experimental_remote_cache_eviction_retries is the right flag for this issue.
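
Applied to the three CI steps quoted earlier in this thread, the first suggestion might look like the sketch below. It is an assumption-laden outline rather than the actual Gerrit CI script: `run` and `ci_steps` are hypothetical names, and the dry-run default exists only so the commands can be inspected without Bazel installed.

```shell
#!/usr/bin/env bash
# Remote cache URL as quoted earlier in this thread.
REMOTE_CACHE_URL="https://gerrit-ci.gerritforge.com/cache"

# Hypothetical wrapper: print the command when DRY_RUN=1 (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ci_steps() {
  # Step 1: the only step that talks to the remote cache. --remote_download_all
  # asks Bazel to download every output instead of only the top-level ones.
  run bazelisk build --remote_cache="$REMOTE_CACHE_URL" --remote_download_all \
    plugins:core release api
  # Steps 2 and 3: no remote cache; they rely on the disk cache from .bazelrc.
  run tools/maven/api.sh install
  run tools/eclipse/project.py --bazel bazelisk
}
```

Calling `ci_steps` prints the three commands; `DRY_RUN=0 ci_steps` would execute them inside a Gerrit checkout.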

@coeuvre
Member

coeuvre commented Dec 19, 2023

This is more like a documentation issue, not a real bug in Bazel. Downgrading the priority.

@coeuvre coeuvre added P2 We'll consider working on this in future. (Assignee optional) type: documentation (cleanup) and removed P0 This is an emergency and more important than other current work. (Assignee required) type: bug labels Dec 19, 2023
@lucamilanesio

This is more like a documentation issue, not a real bug in Bazel. Downgrading the priority.

Should this be considered a breaking change in Bazel 7 compared to 6? I guess the default behaviour has changed in a non-backward compatible way. Thanks for the suggestions, I am adding the --remote_download_all in the initial build so that all the remote resources needed are loaded locally.

That doesn't impact our build time because we always start the build with a pre-warmed Docker image that has an initial build completed. I have actually noticed that the image built was very small compared to the previous releases, which means that a lot of data was not stored anymore locally.

I agree to downgrading to a P2.

@coeuvre
Member

coeuvre commented Dec 19, 2023

Should this be considered a breaking change in Bazel 7 compared to 6?

Yes, it's a breaking change. It is highlighted in the release notes: https://blog.bazel.build/2023/12/11/bazel-7-release.html#build-without-the-bytes-bwob. We probably should have made it clearer that it's a breaking change.

lucamilanesio added a commit to GerritCodeReview/gerrit-ci-scripts that referenced this issue Dec 28, 2023
Bazel expects the cache to be remote for all executions of the
initial build of Gerrit, because of [1].
Failing to set the remote cache server URL would result in
a file transfer failure and therefore the failure of the whole
build.

Adding the "build $BAZEL_OPTS" in the user.bazelrc resolves
the problem.

Also add the explicit fetch of the PolyGerrit NPM repositories
which aren't automatically fetched when the workspace is refreshed
from the remote Git repository. This is mandatory for preventing
the build to fail when using the local disk cache,

[1] bazelbuild/bazel#20161

Bug: Issue 316936462
Change-Id: I4cd6b4167c4025fe05d852d16f7cea5042046787
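
The approach described in this commit message, persisting the remote cache options so that every invocation sees them, could be sketched as follows. This is not the code from the referenced change: `write_cache_bazelrc` is a hypothetical helper, and it assumes the workspace .bazelrc picks up the written file (for example through a `try-import %workspace%/user.bazelrc` line).

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "build <opts>" to a bazelrc file so that all later
# build invocations use the same cache options. Does nothing when opts is empty.
write_cache_bazelrc() {
  local rc_file="$1" opts="$2"
  if [ -n "$opts" ]; then
    printf 'build %s\n' "$opts" >> "$rc_file"
  fi
}

# Example (not executed here): write_cache_bazelrc user.bazelrc "$BAZEL_OPTS"
```

Skipping the write for an empty value keeps the file free of a bare `build` line when the CI deliberately clears BAZEL_OPTS.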
@xiemotongye

xiemotongye commented Jan 18, 2024

I'm trying to upgrade to Bazel 7.0 in our iOS project. Everything works fine with Bazel 6.3.2.

But when I upgraded to Bazel 7.0, I ran into the same issue. As mentioned above, it seems that this problem occurs when both disk and remote cache are used. But I'm pretty sure I'm not using a disk cache or RBE.

Here is the output:

'build' options: --verbose_failures --announce_rc --apple_platform_type=ios --show_progress_rate_limit=5 --output_filter=^$ --ios_minimum_os=11.0 --macos_minimum_os=12.0 --host_macos_minimum_os=12.0 --use_top_level_targets_for_symlinks --incompatible_strict_action_env --define=apple.compress_ipa=true --experimental_cc_implementation_deps --experimental_guard_against_concurrent_changes --profile=bazel-profile --experimental_objc_include_scanning --experimental_remote_cache_compression --features=oso_prefix_is_pwd --features=layering_check --features=swift.skip_function_bodies_for_derived_files --features=swift.minimal_deps --features=swift.layering_check --features=swift.module_map_no_private_headers --remote_timeout=100s --reuse_sandbox_directories --spawn_strategy=local --genrule_strategy=local
INFO: Reading rc options for 'build' from /Volumes/workspace/grunner/builds/Hwyyfv8c/0/ios/loktar/ci.bazelrc:
  'build' options: --objc_enable_binary_stripping --objc_generate_linkmap --strip=always --apple_generate_dsym --remote_local_fallback --local_cpu_resources=HOST_CPUS*.9 --features=swift.use_explicit_swift_module_map --remote_cache=http://my-remote-cache.co/ios
INFO: Found applicable config definition build:strict in file /Volumes/workspace/grunner/builds/Hwyyfv8c/0/ios/loktar/rules.bazelrc: --copt=-Werror
Computing main repo mapping: 
Loading: 
Loading: 0 packages loaded
Analyzing: target //srcs:app (0 packages loaded, 0 targets configured)
Analyzing: target //srcs:app (0 packages loaded, 0 targets configured)
[0 / 1] [Prepa] BazelWorkspaceStatusAction stable-status.txt
INFO: Analyzed target //srcs:app (0 packages loaded, 0 targets configured).
[9,975 / 27,591] AssetCatalogCompile srcs/app-intermediates/xcassets; 4s local ... (55 actions, 1 running)
[17,013 / 30,781] AssetCatalogCompile srcs/app-intermediates/xcassets; 9s local ... (55 actions, 1 running)
[23,414 / 33,172] AssetCatalogCompile srcs/app-intermediates/xcassets; 14s local ... (49 actions, 1 running)
[25,897 / 33,172] AssetCatalogCompile srcs/app-intermediates/xcassets; 19s local ... (48 actions, 1 running)
[28,437 / 33,172] AssetCatalogCompile srcs/app-intermediates/xcassets; 24s local ... (45 actions, 1 running)
[30,608 / 33,172] AssetCatalogCompile srcs/app-intermediates/xcassets; 29s local ... (44 actions, 1 running)
[32,933 / 33,172] AssetCatalogCompile srcs/app-intermediates/xcassets; 34s local ... (49 actions, 1 running)
ERROR: /Volumes/workspace/grunner/builds/Hwyyfv8c/0/ios/loktar/srcs/BUILD:601:16: SwiftStdlibCopy srcs/app-intermediates/swiftlibs failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: f5f2f1aa89a7d08abd93a7b1a2a21a6621b01a93314b40360c5bd1c44e6e2cb3/271080288 for bazel-out/ios_arm64-opt-ios-arm64-min11.0-applebin_ios-ST-ae93c8b2d27f/bin/srcs/app_bin
ERROR: /Volumes/workspace/grunner/builds/Hwyyfv8c/0/ios/loktar/srcs/BUILD:601:16: SwiftStdlibCopy srcs/app-intermediates/swiftlibs_for_swiftsupport failed: Failed to fetch blobs because they do not exist remotely.: Missing digest: f5f2f1aa89a7d08abd93a7b1a2a21a6621b01a93314b40360c5bd1c44e6e2cb3/271080288 for bazel-out/ios_arm64-opt-ios-arm64-min11.0-applebin_ios-ST-ae93c8b2d27f/bin/srcs/app_bin
Target //srcs:app failed to build

--remote_download_all worked for me, but --experimental_remote_cache_eviction_retries=5 didn't. I believe it has something to do with BwoB, but I have no idea why it happens without a disk cache.
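
For anyone debugging this: the value after "Missing digest:" in the errors above appears to be a content hash followed by the blob size in bytes. Below is a minimal Python sketch, assuming Bazel's default SHA-256 digest function (that is, --digest_function has not been changed), that pulls those digests out of a build log and computes the same hash/size string for a local file so the two can be compared. The helper names missing_digests and bazel_style_digest are made up for illustration and are not part of Bazel.

```python
import hashlib
import os
import re

# Pattern assumed from the error lines quoted in this thread:
# "Missing digest: <hash>/<size> for <output path>"
MISSING_DIGEST = re.compile(r"Missing digest: ([0-9a-f]+)/(\d+) for (\S+)")


def missing_digests(log_text):
    """Return unique (hash, size_bytes, output_path) tuples in first-seen order."""
    seen = []
    for match in MISSING_DIGEST.finditer(log_text):
        entry = (match.group(1), int(match.group(2)), match.group(3))
        if entry not in seen:
            seen.append(entry)
    return seen


def bazel_style_digest(path, chunk_size=1 << 20):
    """Return '<sha256 hex>/<size in bytes>' for the file at path.

    Assumes the default SHA-256 digest function; reads in chunks so that
    large outputs (the one above is about 271 MB) are not loaded at once.
    """
    sha = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
            size += len(chunk)
    return f"{sha.hexdigest()}/{size}"


if __name__ == "__main__":
    # Sample line copied from the log above; replace with the real build log.
    log = (
        "SwiftStdlibCopy srcs/app-intermediates/swiftlibs failed: Failed to "
        "fetch blobs because they do not exist remotely.: Missing digest: "
        "f5f2f1aa89a7d08abd93a7b1a2a21a6621b01a93314b40360c5bd1c44e6e2cb3/271080288 "
        "for bazel-out/ios_arm64-opt-ios-arm64-min11.0-applebin_ios-ST-ae93c8b2d27f"
        "/bin/srcs/app_bin"
    )
    for digest_hash, size, path in missing_digests(log):
        wanted = f"{digest_hash}/{size}"
        if os.path.exists(path):
            status = "matches" if bazel_style_digest(path) == wanted else "differs"
            print(f"{path}: local file {status} {wanted}")
        else:
            print(f"{path}: not present locally, wanted {wanted}")
```

Run from the workspace root so that the bazel-out path resolves. If the file exists locally and its digest matches, that would suggest the output was built on this machine and only the remote lookup is failing; if the file is absent, it was presumably never downloaded.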

Additional note: my top-level target has the no-remote tag:

ios_application(
    name = "app",
    ...
    tags = ["no-remote"],
)

@xiemotongye

Passing --experimental_remote_downloader_local_fallback also helps.

@fmeum
Collaborator

fmeum commented Apr 30, 2024

@coeuvre Just ran into this with bazel run -c opt //src/java_tools/buildjar/java/com/google/devtools/build/java/turbine:turbine_benchmark --disk_cache=some/path, which worked in the past and only uses --disk_cache internally. It changes the value to a special directory it creates and then reproducibly runs into the "Missing digest" error. This seems like more than a documentation issue.

@luispadron
Contributor

+1, I'm seeing a similar issue:

11:01:10 ERROR: Foo/BUILD.bazel:11:15: Compiling Foo.c failed: unable to finalize action: Missing digest: <number>/<number> for bazel-out/ios_arm64-opt-ios-arm64-min12.0-applebin_ios-ST-<sha>/bin/path/to/Foo.d

Our setup is a bit different, though, as we're testing with 7.1.1 and:

  • Don't use a disk cache
  • Set --remote_download_outputs="all"

How can downloads fail here when BwtB is disabled?
