This repository has been archived by the owner on Oct 2, 2023. It is now read-only.

Bazel CI: rules_docker still failing with Bazel@HEAD #1988

Open
meteorcloudy opened this issue Dec 22, 2021 · 18 comments

Comments

@meteorcloudy
Member

https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/2290#7782e19b-d082-4fa0-9ca6-ba4c6413740b

(04:03:10) ERROR: /var/lib/buildkite-agent/builds/bk-docker-9znq/bazel-downstream-projects/rules_docker/tests/container/BUILD:817:16: While resolving toolchains for target //tests/container:alpine_arch_ppc64le: no matching toolchains found for types //toolchains/docker:toolchain_type
(04:03:10) ERROR: Analysis of target '//tests/container:architecture_test' failed; build aborted:

A bisect shows the breaking change is: bazelbuild/bazel@98d376f

I suspect it has something to do with how platform transition is defined here:

_IMAGE_TRANSITION_CONSTRAINTS = [
    ("cpu", "@platforms//cpu"),
    ("os", "@platforms//os"),
]

[[
    # Use a constraint value which will never be valid to prevent
    # accidentally leaving the associated constraint setting unset.
    constraint_value(
        name = "image_transition_{}_unset".format(name),
        constraint_setting = constraint_setting,
    ),
    label_setting(
        name = "image_transition_{}".format(name),
        build_setting_default = ":image_transition_{}_unset".format(name),
    ),
] for name, constraint_setting in _IMAGE_TRANSITION_CONSTRAINTS]

platform(
    name = "image_transition",
    constraint_values = [
        ":image_transition_{}".format(name)
        for name, _ in _IMAGE_TRANSITION_CONSTRAINTS
    ],
)

@meteorcloudy
Member Author

/cc @brandjon Can you help advise how to fix this?

@meteorcloudy
Member Author

FYI @uhthomas

@meteorcloudy
Member Author

@brandjon ping, do you have any idea what's happening here?

@meteorcloudy
Member Author

Since bazelbuild/bazel@98d376f is in Bazel 5.0, this means rules_docker has to fix this issue to be able to work with the next Bazel release.

@uhthomas
Collaborator

I've taken a quick look.

❯ git switch -d 76c708fc979c1bfb65b4db300c654be08f096874
❯ USE_BAZEL_VERSION=98d376faeb206f14838156ce4cb305ddbfce08fa bazel test //... --toolchain_resolution_debug
...
INFO: ToolchainResolution:     Type //toolchains/docker:toolchain_type: target platform @io_bazel_rules_docker//platforms:image_transition: Rejected toolchain @docker_config//:toolchain; mismatching values: linux
INFO: ToolchainResolution:     Type //toolchains/docker:toolchain_type: target platform @io_bazel_rules_docker//platforms:image_transition: Rejected toolchain @docker_config//:toolchain; mismatching values: windows
...
INFO: ToolchainResolution:     Type //toolchains/docker:toolchain_type: target platform @io_bazel_rules_docker//platforms:image_transition: Rejected toolchain @docker_config//:toolchain; mismatching values: osx
...
INFO: ToolchainResolution:   Type //toolchains/docker:toolchain_type: target platform @io_bazel_rules_docker//platforms:image_transition: No toolchains found.

I'll take a deeper look later today to understand what's happening.
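
When the --toolchain_resolution_debug output is long, it can help to group the rejections by toolchain. The following is a minimal sketch that assumes the log lines look exactly like the ones quoted above; the message format may differ in other Bazel versions, and the function name summarize_rejections is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines shaped like the "Rejected toolchain" messages quoted above.
# The exact format is an assumption based on that output only.
_REJECTED = re.compile(
    r"Type (?P<type>\S+): target platform (?P<platform>\S+): "
    r"Rejected toolchain (?P<toolchain>\S+); mismatching values: (?P<values>.+)$"
)


def summarize_rejections(log_text):
    """Group mismatching constraint values by (type, platform, toolchain)."""
    summary = defaultdict(list)
    for line in log_text.splitlines():
        match = _REJECTED.search(line.strip())
        if match:
            key = (match["type"], match["platform"], match["toolchain"])
            # A single line may list several comma-separated values.
            summary[key].extend(v.strip() for v in match["values"].split(","))
    return dict(summary)
```

Bazel prints this debug information as part of its normal console output, so the captured output of the bazel test command above can be passed in as log_text.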

@uhthomas
Collaborator

I'm confused.

The linked commit (bazelbuild/bazel@98d376f) is from January 2021, over a year ago. Has it only recently been merged? If not, what has now caused this problem?

Whilst debugging I found that unrelated, seemingly random, targets and tests fail. For example:

ERROR: /home/thomas/code/github.com/uhthomas/rules_docker/tests/contrib/BUILD:137:17: in container_bundle_ rule //tests/contrib:create_empty_bundle:
Traceback (most recent call last):
	File "/home/thomas/code/github.com/uhthomas/rules_docker/container/bundle.bzl", line 67, column 15, in _container_bundle_impl
		_incr_load(
	File "/home/thomas/code/github.com/uhthomas/rules_docker/container/layer_tools.bzl", line 232, column 28, in incremental_load
		run_tag = images.keys()[0]
Error: index out of range (index is 0, but sequence has 0 elements)
ERROR: Analysis of target '//tests/contrib:create_empty_bundle' failed; build aborted: Analysis of target '//tests/contrib:create_empty_bundle' failed

In regard to Docker toolchain resolution, I believe the //toolchains/docker:toolchain_type toolchains should use exec_compatible_with rather than target_compatible_with. This solves the original issue, but raises new ones like bazelbuild/bazel#8751.
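
To illustrate the difference between the two attributes, here is a deliberately simplified Python model of toolchain matching. It is not Bazel's implementation. It assumes that the image_transition platform ends up carrying the placeholder "unset" constraint value and that the Docker toolchain is constrained to linux; the toolchain name docker_linux is hypothetical.

```python
def mismatching_values(required, platform):
    """Return the required constraint values that the platform does not provide.

    Both arguments map a constraint setting to a single constraint value.
    """
    return [value for setting, value in required.items()
            if platform.get(setting) != value]


def resolve(toolchains, target_platform, exec_platform):
    """Return the name of the first compatible toolchain, or None."""
    for toolchain in toolchains:
        if mismatching_values(toolchain.get("target_compatible_with", {}), target_platform):
            continue
        if mismatching_values(toolchain.get("exec_compatible_with", {}), exec_platform):
            continue
        return toolchain["name"]
    return None


OS = "@platforms//os"
# Assumption: the transition platform still holds the placeholder value.
image_transition = {OS: "image_transition_os_unset"}
host = {OS: "linux"}

by_target = [{"name": "docker_linux", "target_compatible_with": {OS: "linux"}}]
by_exec = [{"name": "docker_linux", "exec_compatible_with": {OS: "linux"}}]

print(resolve(by_target, image_transition, host))  # None
print(resolve(by_exec, image_transition, host))    # docker_linux
```

In this model the toolchain constrained on the target platform is rejected with the mismatching value linux, which mirrors the debug output above, while the same toolchain constrained on the execution platform is selected.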

I suspect that we should make a patch to disable transitioning by default as it appears that Bazel just isn't ready for it.

@meteorcloudy
Member Author

> The linked commit (bazelbuild/bazel@98d376f) is from January 2021, over a year ago. Has it only recently been merged? If not, what has now caused this problem?

Yes, the commit is very old, but it was not included in the Bazel 4.x releases (our first LTS release line). 5.0 is coming out very soon and will contain this change. rules_docker has been broken by this commit with Bazel@HEAD for a long time, but it was only reported here recently.

/cc @katre @gregestren Can you help with this issue?

@gregestren

Do you have a minimal build that demonstrates the failure, other than the one shown above?

From a quick look, if bazelbuild/bazel@98d376f is causing this, it would have to be some combination of a Starlark transition being applied and a user-defined build flag that might not be defined in the same repo as where the build is happening.

Do the failures involve any flags?

The next step I'd try is to run bazel cquery "deps(//:target_im_building)" before and after bazelbuild/bazel@98d376f. Identify the failing target's configuration hash and run bazel config <that hash>. See if any flag values differ as a result of bazelbuild/bazel@98d376f. That could help identify whether bazelbuild/bazel@98d376f actually changes any configurations anywhere. If not, I don't see how bazelbuild/bazel@98d376f could cause toolchain resolution errors.

But happy to work with you to diagnose better.
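
The comparison step described above can be sketched as a small dictionary diff. This assumes the flag values from the two bazel config runs have already been collected into plain dictionaries; how they are parsed is left out, and the function name diff_options and the sample values in the usage note are hypothetical.

```python
def diff_options(before, after):
    """Return {flag: (old, new)} for every flag whose value differs.

    A flag present on only one side is reported with None on the other.
    """
    changed = {}
    for flag in sorted(set(before) | set(after)):
        old, new = before.get(flag), after.get(flag)
        if old != new:
            changed[flag] = (old, new)
    return changed
```

For example, if a label flag such as //platforms:image_transition_os maps to @platforms//os:linux in one dictionary and to its "unset" default in the other, that flag is the only entry returned. An empty result would suggest the configurations are identical.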

@gregestren

> Whilst debugging I found that unrelated, seemingly random, targets and tests fail. For example:

> In regard to Docker toolchain resolution, I believe the //toolchains/docker:toolchain_type toolchains should use exec_compatible_with rather than target_compatible_with. This solves the original issue, but raises new ones like bazelbuild/bazel#8751.

@uhthomas Are the other errors caused by the same issue? I'm seeing the same error with bazel 4.2.2:

$ bazel version
Build label: 4.2.2

$ bazel build //tests/contrib:create_empty_bundle
ERROR: /usr/home/greg/bazel/rules_docker/tests/contrib/BUILD:137:17: in container_bundle_ rule //tests/contrib:create_empty_bundle:
Traceback (most recent call last):
	File "/usr/home/greg/bazel/rules_docker/container/bundle.bzl", line 67, column 15, in _container_bundle_impl
		_incr_load(
	File "/usr/local/home/greg/bazel/rules_docker/container/layer_tools.bzl", line 232, column 28, in incremental_load
		run_tag = images.keys()[0]
Error: index out of range (index is 0, but sequence has 0 elements)
ERROR: Analysis of target '//tests/contrib:create_empty_bundle' failed; build aborted: Analysis of target '//tests/contrib:create_empty_bundle' failed

Should I expect that to work?

(I'm trying to replicate the CI command from https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/2290#7782e19b-d082-4fa0-9ca6-ba4c6413740b but haven't yet gotten docker properly set up on my machine to work with any version)

@meteorcloudy
Member Author

@gregestren To reproduce:

docker run -it --init gcr.io/bazel-public/ubuntu1804-java11
root@4f4a89faff2e:/# mkdir workdir
root@4f4a89faff2e:/# cd workdir/
root@4f4a89faff2e:/workdir# git clone https://github.com/bazelbuild/rules_docker.git
root@4f4a89faff2e:/workdir# cd rules_docker/
root@4f4a89faff2e:/workdir/rules_docker# export USE_BAZEL_VERSION=98d376faeb206f14838156ce4cb305ddbfce08fa
root@4f4a89faff2e:/workdir/rules_docker# bazel build //tests/container:alpine_arch_ppc64le
2022/01/19 10:50:30 Using unreleased version at commit 98d376faeb206f14838156ce4cb305ddbfce08fa
2022/01/19 10:50:30 Downloading https://storage.googleapis.com/bazel-builds/artifacts/ubuntu1404/98d376faeb206f14838156ce4cb305ddbfce08fa/bazel...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
DEBUG: /root/.cache/bazel/_bazel_root/d0f48cfc39bf7313c85e758b7dac1933/external/bazel_toolchains/rules/rbe_repo/version_check.bzl:59:14:
Current running Bazel is not a release version and one was not defined explicitly in rbe_autoconfig target. Falling back to '4.0.0'
ERROR: While resolving toolchains for target //tests/container:alpine_arch_ppc64le: no matching toolchains found for types //toolchains/docker:toolchain_type
ERROR: Analysis of target '//tests/container:alpine_arch_ppc64le' failed; build aborted: no matching toolchains found for types //toolchains/docker:toolchain_type
INFO: Elapsed time: 16.516s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (43 packages loaded, 205 targets configured)

If you set USE_BAZEL_VERSION to the parent commit of bazelbuild/bazel@98d376f, it works:

root@4f4a89faff2e:/workdir/rules_docker# export USE_BAZEL_VERSION=6d8f0671cb0c9456d2a95d8a54fcd0453854b255
root@4f4a89faff2e:/workdir/rules_docker# bazel build //tests/container:alpine_arch_ppc64le
2022/01/19 10:51:26 Using unreleased version at commit 6d8f0671cb0c9456d2a95d8a54fcd0453854b255
2022/01/19 10:51:26 Downloading https://storage.googleapis.com/bazel-builds/artifacts/ubuntu1404/6d8f0671cb0c9456d2a95d8a54fcd0453854b255/bazel...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
DEBUG: /root/.cache/bazel/_bazel_root/d0f48cfc39bf7313c85e758b7dac1933/external/bazel_toolchains/rules/rbe_repo/version_check.bzl:59:14:
Current running Bazel is not a release version and one was not defined explicitly in rbe_autoconfig target. Falling back to '4.0.0'
INFO: Analyzed target //tests/container:alpine_arch_ppc64le (113 packages loaded, 7302 targets configured).
INFO: Found 1 target...
Target //tests/container:alpine_arch_ppc64le up-to-date:
  bazel-out/k8-fastbuild-ST-15abc339c81c/bin/tests/container/alpine_arch_ppc64le-layer.tar
INFO: Elapsed time: 24.352s, Critical Path: 1.66s
INFO: 49 processes: 17 internal, 32 processwrapper-sandbox.
INFO: Build completed successfully, 49 total actions
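
The two runs above can also be scripted so the same target is built at each commit in turn. This is a minimal sketch that assumes bazel on the PATH is Bazelisk, which is what honours USE_BAZEL_VERSION; the helper names bazel_env and build_at are made up for illustration.

```python
import os
import subprocess


def bazel_env(commit, base_env=None):
    """Copy the environment and pin the Bazel version to the given commit."""
    env = dict(os.environ if base_env is None else base_env)
    env["USE_BAZEL_VERSION"] = commit
    return env


def build_at(commit, target, runner=subprocess.run):
    """Build target with Bazel at commit; return True if the build succeeded."""
    result = runner(["bazel", "build", target], env=bazel_env(commit))
    return result.returncode == 0
```

Calling build_at once with 98d376faeb206f14838156ce4cb305ddbfce08fa and once with 6d8f0671cb0c9456d2a95d8a54fcd0453854b255 for //tests/container:alpine_arch_ppc64le repeats the comparison shown in the logs. The runner parameter exists only so the command can be replaced in tests.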

@gregestren

Thanks @meteorcloudy. I'm wondering if we need to do more than

> In regard to Docker toolchain resolution, I believe the //toolchains/docker:toolchain_type toolchains should use exec_compatible_with rather than target_compatible_with. This solves the original issue

from #1988 (comment)? Or at least if the remaining problems are caused by the same code?

@uhthomas
Collaborator

Is this related?

tweag/rules_haskell#1657

It seems to be the same error message.

@meteorcloudy
Member Author

The symptom is similar, but I'm not sure about the root cause.

@meteorcloudy
Member Author

I haven't heard loud complaints yet, but I think this issue is preventing users from using rules_docker with Bazel 5.0.

@github-actions

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!

github-actions bot added the "Can Close?" label (Will close in 30 days unless there is a comment indicating why not) on Oct 23, 2022
@github-actions

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

@mostynb
Contributor

mostynb commented Dec 16, 2022

Should this be reopened?

@farcop

farcop commented Feb 13, 2023

/reopen

comius reopened this on Sep 12, 2023
github-actions bot removed the "Can Close?" label (Will close in 30 days unless there is a comment indicating why not) on Sep 13, 2023