Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel CI: Bazel@HEAD breaks Bazel Examples (Rules) in downstream pipeline #14361

Closed
meteorcloudy opened this issue Dec 1, 2021 · 9 comments
Closed
Assignees
Labels
breakage P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug

Comments

@meteorcloudy
Copy link
Member

https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/2242#9648195b-7d68-4461-9c9c-8b1a0d7ee23b4

(04:09:20) FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'BuildConfigurationKey[6bc874ab18159c5fcf54e408a7b0a97168669790f98a50bae098e9e45af2d17c]' (requested by nodes 'ConfiguredTargetKey{label=@local_config_platform//:host, config=BuildConfigurationKey[31e38dff82a2d18de037470e839f3ecbbfbb04890544ca63af7daa0d843f0f2e]}')
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:658)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.NullPointerException
	at com.google.devtools.build.lib.analysis.starlark.FunctionTransitionUtil.computeOutputDirectoryNameFragment(FunctionTransitionUtil.java:433)
	at com.google.devtools.build.lib.analysis.config.BuildConfigurationValue.<init>(BuildConfigurationValue.java:178)
	at com.google.devtools.build.lib.skyframe.BuildConfigurationFunction.compute(BuildConfigurationFunction.java:101)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:587)
	... 7 more

/cc @sdtwigg I can confirm 711c44e is the culprit and b371a98 did not fix the issue.

@meteorcloudy meteorcloudy added type: bug P1 I'll work on this now. (Assignee required) breakage team-Configurability platforms, toolchains, cquery, select(), config transitions labels Dec 1, 2021
@meteorcloudy
Copy link
Member Author

To reproduce

git clone https://github.com/bazelbuild/examples.git
cd examples/rules
export USE_BAZEL_VERSION=711c44ea79db14406972cc2204a43c589a2debce
bazelisk build  -- ... -//starlark_configurations/read_attr_in_transition:will-break -//starlark_configurations/cc_binary_selectable_copts:app_forgets_to_set_features -//starlark_configurations/cc_binary_selectable_copts:app_forgets_to_set_features_native_binary -//starlark_configurations/cc_binary_selectable_copts:app_with_feature1_native_binary -//starlark_configurations/cc_binary_selectable_copts:app_with_feature2_native_binary -//starlark_configurations/cc_binary_selectable_copts:lib -//starlark_configurations/multi_arch_binary:foo

@fmeum
Copy link
Collaborator

fmeum commented Dec 1, 2021

@sdtwigg We saw this before in #13915 (comment).

@fmeum
Copy link
Collaborator

fmeum commented Dec 2, 2021

The crashing target //starlark_configurations/cc_test:my-test has a Starlark rule with test = True depending on a cc_test through a split transition that sets //command_line_option:test_arg. The build causes a crash when FunctionTransitionUtil#computeOutputDirectoryNameFragment attempts to access test_arg, which at that point has been trimmed from the configuration and thus isn't contained in the optionInfoMap anymore, which leads to a NullPointerException. The build passes with --notrim_test_configuration. Maybe FunctionTransitionUtil has to be taught to handle trimmed configurations gracefully while computing the ST-hash?

What I don't understand is why there is any trimming at all in this scenario: The two rules involved in the crashing example are transition_rule_test (which is labeled with test = True) and cc_test. It surprises me that computeOutputDirectoryNameFragment is called for a BuildOptions with affected by starlark transition containing test_arg and a trimmed test configuration in this setting.

@meteorcloudy
Copy link
Member Author

ping @sdtwigg

@sdtwigg
Copy link
Contributor

sdtwigg commented Dec 7, 2021

Transitioning on the options in TestOptions is really weird. They can't affect the actual test actions because those are consumed only when those are top-level targets (even being in a test_suite causes special logic to come into play and test_suite has no transitions anyway). So, in a real scenario, this would at best change select statement resolution (but potentially problematically because the actual executed tests will be with the command-line determined test_arg!)

I will make FunctionTransitionUtil more resilient to a trimmed option missing; however, this test should be changed to transition on a different option. Ultimately, I may just implement a complete ban on any transitions affecting TestOptions.

@sdtwigg
Copy link
Contributor

sdtwigg commented Dec 7, 2021

PS: In other tests, I was replacing --test_arg with --platform_suffix transitions, which affect the output directory name at least.

@meteorcloudy
Copy link
Member Author

@sdtwigg Sounds good, do I understand correctly that you are going to make some tweaks in Bazel, and can you also help fixing our examples (https://github.com/bazelbuild/examples/tree/main/rules/starlark_configurations)?

meteorcloudy added a commit to bazelbuild/examples that referenced this issue Dec 16, 2021
This will skip testing those jobs with Bazel@HEAD in downstream pipeline.
We can re-enable them in downstream when bazelbuild/bazel#14361 is fixed.
c-parsons pushed a commit to bazelbuild/examples that referenced this issue Jan 5, 2022
* Set specific bazel version number for "Bazel Rules" job

This will skip testing those jobs with Bazel@HEAD in downstream pipeline.
We can re-enable them in downstream when bazelbuild/bazel#14361 is fixed.

* Use "latest"
@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Feb 15, 2022
@meteorcloudy
Copy link
Member Author

Bazel examples is still broken with Bazel@HEAD, but I'll deprioritize this since it's disabled in downstream now.

@meteorcloudy
Copy link
Member Author

I believe this is fixed by bazelbuild/examples@7fc3f8b

meteorcloudy added a commit to bazelbuild/continuous-integration that referenced this issue Oct 17, 2022
Bazel Examples are now fixed with Bazel@HEAD: bazelbuild/bazel#14361
meteorcloudy added a commit to bazelbuild/continuous-integration that referenced this issue Oct 17, 2022
Bazel Examples are now fixed with Bazel@HEAD:
bazelbuild/bazel#14361
fmeum pushed a commit to fmeum/continuous-integration that referenced this issue Dec 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breakage P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug
Projects
None yet
Development

No branches or pull requests

3 participants