Crash when using execution log and flaky_test_attempts if test fails #12510

Closed
mariusgrigoriu opened this issue Nov 18, 2020 · 5 comments
Labels: P2, team-Performance, type: bug

mariusgrigoriu commented Nov 18, 2020

Description of the problem / feature request:

Bazel crashes when running a failing test with both flaky_test_attempts > 1 and either of the execution log flags set.

$ bazel test :always_fail  --execution_log_json_file=log.json --flaky_test_attempts=2
INFO: Analyzed target //repro:always_fail (28 packages loaded, 325 targets configured).
INFO: Found 1 test target...
FAIL: //repro:always_fail (see /private/var/tmp/_bazel_marius/c39204db2cb7e0a79ba2096a64d88d76/execroot/platform_bootstrap/bazel-out/darwin-fastbuild/testlogs/repro/always_fail/test_attempts/attempt_1.log)
FAIL: //repro:always_fail (see /private/var/tmp/_bazel_marius/c39204db2cb7e0a79ba2096a64d88d76/execroot/platform_bootstrap/bazel-out/darwin-fastbuild/testlogs/repro/always_fail/test.log)

FAILED: //repro:always_fail (Summary)
      /private/var/tmp/_bazel_marius/c39204db2cb7e0a79ba2096a64d88d76/execroot/platform_bootstrap/bazel-out/darwin-fastbuild/testlogs/repro/always_fail/test.log
      /private/var/tmp/_bazel_marius/c39204db2cb7e0a79ba2096a64d88d76/execroot/platform_bootstrap/bazel-out/darwin-fastbuild/testlogs/repro/always_fail/test_attempts/attempt_1.log
Target //repro:always_fail up-to-date:
  bazel-bin/repro/always_fail
INFO: Elapsed time: 2.406s, Critical Path: 0.39s
INFO: 3 processes: 4 darwin-sandbox.
INFO: Build completed, 1 test FAILED, 3 total actions
//repro:always_fail                                                      FAILED in 2 out of 2 in 0.2s
  Stats over 2 runs: max = 0.2s, min = 0.1s, avg = 0.1s, dev = 0.1s
  /private/var/tmp/_bazel_marius/c39204db2cb7e0a79ba2096a64d88d76/execroot/platform_bootstrap/bazel-out/darwin-fastbuild/testlogs/repro/always_fail/test.log
  /private/var/tmp/_bazel_marius/c39204db2cb7e0a79ba2096a64d88d76/execroot/platform_bootstrap/bazel-out/darwin-fastbuild/testlogs/repro/always_fail/test_attempts/attempt_1.log

Executed 1 out of 1 test: 1 fails locally.
WARNING: Execution log might not have been populated. Raw execution log is at /var/folders/7n/fxj9ffxs029ckk6bbwnw0wvm0000gn/T/exec6302060014930229270.log
Internal error thrown during build. Printing stack trace: java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:128)
        at com.google.devtools.build.lib.bazel.execlog.StableSort.stableSort(StableSort.java:72)
        at com.google.devtools.build.lib.bazel.execlog.StableSort.stableSort(StableSort.java:60)
        at com.google.devtools.build.lib.bazel.SpawnLogModule.afterCommand(SpawnLogModule.java:163)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.afterCommand(BlazeRuntime.java:626)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:615)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:235)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:546)
        at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:611)
        at io.grpc.Context$1.run(Context.java:605)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:128)
        at com.google.devtools.build.lib.bazel.execlog.StableSort.stableSort(StableSort.java:72)
        at com.google.devtools.build.lib.bazel.execlog.StableSort.stableSort(StableSort.java:60)
        at com.google.devtools.build.lib.bazel.SpawnLogModule.afterCommand(SpawnLogModule.java:163)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.afterCommand(BlazeRuntime.java:626)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:615)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:235)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:546)
        at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:611)
        at io.grpc.Context$1.run(Context.java:605)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

BUILD file:

sh_test(
    name = "always_fail",
    srcs = ["test.sh"],
)

test.sh:

exit 1

Command:

bazel test :always_fail --execution_log_json_file=log.json --flaky_test_attempts=2

What operating system are you running Bazel on?

macOS and Linux

What's the output of bazel info release?

release 3.7.0

Other

Is this a recurrence of #8364?

@benjaminp
Collaborator

This assumption is just wrong when there are test retries:

// Within a single build, each output can only be produced by a single spawn
Preconditions.checkArgument(!outputProducer.containsKey(name));
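
To illustrate why that check trips on retries, here is a minimal, self-contained sketch. It is not Bazel's code: the `Spawn` class, the `indexProducers` helper, and the short output path are hypothetical stand-ins, and a plain `IllegalArgumentException` replaces Guava's `Preconditions.checkArgument`. It only assumes that each test attempt is logged as its own spawn declaring the same `test.log` output.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateOutputDemo {
  /** Hypothetical stand-in for one logged spawn; not Bazel's real log entry type. */
  static final class Spawn {
    final String id;
    final List<String> outputs;

    Spawn(String id, List<String> outputs) {
      this.id = id;
      this.outputs = outputs;
    }
  }

  /** Builds an output -> producer index and rejects any output listed twice. */
  static Map<String, Spawn> indexProducers(List<Spawn> spawns) {
    Map<String, Spawn> outputProducer = new HashMap<>();
    for (Spawn spawn : spawns) {
      for (String name : spawn.outputs) {
        // Same assumption as the quoted check: one producer per output.
        if (outputProducer.containsKey(name)) {
          throw new IllegalArgumentException("output already has a producer: " + name);
        }
        outputProducer.put(name, spawn);
      }
    }
    return outputProducer;
  }

  public static void main(String[] args) {
    // Assumption: with --flaky_test_attempts=2, both attempts declare the same log path.
    String log = "testlogs/repro/always_fail/test.log";
    List<Spawn> spawns =
        List.of(new Spawn("attempt_1", List.of(log)), new Spawn("attempt_2", List.of(log)));
    try {
      indexProducers(spawns);
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

In this sketch the second attempt is rejected purely because its output name is already in the map, which matches the shape of the stack trace above.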

@mariusgrigoriu
Author

I'm not familiar with the codebase. What is the impact if we allow the same output path to appear multiple times to represent that it was executed multiple times?

@jin added the team-Core, type: bug, untriaged, and team-Local-Exec labels and removed the team-Core label on Dec 4, 2020.
@meisterT added the team-Performance and P2 labels and removed the team-Local-Exec and untriaged labels on Feb 10, 2021.
@HoloRin

HoloRin commented Jul 28, 2021

#13761 seems to be another instance of this issue

HoloRin added a commit to rabbitmq/rabbitmq-server that referenced this issue Jul 28, 2021
This reverts commit 4b210bd.

Unfortunately this seems to increase the chances that we hit
bazelbuild/bazel#12510 so I am reverting it
for now
HoloRin added a commit to rabbitmq/rabbitmq-server that referenced this issue Jul 28, 2021
This reverts commit 4b210bd.

Unfortunately this seems to increase the chances that we hit
bazelbuild/bazel#12510 so I am reverting it
for now

(cherry picked from commit 966d9e3)
bazel-io pushed a commit that referenced this issue Aug 30, 2021
As reported in #12510 there might be multiple spawns with the same output when flaky tests are executed multiple times.

The fix is to remove the check for presence of duplicate outputs and to extend the sorting algorithm to accept multiple outputs with the same name.

Closes #13650.

PiperOrigin-RevId: 393750094
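
The approach described in the commit message above could look roughly like the following sketch. It is an illustration under stated assumptions, not the code from the actual commit: the `Spawn` class, the `sort` and `visit` helpers, and the example ids are all hypothetical. The idea shown is to keep a list of producers per output name instead of a single one, and to emit every producer of an input before the spawn that consumes it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MultiProducerSortSketch {
  /** Hypothetical stand-in for one logged spawn; compared by object identity. */
  static final class Spawn {
    final String id;
    final List<String> inputs;
    final List<String> outputs;

    Spawn(String id, List<String> inputs, List<String> outputs) {
      this.id = id;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  /** Orders spawns so that all producers of an input come before its consumer. */
  static List<Spawn> sort(List<Spawn> spawns) {
    // Deterministic base order so the result does not depend on execution order.
    List<Spawn> ordered = new ArrayList<>(spawns);
    ordered.sort(Comparator.comparing((Spawn s) -> s.id));

    // One output name may map to several producers (for example, test retries).
    Map<String, List<Spawn>> producers = new HashMap<>();
    for (Spawn s : ordered) {
      for (String out : s.outputs) {
        producers.computeIfAbsent(out, k -> new ArrayList<>()).add(s);
      }
    }

    List<Spawn> result = new ArrayList<>();
    Set<Spawn> started = new HashSet<>();
    for (Spawn s : ordered) {
      visit(s, producers, started, result);
    }
    return result;
  }

  private static void visit(
      Spawn s, Map<String, List<Spawn>> producers, Set<Spawn> started, List<Spawn> result) {
    if (!started.add(s)) {
      return; // Already emitted, or currently being visited (guards against cycles).
    }
    for (String in : s.inputs) {
      for (Spawn producer : producers.getOrDefault(in, List.of())) {
        visit(producer, producers, started, result);
      }
    }
    result.add(s);
  }

  public static void main(String[] args) {
    Spawn build = new Spawn("b-compile", List.of(), List.of("bin"));
    Spawn try1 = new Spawn("a-test-1", List.of("bin"), List.of("test.log"));
    Spawn try2 = new Spawn("a-test-2", List.of("bin"), List.of("test.log"));
    for (Spawn s : sort(List.of(try2, build, try1))) {
      System.out.println(s.id);
    }
  }
}
```

Both test attempts survive in the sorted result even though they share an output name, which is the behavior the duplicate check previously ruled out.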
@styurin
Contributor

styurin commented Aug 30, 2021

This can be closed; #13650 fixed this issue.

cc @meisterT

@meisterT
Member

Thanks for the fix @styurin

mattem added a commit to aspect-build/bazel that referenced this issue Nov 17, 2021
As reported in bazelbuild#12510 there might be multiple spawns with the same output when flaky tests are executed multiple times.

The fix is to remove the check for presence of duplicate outputs and to extend the sorting algorithm to accept multiple outputs with the same name.

Cherry-pick of e58dd7e from bazelbuild@e58dd7e
bcmyers pushed a commit to bcmyers/bazel that referenced this issue Jan 27, 2022
As reported in bazelbuild#12510 there might be multiple spawns with the same output when flaky tests are executed multiple times.

The fix is to remove the check for presence of duplicate outputs and to extend the sorting algorithm to accept multiple outputs with the same name.

Cherry-pick of e58dd7e from bazelbuild@e58dd7e
alexeagle pushed a commit to aspect-build/bazel that referenced this issue Jan 27, 2022
As reported in bazelbuild#12510 there might be multiple spawns with the same output when flaky tests are executed multiple times.

The fix is to remove the check for presence of duplicate outputs and to extend the sorting algorithm to accept multiple outputs with the same name.

Cherry-pick of e58dd7e from bazelbuild@e58dd7e

6 participants