Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test rules are unable to handle signals #7119

Closed
clintharrison opened this issue Jan 14, 2019 · 10 comments
Closed

Test rules are unable to handle signals #7119

clintharrison opened this issue Jan 14, 2019 · 10 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@clintharrison
Copy link
Contributor

Description of the problem / feature request:

Tests are currently unable to respond to signals unless --experimental_split_xml_generation is used.
To ensure all test.xml logs are written, test-setup.sh traps all signals to write this file with the contents of stdout, exit code, etc. This is similar to #6338 in that we do not properly propagate signals to the child test process.

Feature requests: what underlying problem are you trying to solve with this feature?

I have a test rule that starts some services prior to running tests. These services run in Docker containers under Docker for Mac, which is actually a separate Linux VM. As a result, the containers need to be explicitly shutdown, which is only possible by handling this signal.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Here's a simple test rule to repro:

TEST_SCRIPT = """#!/bin/bash
echo "Running tests"

function cleanup() {
    echo "Cleaning up"
    exit 1
}

trap "cleanup" SIGTERM

sleep 10
exit 1
"""

def impl(ctx):
    ctx.actions.write(output = ctx.outputs.executable, content = TEST_SCRIPT)

my_test = rule(implementation = impl, test = True)

Running as bazel test //:test --test_output=streamed --test_timeout=1 --local_sigkill_grace_seconds=3 we do not clean up; with --experimental_split_xml_generation we do 🙂

What operating system are you running Bazel on?

macOS 10.14

What's the output of bazel info release?

Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Wed Dec 19 12:57:09 2018 (1545224229)
Build timestamp: 1545224229
Build timestamp as int: 1545224229

Any other information, logs, or outputs that you want to share?

I asked about this in #general on the Bazel Slack, it was suggested that I create this as I'm somewhat relying on an implementation detail :)

@jin jin added untriaged team-Local-Exec Issues and PRs for the Execution (Local) team labels Jan 14, 2019
@jmmv
Copy link
Contributor

jmmv commented Jan 18, 2019

I agree that a test should be able to clean up after itself if it abruptly terminates, though there is no way to guarantee that your test will have a chance to do that from within itself.

(Imagine: if the test is stuck and Bazel sends a SIGKILL to it, the cleanup will not work. Mind you, that's why I added separate cleanup routines invoked by the runtime engine to atf.) But better grant the test some chances than none.

Looks like @ulfjack implemented the --experimental_split_xml_generation feature in 0858ae1. Ulf, what was the plan there? What would it take to take this feature out of experimental, enable it by default, and remove the test.xml generation from the setup script?

@jmmv jmmv added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jan 18, 2019
@ittaiz
Copy link
Member

ittaiz commented Jan 18, 2019 via email

@jmmv
Copy link
Contributor

jmmv commented Jan 18, 2019

I saw that... hence why I'm wondering what the plan is. Can this be implemented for other strategies? Is anyone working on it? Does it even make sense to implement it in those cases?

@ulfjack
Copy link
Contributor

ulfjack commented Jan 21, 2019

Bazel only has a single test strategy, so the work applies to all Bazel users. The primary reason behind the flag is to solve a race condition with the generation of the test.xml files (see #4608).

Blaze currently has two test strategies, and I have been working over the past year or so on merging them into one as well as matching Bazel's implementation. The flag is technically a step in the other direction, but I don't think it's problem.

The race condition is not a problem for Blaze, because missing test.xml files aren't a problem for Google's CI system, which has dedicated Blaze support - unlike Bazel, where most CI systems don't have dedicated support / rely on the presence of the test.xml files. That said, there's also no reason for Blaze to do this differently, even if it's not strictly required.

The plan is to enable the flag by default for Bazel, and I have been fixing (newly-discovered) issues over the past two weeks. I really want to close this out, so I hope this will land in the next few days.

@ulfjack
Copy link
Contributor

ulfjack commented Jan 25, 2019

I'm afraid it looks like I have to break this behavior, because it breaks signal forwarding in other cases. No, I don't completely understand it myself, yet.

@clintharrison
Copy link
Contributor Author

@ulfjack which behavior do you mean by "this behavior"? Is there an issue I can follow with the "other cases" detailed?

@ulfjack
Copy link
Contributor

ulfjack commented Jan 28, 2019

I have a pending change (https://bazel-review.googlesource.com/c/bazel/+/88094) that adds a trap to the test-setup.sh.

I am not sure which version of test-setup.sh you were testing with above (the original code has a trap, then I removed the trap in the case of experimental_split_xml_generation, now I'm adding it back in), but in my testing adding the trap is necessary for the test subprocess to receive a signal at all. If the test-setup.sh doesn't have a trap, then the shell completely ignores the sigterm, and it's also not forwarded to the subprocess.

Now, it's possible that the subprocess is also a shell, and that one's eating the signal, but that wouldn't be affected by the flag.

@ulfjack
Copy link
Contributor

ulfjack commented Jan 28, 2019

It's also possible that there are differences in shell behavior based on platform. The behavior of trap that I saw seems to be completely undocumented.

bazel-io pushed a commit that referenced this issue Feb 1, 2019
If we don't set a trap here, then bash ignores the signal, and the test
process also does not receive the signal, so the test runner has no
chance of writing a test.xml output.

However, the behavior of trap forwarding the signal to the subprocess is
not at all documented in the bash documentation, and also inconsistent
with the behavior reported in #7119.

There is a similar problem in the Java stub template reported in #6338.

This may or may not be progress on #4608.

PiperOrigin-RevId: 232035930
@jmmv jmmv added the type: bug label Feb 11, 2019
@ulfjack
Copy link
Contributor

ulfjack commented Mar 12, 2019

I tried this on my Mac today with bazel 0.23.2 and it seems to work:

$ bazel test //foo:test --test_output=streamed --test_timeout=1 --local_sigkill_grace_seconds=3
...
Running tests
Terminated: 15
Cleaning up
-- Test timed out at 2019-03-12 15:05:04 UTC --

TIMEOUT: //foo:test (Summary)
      bazel-out/darwin-fastbuild/testlogs/foo/test/test.log
...

Let me know if you're seeing otherwise.

@ulfjack ulfjack closed this as completed Mar 12, 2019
@alexshtin
Copy link

alexshtin commented May 14, 2019

It doesn't seem to respect --local_sigkill_grace_seconds parameter though. Sends SIGTERM and in 1 second or so sends SIGKILL.
Also what's about Ctrl+C? It seems that Bazel sends SIGKILL right away to the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug
Projects
None yet
Development

No branches or pull requests

6 participants