Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java_stub_template.txt leaves orphan Java process when terminated #19036

Open
JaredNeil opened this issue Jul 24, 2023 · 5 comments
Open

java_stub_template.txt leaves orphan Java process when terminated #19036

JaredNeil opened this issue Jul 24, 2023 · 5 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules type: bug

Comments

@JaredNeil
Copy link
Contributor

JaredNeil commented Jul 24, 2023

Description of the bug:

java_binarys with long classpaths are not properly terminated when the parent bash process receives a TERM or INT signal.

When bash receives a TERM signal, it does not forward that signal to the child process, and instead just exits, leaving the orphaned process behind.

When possible, it's best to exec the process, but since we want to clean up the temporary files created in create_and_run_classpath_jar, we can't exec $JAVABIN like we do in the short-classpath case. bazelbuild/rules_nodejs and aspect-build/rules_js handle this case in their launcher scripts (bazel-contrib/rules_nodejs#246, https://github.com/aspect-build/rules_js/blob/2dfdcbf08fda897a2280c686ed03e4a8b22e6286/js/private/js_binary.sh.tpl#L404-L423) by explicitly trapping the signals and forwarding them to the child process. I believe the Java launcher should do the same.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  1. bazel run //src/java_tools/buildjar/java/com/google/devtools/build/buildjar:BazelJavaBuilder -- --classpath_limit=0 --persistent_worker
  2. From another terminal pgrep -fa BazelJavaBuilder
  3. Notice there are 3 processes
    1. The bazel run ... process
    2. The bash .../BazelJavaBuilder ... process
    3. The java ... com.google.devtools.build.buildjar.BazelJavaBuilder ... process
  4. Run kill -TERM <PID of the bash process>
  5. pgrep -fa BazelJavaBuilder
  6. Notice the java process is still running even though its parents have exited

Which operating system are you running Bazel on?

linux

What is the output of bazel info release?

release 6.2.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

git@github.com:bazelbuild/bazel.git
f48c0f6504f29fd37ba966202b3c502d4b636d96
f48c0f6504f29fd37ba966202b3c502d4b636d96

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

I believe this bug has existed since create_and_run_classpath_jar was added in 102ce6d.

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@JaredNeil
Copy link
Contributor Author

JaredNeil commented Jul 26, 2023

Turns out signal handling in Bash is way more complicated than I thought, and is proving to be quite tricky to get right. A couple things I've learned while trying to get this to work reliably:

  • Bash sets the SIGINT and SIGQUIT handlers on background processes to be ignored, which it doesn't do to foreground processes. This means you get different behavior when sending signals to processes that don't set their own handlers.
  • If bash is running in monitoring mode (login shell or set -m), then the above behavior does not apply.

I'm thinking the only way to get the expected behavior in all cases is going to be to exec $JAVABIN .... I can think of a handful of ways to deal with the manifest jar:

  1. Don't bother deleting it. It's in /tmp/ and probably a relatively small file (~10KB range).
  2. Same as 1, but use a deterministic path (shasum $MANIFEST_FILE) for the jar so you don't get a new one each time you run the same bazel target.
MANIFEST_JAR_FILE="/tmp/$(shasum $MANIFEST_FILE | cut --fields=1 --delimiter=' ').jar"
# Probably need to do an atomic move to the path to avoid races between multiple instances of the same binary, and may be addition security race conditions here.
$JARBIN cvfm "$MANIFEST_JAR_FILE" "$MANIFEST_FILE" ...
rm -f "$MANIFEST_FILE"
exec $JAVABIN -classpath "$MANIFEST_JAR_FILE" "${ARGS[@]}"
  1. Launch a background process to delete the file in a few seconds, then exec.
rm -f "$MANIFEST_FILE"
(
  sleep 3
  rm -f "$MANIFEST_JAR_FILE" # I don't think this would work on Windows if the process still has the jar file open
) &
exec $JAVABIN -classpath "$MANIFEST_JAR_FILE" "${ARGS[@]}"
  1. Launch a background process that waits for the parent to exit before deleting the the file, then exec
rm -f "$MANIFEST_FILE"
(
  while kill -0 $$ 2>/dev/null; do
    sleep 5
  done
  rm -f "$MANIFEST_JAR_FILE"
) &
exec $JAVABIN -classpath "$MANIFEST_JAR_FILE" "${ARGS[@]}"
  1. Generate the classpath jar as a Bazel action and have the launcher use that one instead of generating a new one each time. This would entail dropping support for the --classpath_limit argument or moving it to a build setting.
  2. Create and compile a main function that we actually run, and it deletes the classpath jar and then calls the real main (probably also wouldn't work on windows).

I feel like 5 is my favorite, but is probably quite a bit of work. Are the Java rules still native, or are those in Starlark now? rules_scala and rules_kotlin use versions of this launcher, so it would be nice if they could share the Starlark implementation if we decide to go down this route.

I feel like 4 is the next best option, but not sure if kill -0 $$ would work the same on Windows. Spinning off a polling daemon also feels a bit icky. In the situations where I would be running this launcher it's not really a problem, but not sure that's universally true.

Any guidance on which of these options would be best? Happy to implement any of these once a decision is made.

@cushon
Copy link
Contributor

cushon commented Jul 27, 2023

Having Bazel generation the file instead of generating it when the script is executed makes sense to me.

The classpath jar was a hack that was added when JDKs didn't support params files, and JDK > 8 supports passing flag files. So I'd consider looking at having it generate a text file with the classpath entries, instead of a classpath jar. There's a FR for that here (it's about using an @argument file, not about whether the file is generated by Bazel or the stub script): #6354

Is supporting JDK 8 or earlier important to you?

@JaredNeil
Copy link
Contributor Author

Is supporting JDK 8 or earlier important to you?

No, we're on 11. Is Bazel ready to drop support for Java 8 entirely, or are you proposing only fixing the bug for feature_version > 8 and leaving the old classpath jar flow for older versions?

@cushon
Copy link
Contributor

cushon commented Sep 12, 2023

Is Bazel ready to drop support for Java 8 entirely

I'm not sure that it is. @comius @hvadehra have you given any thought to the timeline for Java 8 support? Vendors are going to be provided Java 8 support through at least 2030, so we're probably stuck with it for a while.

or are you proposing only fixing the bug for feature_version > 8 and leaving the old classpath jar flow for older versions?

I was wondering about it. I'm OK with landing improvements that only benefit Java > 8, as long as they aren't regressing existing Java 8 support.

@hvadehra
Copy link
Member

I and @meteorcloudy had a chat some time ago about this. I agree we probably need to support using bazel with java 8 for >5 more years.

I'm OK with landing improvements that only benefit Java > 8, as long as they aren't regressing existing Java 8 support.

+1

@hvadehra hvadehra added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Sep 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants