Classpath JAR is incorrect when running java_binary remotely on Windows #13484

Closed
ruiqimao opened this issue May 17, 2021 · 7 comments
Labels: area-Windows, P1, team-Remote-Exec, type: bug

@ruiqimao

Description of the problem / feature request:

When a java_binary is used as the executable of a ctx.actions.run and its classpath exceeds the launcher's classpath limit (7000 by default), the action fails when run remotely, exiting with a ClassNotFoundException:

Error: Could not find or load main class pkg0000000000000.Main
Caused by: java.lang.ClassNotFoundException: pkg0000000000000.Main

This behavior happens only when:

  • The action is run remotely
  • The action is run on Windows
  • The classpath length exceeds the classpath limit

Workarounds for this issue include:

  • Using the --singlejar flag and including _deploy.jar in the inputs
  • Increasing the classpath limit using --classpath_limit

Both workarounds prevent the launcher from creating a classpath jar.
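As a rough illustration of why both workarounds help, the check below sketches when a launcher would fall back to a classpath jar. This is not Bazel's launcher code: `needs_classpath_jar` is a hypothetical name, the `;` separator and the strict `>` comparison are assumptions, and only the default of 7000 comes from this report.

```python
DEFAULT_CLASSPATH_LIMIT = 7000  # default quoted in this report


def needs_classpath_jar(classpath_entries, limit=DEFAULT_CLASSPATH_LIMIT):
    """Return True when the joined classpath is longer than the limit.

    Hypothetical helper: the separator and the comparison are assumptions.
    """
    joined = ";".join(classpath_entries)  # ';' separates classpath entries on Windows
    return len(joined) > limit


# 400 entries of 24 characters each, joined by ';', come to 9999 characters.
entries = [f"test/pkg{i:05d}/liblib.jar" for i in range(400)]
```

With these entries the default limit is exceeded, so a classpath jar would be needed. Raising the limit, or collapsing the classpath into a single deploy jar, keeps the check false and avoids the classpath jar entirely.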

Digging around with the Bazel source code uncovered the following findings:

  1. The Class-Path: attribute in the classpath jar's manifest is empty when run remotely.
  2. This happens due to the classpath paths being relative (java_launcher.cc#L213).
  3. These paths are obtained from BinaryLauncherBase::Rlocation (java_launcher.cc#L331).
  4. According to the doc for Rlocation, it should always return an absolute path.

Using a custom build of Bazel, I can confirm that:

  • The classpath paths that are returned by Rlocation are relative when run remotely.
  • Converting these paths to absolute paths using blaze_util::AsShortWindowsPath fixes the problem.

This leads me to conclude that somewhere in the pipeline, an absolute path is missing.
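The idea behind the fix can be sketched as follows: resolve each entry to an absolute Windows path before it goes into the manifest. This is only an illustration under stated assumptions. `to_absolute` and `manifest_class_path` are hypothetical helpers, the `file:/` URL form is an assumption about how entries are written, plain path joining stands in for blaze_util::AsShortWindowsPath, and manifest line wrapping is omitted.

```python
import ntpath  # Windows path semantics, usable on any host OS


def to_absolute(path, base_dir):
    """Resolve a Windows path against base_dir unless it is already absolute."""
    if ntpath.isabs(path):
        return ntpath.normpath(path)
    return ntpath.normpath(ntpath.join(base_dir, path))


def manifest_class_path(entries, base_dir):
    """Build a Class-Path value as space-separated file URLs (assumed format)."""
    urls = []
    for entry in entries:
        absolute = to_absolute(entry, base_dir).replace("\\", "/")
        urls.append("file:/" + absolute.replace(" ", "%20"))
    return " ".join(urls)
```

For example, `manifest_class_path(["test/binary.runfiles/a.jar"], "C:\\work")` resolves the relative entry against `C:\work` before it is written out, so the resulting attribute no longer depends on the working directory of the remote action.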

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

//test/BUILD

load("//test:test.bzl", "test_rule")

java_binary(
  name = "binary",
  main_class = "pkg00001.Main",
  runtime_deps = [
    "//test/pkg00000:lib",
    "//test/pkg00001:lib",
    "//test/pkg00002:lib",
    "//test/pkg00003:lib",
    "//test/pkg00004:lib",
    ...
  ],
)

test_rule(
  name = "test",
  output = "output",
)

//test/test.bzl

def _test_impl(ctx):
  ctx.actions.run(
    inputs = [],
    outputs = [ctx.outputs.output],
    mnemonic = "depstest",
    arguments = [ctx.outputs.output.path],
    executable = ctx.executable._binary,
  )

test_rule = rule(
  attrs = {
    "_binary": attr.label(
      executable = True,
      cfg = "exec",
      default = Label("//test:binary"),
      allow_files = True,
    ),
    "output": attr.output(),
  },
  implementation = _test_impl,
)

//test/pkgXXXXX/Main.java

package pkgXXXXX;

import java.io.File;
import java.io.IOException;

class Main {
  public static void main(String[] args) throws IOException {
    new File(args[0]).createNewFile();
  }
}

Create multiple copies, replacing XXXXX with a unique number.

//test/pkgXXXXX/BUILD

java_library(
  name = "lib",
  srcs = ["Main.java"],
  visibility = ["//visibility:public"],
)

Create multiple copies, replacing XXXXX with a unique number.

Command

bazel build //test:output --config=remote

This setup can be generated using generate.py
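For readers without access to that script, a minimal generator might look like the sketch below. It is not the author's generate.py: the `generate` function and the template constants are hypothetical, the file contents are copied from the snippets above, and test.bzl still has to be added by hand. How many packages are needed to exceed the classpath limit is also an assumption to tune.

```python
from pathlib import Path

MAIN_JAVA = """\
package pkgXXXXX;

import java.io.File;
import java.io.IOException;

class Main {
  public static void main(String[] args) throws IOException {
    new File(args[0]).createNewFile();
  }
}
"""

LIB_BUILD = """\
java_library(
  name = "lib",
  srcs = ["Main.java"],
  visibility = ["//visibility:public"],
)
"""

TOP_BUILD = """\
load("//test:test.bzl", "test_rule")

java_binary(
  name = "binary",
  main_class = "pkg00001.Main",
  runtime_deps = [
DEPS
  ],
)

test_rule(
  name = "test",
  output = "output",
)
"""


def generate(root, count):
    """Write `count` pkgNNNNN packages plus test/BUILD under `root`."""
    test_dir = Path(root) / "test"
    test_dir.mkdir(parents=True, exist_ok=True)
    deps = []
    for i in range(count):
        pkg = f"pkg{i:05d}"
        pkg_dir = test_dir / pkg
        pkg_dir.mkdir(parents=True, exist_ok=True)
        # Replace the XXXXX placeholder with this package's unique number.
        (pkg_dir / "Main.java").write_text(MAIN_JAVA.replace("pkgXXXXX", pkg))
        (pkg_dir / "BUILD").write_text(LIB_BUILD)
        deps.append(f'    "//test/{pkg}:lib",')
    (test_dir / "BUILD").write_text(TOP_BUILD.replace("DEPS", "\n".join(deps)))
    return len(deps)
```

Calling `generate(".", 400)` from the workspace root would create 400 library packages. The count must be at least 2 so that pkg00001.Main exists.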

What operating system are you running Bazel on?

Windows Server 2019 Datacenter, Version 1809

What's the output of bazel info release?

release 4.1.0rc4
@aiuto added the area-Windows, team-Remote-Exec, and untriaged labels on May 17, 2021
@cushon
Contributor

cushon commented May 21, 2021

Which JDK version are you using? I'm curious because there's a more principled alternative to the 'classpath jar' workaround in the stub script, but it's only going to work with JDK 9: #6354

@ruiqimao
Author

We are using our own prebuilt version of JDK11 for --javabase and --host_javabase.

@coeuvre added the P1 and type: bug labels and removed the untriaged label on Jun 3, 2021
@coeuvre self-assigned this on Jun 3, 2021
@coeuvre
Member

coeuvre commented Jun 3, 2021

/cc @meteorcloudy

@meteorcloudy self-assigned this on Jun 7, 2021
@meteorcloudy
Member

@ruiqimao Thanks for reporting this issue and for the repro case! I did a little debugging, and it is indeed a problem with the Rlocation function.

This is what's happening:

  • In the RBE run, the manifest file isn't copied to the remote machine, nor is RUNFILES_MANIFEST_FILE set.
  • When the launcher cannot find the runfiles manifest file to do the mapping, the Rlocation function just returns <runfiles dir>/foo/bar/file.
  • The runfiles dir is derived from arg0, so it might be a relative path, which results in relative classpath entries for the jars.

I'll see whether I can make sure the runfiles dir is always calculated as an absolute path, so it won't cause this problem.
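The three points above can be sketched in a few lines. This is an illustration only, not the launcher's code or the eventual fix: `runfiles_dir_from_argv0` and `rlocation` are hypothetical stand-ins, the `<binary>.runfiles` naming is an assumption, and the manifest is modelled as a plain dict.

```python
import ntpath  # Windows path semantics, usable on any host OS


def runfiles_dir_from_argv0(argv0, cwd=None):
    """Derive '<binary>.runfiles' from argv0.

    Without cwd the result stays relative whenever argv0 is relative;
    passing cwd models computing the directory as an absolute path.
    """
    binary = argv0
    if cwd is not None and not ntpath.isabs(binary):
        binary = ntpath.join(cwd, binary)
    return ntpath.normpath(binary) + ".runfiles"


def rlocation(path, manifest, runfiles_dir):
    """Look up path in the manifest, else fall back to <runfiles dir>/path."""
    if manifest is not None and path in manifest:
        return manifest[path]
    return ntpath.normpath(ntpath.join(runfiles_dir, path))


argv0 = "bazel-out\\bin\\test\\binary.exe"
relative = rlocation("ws/test/a.jar", None, runfiles_dir_from_argv0(argv0))
absolute = rlocation(
    "ws/test/a.jar", None, runfiles_dir_from_argv0(argv0, cwd="C:\\exec")
)
```

Here `relative` models the remote case with no manifest and a relative arg0, while `absolute` models the same lookup once the runfiles dir is anchored to an absolute directory.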

@meteorcloudy
Member

@ruiqimao Can you please check if #13559 fixes this problem?

@ruiqimao
Author

ruiqimao commented Jun 9, 2021

@meteorcloudy #13559 does appear to fix the issue!

@meteorcloudy
Member

Cool, then I'll merge that!
