Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neither python_top nor python toolchain works with starlark actions on windows #7947

Closed
pziggo opened this issue Apr 4, 2019 · 23 comments
Closed
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Rules-Python Native rules for Python type: bug

Comments

@pziggo
Copy link

pziggo commented Apr 4, 2019

Description of the problem / feature request:

Trying to use a py_binary() target as executable with actions.run() fails on windows if py_runtime() is used. Error Message:

LAUNCHER ERROR: Cannot launch process: "python.exe" ${PATH}\my_py_binary.zip ${ARGS}
Reason: (error: 2): The system cannot find the file specified.

This issue happens with python_top and the new python toolchain feature. It works fine on Linux though.

Feature requests: what underlying problem are you trying to solve with this feature?

Executing python actions during the build with a hermetic python environment.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

see here

What operating system are you running Bazel on?

Windows 10

What's the output of bazel info release?

development version

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

bazel build //src:bazel_jdk_minimal on Windows 10

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

https://github.com/bazelbuild/bazel.git
78d6831
78d6831

Have you found anything relevant by searching the web?

Discussion has been started on Slack: https://bazelbuild.slack.com/archives/CA306CEV6/p1554299740008400

Any other information, logs, or outputs that you want to share?

From the resulting __main__.py

PYTHON_BINARY = 'python3_windows/python.exe'
if IsWindows() and not HasWindowsExecutableExtension(PYTHON_BINARY):
 PYTHON_BINARY = PYTHON_BINARY + '.exe'

from the rule definition:

"_generator": attr.label(
    cfg = "host",
    executable = True,
    allow_files = False,
    default = Label("//bazel/tools/generator:generator"),
),

and from the rule implementation

ctx.actions.run(
    inputs = [],
    outputs = [out_file],
    executable = ctx.executable._generator,
    arguments = [args],
)
@pziggo
Copy link
Author

pziggo commented Apr 4, 2019

cc @brandjon

@brandjon brandjon added team-Rules-Python Native rules for Python untriaged labels Apr 4, 2019
@brandjon
Copy link
Member

brandjon commented Apr 5, 2019

Ok, I played around with our windows sandbox vm and think I understand. The most likely situation is that your py_runtime's interpreter_path attribute doesn't correctly identify a Python.exe binary. Please double check the path, and if you don't think that's the problem, please post the py_runtime definition here.

From my experimentation, it doesn't look like the PATH environment variable affects whether or not the absolute path is able to resolve successfully at execution time (i.e. we're not sandboxing anything in Windows in a way that breaks this).

The error message is a little misleading: Our Python launcher will first check whether the given Python interpreter file exists, and if not, fall back on "python.exe" on its own. This means that the error message you see will always say it can't find "python.exe" even if you specified interpreter_path = r"C:\my_custom_interpreter.exe" in your py_runtime.

(Note the "r" raw string literal prefix in that last sentence. Depending on your path to the Python interpreter, it's possible you're using a single backslash before something that's interpreted as a valid escape sequence, producing a different string than you think. When in doubt, it's valid to replace backslashes with slashes in windows paths.)

Please let me know if this helps. If nothing else, we'll keep this issue open to track the bad error message in the Python launcher (or the removal of that fallback entirely).

@brandjon brandjon added P2 We'll consider working on this in future. (Assignee optional) type: bug type: support / not a bug (process) and removed untriaged type: bug labels Apr 5, 2019
@pziggo
Copy link
Author

pziggo commented Apr 7, 2019

Actually we are using an in-build runtime from an external dependency and finally I managed to setup a simple example here pziggo/bazel_py_runtime_win_example.

I also debugged the issue a bit more and I think I'm getting closer to understand the root cause. Thanks for the link which was a good starting point.

The python launcher checks if the given python interpreter exists by converting the PYTHON_BIN_PATH into an absolute path on Windows. In fact by using the in-build runtime from an external dependency, the constructed path really doesn't exists:

LAUNCHER ERROR: Pyhon binary: python3_windows/python.exe
LAUNCHER ERROR: Absolute python binary path: \\?\c:\users\me\_bazel_me\rsotqivv\execroot\bazel_py_runtime_win_example\python3_windows\python.exe

while the following path exists (note the "external" in the path):

C:\Users\me\_bazel_me\rsotqivv\execroot\bazel_py_runtime_win_example\external\python3_windows

So, before handing over the python_bin_path to the python launcher here, the path will be modified and the external part stripped for the runfiles tree. But in fact, on Windows the runfiles tree lives inside the resulting zip file and thus is not a valid path for running the packaged zip if I'm not mistaken.
That would mean, for createWindowsExeLauncher() the value of pythonBinary must be different then for the functions that build the content of the zip file / using the runfiles.

@brandjon
Copy link
Member

brandjon commented Apr 8, 2019

Thank you for the investigation and repro! I haven't run your repro yet, but if I understand correctly it sounds like the scope of this problem is anytime the py_runtime references an in-build interpreter from an external repo on Windows.

@pziggo
Copy link
Author

pziggo commented Apr 13, 2019

Yes, sounds about right.

@pziggo
Copy link
Author

pziggo commented Apr 13, 2019

But I wonder what would be a good change to solve the problem. Fixing the path could be relatively easy, but that would not work on remote execution if I’m not mistaken.
So, the interpreter should become a direct runtime dependency and available in the runfiles tree. But I think having the interpreter twice (in the runfiles and inside the zip) is not optimal either for the overall performance.

@brandjon
Copy link
Member

brandjon commented May 10, 2019

+@laszlocsomor FYI.

So to summarize: When an in-build interpreter is used on Windows, the Python launcher will take its workspace-relative path and use that in combination with its own working directory to construct the full (non-runfile) path to the interpreter. Due to a bug, this fails when the interpreter is defined in an external repo.

I'm not sure what the plan is for remote execution and windows, but there are likely already other blockers within the Python rules. For instance, BazelPythonSemantics checks OS.getCurrent to decide whether to use a launcher, but that's not needed if the target platform isn't windows.

Aside from that, this behavior is another difference between unix and Windows. On unix, the shebang of the stub script is #!/usr/bin/env python regardless of what interpreter will later be used to execute the payload. But on Windows, this launcher behavior means that the stub script (in the zip file's __main__.py) will use whatever the payload's interpreter is, instead of always using a standard system interpreter to bootstrap. I don't see an obvious flaw in doing it that way, but it is a conceptual difference.

You can see an analogous version of this bug in google/subpar#98, where in-build interpreters caused the shebang to get a relative path.

@brandjon
Copy link
Member

Regarding priority: This issue prevents using many kinds of in-workspace Python runtimes on Windows, including interpreters provided by external repos (e.g. @bazel_tools) and interpreters that are generated artifacts as opposed to checked-in binaries. In particular, it blocks the addition of the autodetecting toolchain on Windows (#7844). However, it does not block the roll out of Python toolchains (#7899), because the default toolchain on Windows is a hack that preserves legacy behavior.

Therefore, a fix for this won't be in 0.27, but it'd be great to have it for 0.28.


Now on to the actual design issue. As I alluded to above, on unix we use a two-phase approach to launching Python programs.

  1. A system interpreter is used to run the stub script or zip file, which initializes env vars, locates/extracts runfiles, etc. This interpreter is discovered by the shebang #!/usr/bin/env python.

  2. A separate interpreter is called by the stub logic to run the payload user code. This may be either a system interpreter or an in-workspace (i.e. runfiles) interpreter.

This separation into two separate phases is what enables us to use an in-workspace interpreter packaged within a zip file. The zip depends on a system interpreter to bootstrap and extract itself.

On Windows, this separation does not currently exist: Regardless of whether you're using a zip file or not, the Python launcher will try to run the stub script with the same interpreter that your toolchain requests to be used for payload user code. This can be an interpreter in runfiles, which haven't been extracted yet.

The way we currently get around this is by giving the launcher a path to the interpreter binary on disk, i.e. the actual checked-in source file or the built binary file in the output tree. This is pretty brittle. First, there's the breakage described above, when the path construction doesn't quite work out. But even if we fix that by passing an absolute path to the on-disk file, we'd end up in a situation where the resulting py_binary (launcher) executable breaks if you move your source files around or run bazel clean, even if you've already copied the exe and zip files elsewhere on your system.

So let's revisit our strategy: There's no reason we need to use the same interpreter for the zip file / stub script that we do for the payload code. We can assume the presence of a system Python interpreter for the purposes of bootstrapping. The launcher can hardcode that assumption, which could take different forms, for example:

  • python.exe is on PATH
  • py.exe is on PATH
  • I can locate a Python interpreter by doing some registry stuff

It's even possible to let the Python toolchain customize how the bootstrapping interpreter is found, just like you could let it parameterize the shebang string. But I don't see much value in this at the moment.

So right now, I think the goal should be to add python.exe / py.exe detection to the launcher, and no longer pass an interpreter to the launcher. Note that we write the stub script to target a lowest common denominator of reasonable Python interpreters, so it shouldn't matter what version interpreter the launcher comes up with.

@laszlocsomor
Copy link
Contributor

Thanks for the summary!

on unix we use a two-phase approach to launching Python programs.

IIUC we do so because the stub script itself is in Python, so the binary requires a system interpreter even in presence of a bundled interpreter. But isn't part of a bundled interpreter's goal to enable running without a system interpreter?

The analogous problem is running Bazel itself. The embedded JVM lets you run Bazel without a system JVM. Its launcher (the bazel client) is a native binary.

On Windows, this separation does not currently exist: Regardless of whether you're using a zip file or not, the Python launcher will try to run the stub script with the same interpreter that your toolchain requests to be used for payload user code. This can be an interpreter in runfiles, which haven't been extracted yet.

Sure, the launcher should begin with extracting the payload. We can fix that.

We can assume the presence of a system Python interpreter for the purposes of bootstrapping.

Doesn't that partially beat the purpose of a bundled interpreter? If the stub weren't a python script but were part of the .exe launcher, we wouldn't need this assumption.

My questions:

  • what do you all think about my assessment that the bundled interpreter should enable the binary to run without a system interpreter?
  • what does the stub script do? could the native launcher do that instead?
  • the native launcher's python_binary field could either contain a runfile path (e.g. runfile:python3_windows/python.exe, meaning either a runfile path to be retrieved with rlocation or a path in the zip), or a basename (e.g. python.exe, meaning the current interpreter on the PATH, whatever it may be) -- WDYT?

@meteorcloudy
Copy link
Member

meteorcloudy commented May 22, 2019

the native launcher's python_binary field could either contain a runfile path (e.g. runfile:python3_windows/python.exe, meaning either a runfile path to be retrieved with rlocation or a path in the zip)

This is true when using python toolchain or --python_top. That's why #8440 can fix this problem.

@meteorcloudy
Copy link
Member

With #8440, I can run all commands in https://github.com/pziggo/bazel_py_runtime_win_example, but got error like this

pcloudy@pcloudy0-w MSYS ~/workspace/bazel_py_runtime_win_example
$ bazel build --python_top=//tools/python:py3_windows_runtime //:test
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from c:\users\pcloudy\.bazelrc: --output_user_root=C:/src/tmp
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=320
INFO: Options provided by the client:
  'build' options: --python_path=C:/Python36/python.exe
INFO: Reading rc options for 'build' from c:\users\pcloudy\.bazelrc:
  'build' options: --curses=yes --color=yes --verbose_failures --announce_rc
INFO: Analyzed target //:test (14 packages loaded, 113 targets configured).
INFO: Found 1 target...
INFO: Deleting stale sandbox base C:/src/tmp/ewf7xl5y/sandbox
ERROR: C:/tools/msys64/home/pcloudy/workspace/bazel_py_runtime_win_example/BUILD:3:1: Generating file failed (Exit 1): generator.exe failed: error executing command
  cd C:/src/tmp/ewf7xl5y/execroot/bazel_py_runtime_win_example
bazel-out/host/bin/tools/generator/generator.exe bazel-out/x64_windows-fastbuild/bin/file
Execution platform: @bazel_tools//platforms:host_platform
Traceback (most recent call last):
  File "D:\obj\Windows-Release\37amd64_Release\msi_python\zip_amd64\runpy.py", line 193, in _run_module_as_main
  File "D:\obj\Windows-Release\37amd64_Release\msi_python\zip_amd64\runpy.py", line 85, in _run_code
  File "C:\src\tmp\ewf7xl5y\execroot\bazel_py_runtime_win_example\bazel-out\host\bin\tools\generator\generator.zip\__main__.py", line 303, in <module>
  File "C:\src\tmp\ewf7xl5y\execroot\bazel_py_runtime_win_example\bazel-out\host\bin\tools\generator\generator.zip\__main__.py", line 282, in Main
NameError: name 'exit' is not defined
C:\Users\pcloudy\AppData\Local\Temp\Bazel.runfiles_s6rwiedn\runfiles\python3_windows\python.exe
Target //:test failed to build
INFO: Elapsed time: 5.256s, Critical Path: 0.47s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

But it's caused by the python binary we are using, not a Bazel bug.

@brandjon
Copy link
Member

But isn't part of a bundled interpreter's goal to enable running without a system interpreter?

I don't think this is something we need to prioritize, but it'd be nice to have in the future. With a bundled interpreter that requires the system interpreter for bootstrapping, you still get hermeticity once you're past the stub script phase.

The analogous problem is running Bazel itself. The embedded JVM lets you run Bazel without a system JVM. Its launcher (the bazel client) is a native binary.

Then by that logic we could embed a native Python interpreter (perhaps miniaturized) into each py_binary artifact. But at that point perhaps it'd be worth rewriting the stub script in C++.

An alternative to embedding an extra interpreter into the artifact would be to have a way of extracting the payload user code's interpreter from runfiles without using Python... and again the way to do that would be essentially rewriting the stub in C++.

Sure, the launcher should begin with extracting the payload. We can fix that.

Sounds like we're arriving at similar conclusions.

what does the stub script do? could the native launcher do that instead?

The stub script:

  1. locates the runfiles directory, or locates and extracts the zip to create the runfiles directory
  2. initializes environment variables, including for the runfiles library, and PYTHONPATH
  3. locates the main file (entry point of payload user code) and second-stage Python interpreter
  4. invokes the interpreter on the main file and cleans up the extracted zip contents

the native launcher's python_binary field could either contain a runfile path (e.g. runfile:python3_windows/python.exe, meaning either a runfile path to be retrieved with rlocation or a path in the zip), or a basename (e.g. python.exe, meaning the current interpreter on the PATH, whatever it may be) -- WDYT?

In principle the stub script should be runnable under any reasonable Python interpreter (with some hand wavyness around "reasonable"). So there shouldn't be a need to customize which system interpreter is used, whether it's python2, python3, python.exe, py.exe (the Python wrapper on Windows), etc. I'd like to avoid adding the ability to customize that until it proves to be appropriate and useful. Same goes for the unix shebang, which is currently hardcoded as #!/usr/bin/env python.

However, if we're eliminating the dependency on a system interpreter and just using whatever interpreter is specified for the payload user code (whether system or runfiles based), then we'd just pass that along to the launcher, as we do today.

@brandjon
Copy link
Member

But it's caused by the python binary we are using, not a Bazel bug.

Apparently the exit() function doesn't exist if you invoke Python with -S. We should change our calls to exit() in the stub script to instead use sys.exit(). I'll make a quick fix for that.

@laszlocsomor
Copy link
Contributor

But isn't part of a bundled interpreter's goal to enable running without a system interpreter?

I don't think this is something we need to prioritize, but it'd be nice to have in the future. With a bundled interpreter that requires the system interpreter for bootstrapping, you still get hermeticity once you're past the stub script phase.

Thanks for confirming. I just wanted to note that requiring an ambient Python interpreter might cause frustration. People were unhappy about Bazel requiring MSYS2 too.

The analogous problem is running Bazel itself. The embedded JVM lets you run Bazel without a system JVM. Its launcher (the bazel client) is a native binary.

Then by that logic we could embed a native Python interpreter (perhaps miniaturized) into each py_binary artifact. But at that point perhaps it'd be worth rewriting the stub script in C++.

Sounds reasonable.

An alternative to embedding an extra interpreter into the artifact would be to have a way of extracting the payload user code's interpreter from runfiles without using Python... and again the way to do that would be essentially rewriting the stub in C++.

Agreed. The Bazel client does the same.

Sure, the launcher should begin with extracting the payload. We can fix that.

Sounds like we're arriving at similar conclusions.

Indeed. :)

what does the stub script do? could the native launcher do that instead?

The stub script:

  1. locates the runfiles directory, or locates and extracts the zip to create the runfiles directory
  2. initializes environment variables, including for the runfiles library, and PYTHONPATH
  3. locates the main file (entry point of payload user code) and second-stage Python interpreter
  4. invokes the interpreter on the main file and cleans up the extracted zip contents

Thanks. None of that sounds daunting. I implemented the test wrapper in C++. It solves similar problems, we could reuse it. (It's Windows-focused though.)

@brandjon
Copy link
Member

That all sounds good to me. I forked off #8446.

In the meantime, we should make the launcher better at detecting a system interpreter. We can track that in this bug.

@brandjon
Copy link
Member

Let's discuss the merits of the approach of #8440 here. That PR makes it so if a runfiles path is passed to the launcher, it uses rlocation to resolve that to its absolute path. Since when a zip file is used the runfiles are manifest-based rather than directory-based, this doesn't suffer the chicken-and-egg problem of accessing an interpreter in a zip.

It does suffer the problem where if the build is cleaned, the .exe artifact is broken, even if it is copied out of the output tree. Currently I believe the .exe only depends on having the .zip in the same directory. I'm ok with settling for this limitation for now as a fix of existing behavior. It'll be moot once the launcher extracts the zip.

bazel-io pushed a commit that referenced this issue May 23, 2019
`exit()` doesn't exist if python was invoked without the `site` module. See #7947 (comment) (#7947).

RELNOTES: None
PiperOrigin-RevId: 249636384
@brandjon
Copy link
Member

brandjon commented May 23, 2019

Although, one consequence of this design is that if you have no system Python interpreter, but your build has a checked-in interpreter, it'll work on your development machine since the actual not-in-runfiles interpreter file is available, and then fail on a user's machine. But we don't have a better option before #8446, unless we want to fail fast on the development machine.

@meteorcloudy
Copy link
Member

meteorcloudy commented May 23, 2019

Indeed #8440 is just for making existing behavior that works on Linux also works on Windows.

Beyond this, it's not a Windows only problem. Ideally we should move all logic in python_stub_template.txt to the python native launcher. I considered to do so when implementing the native launcher, but at that time we were still making many changes to python_stub_template.txt and native launcher is only for Windows (also true today).

@laszlocsomor
Copy link
Contributor

@brandjon , @meteorcloudy : does any of you plan to work on this?

@meteorcloudy
Copy link
Member

@brandjon has a change to enable more python toolchain integration tests on Windows, I will submit my fix for this issue afterwards.

Yet I have no plan to migrate the python_stub_template.txt to C++.

@laszlocsomor
Copy link
Contributor

Thanks. I consider migrating it.
Can anyone help me prioritize it?

@brandjon
Copy link
Member

My pending fix blocked on a failing test over the long weekend, looking at it again today.

@brandjon brandjon added P1 I'll work on this now. (Assignee required) and removed P2 We'll consider working on this in future. (Assignee optional) labels May 28, 2019
@brandjon brandjon self-assigned this May 28, 2019
@brandjon
Copy link
Member

My fix finally went through, merging the fix to this issue now.

irengrig pushed a commit to irengrig/bazel that referenced this issue Jun 18, 2019
`exit()` doesn't exist if python was invoked without the `site` module. See bazelbuild#7947 (comment) (bazelbuild#7947).

RELNOTES: None
PiperOrigin-RevId: 249636384
irengrig pushed a commit to irengrig/bazel that referenced this issue Jun 18, 2019
Previously, we assume the python/bash binary path passed to the launcher is an absolute path specified by `--python_path` or `--shell_executable`, but this is not true when using python/shell toolchain. When using the python toolchain, the given python binary path will be a runfile path, we can use `Rlocation` to find the absolute path.

Fixes bazelbuild#7947

Closes bazelbuild#8440.

PiperOrigin-RevId: 250569171
irengrig pushed a commit to irengrig/bazel that referenced this issue Jul 15, 2019
`exit()` doesn't exist if python was invoked without the `site` module. See bazelbuild#7947 (comment) (bazelbuild#7947).

RELNOTES: None
PiperOrigin-RevId: 249636384
irengrig pushed a commit to irengrig/bazel that referenced this issue Jul 15, 2019
Previously, we assume the python/bash binary path passed to the launcher is an absolute path specified by `--python_path` or `--shell_executable`, but this is not true when using python/shell toolchain. When using the python toolchain, the given python binary path will be a runfile path, we can use `Rlocation` to find the absolute path.

Fixes bazelbuild#7947

Closes bazelbuild#8440.

PiperOrigin-RevId: 250569171
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Rules-Python Native rules for Python type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants