Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make py_binary have zipped binary as implicit output #3530

Closed
kamalmarhubi opened this issue Aug 9, 2017 · 19 comments
Closed

Make py_binary have zipped binary as implicit output #3530

kamalmarhubi opened this issue Aug 9, 2017 · 19 comments
Assignees
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Python Native rules for Python type: feature request

Comments

@kamalmarhubi
Copy link
Contributor

kamalmarhubi commented Aug 9, 2017

I'd like to be able to depend on the zipped python binary for distribution, but not build the zipped binary usually. To do this, I'd like to specify //path/to/my/py_binary_target.zip as a dependency of my distribution artifact.

At the moment, it's possible to build with --build_python_zip, but I'd rather not have to use different flags for ordinary builds and distribution builds.

Environment info

  • Operating System:

      $ lsb_release --description 
      Description:    Debian GNU/Linux testing (buster)
    
  • Bazel version (output of bazel info release): release 0.5.3

Have you found anything relevant by searching the web?

Found some info in https://bazel.build/designs/2016/09/05/build-python-on-windows.html

@meteorcloudy meteorcloudy added category: rules > python P2 We'll consider working on this in future. (Assignee optional) type: feature request labels Aug 10, 2017
@meteorcloudy meteorcloudy self-assigned this Aug 10, 2017
@duggelz
Copy link

duggelz commented Nov 17, 2017

An alternative would be to add an explicit attribute to py_binary/py_test rules that changes the behavior for that single target, from the current behavior (generated "py_binary_target" is a shell script that invokes the "py_binary_target.runfiles/" tree on Linux), to the "--build_python_zip" behavior ("py_binary_target" is a zip file with '#!" line).

@brandjon brandjon added P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Python Native rules for Python and removed P2 We'll consider working on this in future. (Assignee optional) category: rules > python labels Oct 19, 2018
@shs96c
Copy link
Contributor

shs96c commented Mar 26, 2019

The implicit output seems a lot more predictable: if you want the zip, ask for the zip. It also allows for faster iteration as bazel run won't require the building of a zip.

As a final point, if anyone ever lands PEX support (#2038) then that could also be implemented as an implicit output, rather than Yet Another Parameter.

@brandjon
Copy link
Member

In considering the native rules' zip files vs subpar, I'm thinking we should aim to implement this and deprecate subpar. See google/subpar#44 for more discussion.

It may also be worth renaming the .zip extension to .pyz after PEP 441.

@meteorcloudy
Copy link
Member

Sounds good. But I guess our .zip file doesn't work exactly the way .pyz file work?

@brandjon
Copy link
Member

My understanding is that Python itself shouldn't care about the file extension, but using a different extension may affect file associations in the OS shell. While not all versions of Python would recognize .pyz under windows, you'd think it'd be no worse than leaving it as .zip where the default association would be some extractor tool.

Pep 441 explicitly mentions that the zip format may be used with shebang lines and self-extracting archives, so I don't see a conflict.

@groodt
Copy link
Contributor

groodt commented Sep 27, 2019

Is there any update or movement here?

It would be great to be able to reference:
//path/to/my/py_binary_target.zip

and have the executable zip consumed in other targets, such as rules_docker.

@meteorcloudy
Copy link
Member

I think it's preferred to not have implicit output. How about we add it to a output group called python_zip_file, then you can access it by

filegroup(
  name = "foo_zip",
  srcs=  ["//bar:foo"],
  output_group = "python_zip_file",
)

@groodt
Copy link
Contributor

groodt commented Oct 1, 2019

This would be wonderful @meteorcloudy

I'd be keen to try help implement this as well if necessary. Or is this something that you are already wishing to take on? I imagine this would need to be added to rules_python and not Bazel itself, however, I have no current knowledge of what --build_python_zip is doing behind the covers in the Bazel code.

@meteorcloudy
Copy link
Member

@groodt I think this should be implemented in the native python rule, because that's what --build_python_zip applies to.
I sent #9453, please take a look.

@brandjon
Copy link
Member

Thanks @meteorcloudy for the implementation!

@mzhaom
Copy link
Contributor

mzhaom commented Oct 25, 2021

I think it's preferred to not have implicit output. How about we add it to a output group called python_zip_file, then you can access it by

@meteorcloudy , could you give more context why implicit output was not preferred? I really miss the behaviour provided by implicit output.

@meteorcloudy
Copy link
Member

Implicit output requires the rule to always generate an action that outputs the declared file. But in some cases, that's not very nice. For example, when --build_python_zip is disabled, we'll have to generate a fake action for that. With output_group, we don't have that problem, if the flag is disabled, the output group will just be empty without throwing an error.

@mzhaom
Copy link
Contributor

mzhaom commented Oct 26, 2021

I see. Thank you @meteorcloudy. It sounds like it's not really that implicit output is bad but the flag controlled zip output plus implicit output it tricky. Why don't we just generate the implicit output unconditionally, probably under a different name like "name.pz"?

@meteorcloudy
Copy link
Member

Why don't we just generate the implicit output unconditionally

Because the python zip file includes all transitive runtime dependencies, sometimes it could take a long time to build (especially on Windows). I think there are some other disadvantages of implicitly outputs in general, @oquenchil anything you could add?

@oquenchil
Copy link
Contributor

So it might be that implicit outputs stay in Starlark rules in some form. Not certain about this. The problem here is what Yun said, it is expensive to create the action for that artifact unconditionally, so unfortunately you will have to use the additional flag for distribution.

@shs96c
Copy link
Contributor

shs96c commented Oct 27, 2021

An implicit output is really handy when you're working on the command line and want to both run locally (regular target), and then build the zip for deployment elsewhere (the implicit target) Working with an output group is clumsier.

@meteorcloudy
Copy link
Member

You can just define an extra target in the BUILD file:

filegroup(
  name = "foo_zip",
  srcs=  ["//bar:foo"],
  output_group = "python_zip_file",
)

Then in the command line you only need to switch between bazel build //:foo and bazel build //:foo_zip, I don't think that's much harder than implicit outputs?

@shs96c
Copy link
Contributor

shs96c commented Oct 27, 2021

Indeed, but that is more verbose and clumsier than an implicit target, and also requires the author of the build files to know in advance whether or not all consumers of that file will want a zip or not (it's arguable that they should know, but in a large codebase that may not actually be true)

@mzhaom
Copy link
Contributor

mzhaom commented Oct 27, 2021

I miss the experince of building par with blaze, nothing needs to be added in BUILD files to switch between build :foo vs build :foo.par

jacobbogdanov added a commit to 128technology/rules_128tech that referenced this issue Aug 8, 2023
…unnable python application.

The Problem:

We store two copies of evey python file and c-extension in `parfile`s/the `unpar` directory.

---

The Current State of Affairs:

Take the following example `BUILD` file,
```py
// BUILD.bazel
pkg_python_app(
  name = "my_app",
  srcs = ["my_app.py"],
  bindir = "/usr/bin",
  libdir = "/lib",
  tar = "staging",
)
```
This created a `staging.tar` which when extracted created the following files,

```
/usr/bin/my_app  # adds environment variables and calls '/lib/par/my_app.par'
/lib
  /par/my_app.par  # a parfile (zip,) containing all dependencies
  /unpar/my_app/ # the extracted version of 'my_app.par'
    __main__.py
    PAR_MANFIEST
    pypi__36__attrs_20_3_0/...
    com_128technology_i95/...
```

`/lib/unpar/my_app` is created the first time the application is run, or could be created eagerly by runing the executable with the environment variable `PAR_EXTRACT_ONLY=1`.

The `parfile` had no compression applied to it, so that means **we keep two copies of every single python file and c-extension**.

---

The History:

zipfiles created by bazel (via the commandline flag `--build_python_zip`) cannot be run outside of the build/ directory.
We chose `par_binary` (`google/subpar`) because it creates `parfile`s which _can_ be copied anywhere and executed (the python interpreter is not bundled in the parfile.)

parfiles work best with `zip_safe = True`.
That allows the python interpreter to find the python source files directly within the parfile without having to extract the archive first.
c-extensions are **not** `zip_safe` so any `parfile` that includes one must be marked as `zip_safe = False`.
Most of our python applications include a c-extension as a dependency (`lxml`, `pycrypto`, etc...) so _most_ are marked as `zip_safe = False`.

`google/subpar` extracts to a **unique** tempdir when `zip_safe = False`.
This lead to extremely large overhead for python "scripts" that were supposed to be invoked, do one thing, and exit.
For example, we were running thousands of `salt` commands and each one was taking 5+ seconds to extract _before_ the beginning of `main()` was even called.

We forked and created `128technology/subpar` which added a new `extract_dir` argument.
This allowed the `parfile` to extract to a Known Good Place and re-use the extracted version of the application as long as the parfile didn't change.
The `parfile` used a hash of all the files in the archive to create the `PAR_MANFIEST` so as to not re-extract unless that file in the extracted dir didn't match the one embeded within the `parfile`.
The `parfile` still contained the `__main__.py` so the `exec_wrapper` called the `parfile` which used all the files in `unpar/` _except_ the `__main__.py`.

We then continued to add more patches on top of `subpar` to byte-compile eagerly, etc...

---

The Solution:

`subpar` has outlived it's usefulness. We've hacked it to bits so it's time to re-write the parts we use from the ground(ish) up.

Meet `py_unzip`.

Take the same `BUILD` file from before and add `use_py_unzip = True`
```py
// BUILD.bazel
pkg_python_app(
  name = "my_app",
  srcs = ["my_app.py"],
  bindir = "/usr/bin",
  libdir = "/lib",
  use_py_unzip = True,  # <--- new!
)
```

Now `my_app.tar` is created which extracts to,
```
/usr/bin/my_app  # adds environment variables and calls '/lib/unzip/my_app/__main__.py'
/lib/unzip/my_app/
  __main__.py
  runfiles/
    pypi__36__attrs_20_3_0/...
    com_128technology_i95/...
```
There is no duplication between the archive/unarchived version of the application.

---

The Implementation:

- Take the zipfile that bazel generates via `py_binary` (same as `--build_python_zip`.)
  This is done by `ctx.attr.src[OutputGroupInfo].python_zip_file` which is the same as
```
filegroup(
  name = "foo_zip",
  srcs=  ["//bar:foo"],
  output_group = "python_zip_file",
)
```
  as mentioned in this github comment, bazelbuild/bazel#3530 (comment).

- Replace the `__main__.py` with one that works in the extracted directory.
  Bazel's main comes from `tools/python/python_bootstrap_template.txt` which templates in a half-dozen variables; our new `__main__.py` is based on that, but simplified to only work for our use-case (extracted outside of bazel.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Python Native rules for Python type: feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants