Make py_binary have zipped binary as implicit output #3530
Comments
An alternative would be to add an explicit attribute to |
The implicit output seems a lot more predictable: if you want the zip, ask for the zip. It also allows for faster iteration as As a final point, if anyone ever lands PEX support (#2038) then that could also be implemented as an implicit output, rather than Yet Another Parameter. |
In considering the native rules' zip files vs subpar, I'm thinking we should aim to implement this and deprecate subpar. See google/subpar#44 for more discussion. It may also be worth renaming the |
Sounds good. But I guess our |
My understanding is that Python itself shouldn't care about the file extension, but using a different extension may affect file associations in the OS shell. While not all versions of Python would recognize PEP 441 explicitly mentions that the zip format may be used with shebang lines and self-extracting archives, so I don't see a conflict. |
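As a small illustration of the PEP 441 point, the standard-library `zipapp` module can write a zip archive prefixed with a shebang line, and the result is still a valid zip. This is only a sketch and is unrelated to how Bazel builds its own zip; the directory layout, the file names, and the `.pyz` extension are assumptions chosen for the example.

```python
import pathlib
import tempfile
import zipapp
import zipfile


def build_pyz(workdir: pathlib.Path) -> pathlib.Path:
    """Create a tiny PEP 441 archive with a shebang line and return its path."""
    src = workdir / "app"  # hypothetical source directory
    src.mkdir()
    (src / "__main__.py").write_text('print("hello from the archive")\n')

    # The extension is a convention for file associations, not a requirement.
    target = workdir / "app.pyz"
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")
    return target


with tempfile.TemporaryDirectory() as tmp:
    archive = build_pyz(pathlib.Path(tmp))
    # The shebang is stored in front of the zip data ...
    interpreter = zipapp.get_interpreter(archive)
    # ... and the file is still recognised as a zip archive.
    is_zip = zipfile.is_zipfile(archive)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
```

On a POSIX system the resulting file could then be marked executable and run directly, or passed to the interpreter as `python3 app.pyz`.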
Is there any update or movement here? It would be great to be able to reference: and have the executable zip consumed in other targets, such as rules_docker. |
I think it's preferred to not have implicit output. How about we add it to an output group called
|
This would be wonderful @meteorcloudy. I'd be keen to try to help implement this as well if necessary. Or is this something that you are already wishing to take on? I imagine this would need to be added to rules_python and not Bazel itself; however, I have no current knowledge of what |
Thanks @meteorcloudy for the implementation! |
@meteorcloudy, could you give more context on why implicit output was not preferred? I really miss the behaviour provided by implicit output. |
Implicit output requires the rule to always generate an action that outputs the declared file. But in some cases, that's not very nice. For example, when --build_python_zip is disabled, we'll have to generate a fake action for that. With output_group, we don't have that problem, if the flag is disabled, the output group will just be empty without throwing an error. |
I see. Thank you @meteorcloudy. It sounds like it's not really that implicit output is bad, but that the flag-controlled zip output plus implicit output is tricky. Why don't we just generate the implicit output unconditionally, probably under a different name like "name.pz"? |
Because the python zip file includes all transitive runtime dependencies, sometimes it could take a long time to build (especially on Windows). I think there are some other disadvantages of implicit outputs in general, @oquenchil anything you could add? |
So it might be that implicit outputs stay in Starlark rules in some form. Not certain about this. The problem here is what Yun said, it is expensive to create the action for that artifact unconditionally, so unfortunately you will have to use the additional flag for distribution. |
An implicit output is really handy when you're working on the command line and want to both run locally (regular target), and then build the zip for deployment elsewhere (the implicit target). Working with an output group is clumsier. |
You can just define an extra target in the BUILD file:
Then in the command line you only need to switch between |
Indeed, but that is more verbose and clumsier than an implicit target, and also requires the author of the build files to know in advance whether or not all consumers of that file will want a zip or not (it's arguable that they should know, but in a large codebase that may not actually be true) |
I miss the experience of building |
…unnable python application.

The Problem: We store two copies of every python file and c-extension in `parfile`s/the `unpar` directory.

---

The Current State of Affairs: Take the following example `BUILD` file,

```py
# BUILD.bazel
pkg_python_app(
    name = "my_app",
    srcs = ["my_app.py"],
    bindir = "/usr/bin",
    libdir = "/lib",
    tar = "staging",
)
```

This created a `staging.tar` which, when extracted, created the following files,

```
/usr/bin/my_app      # adds environment variables and calls '/lib/par/my_app.par'
/lib
  /par/my_app.par    # a parfile (zip,) containing all dependencies
  /unpar/my_app/     # the extracted version of 'my_app.par'
    __main__.py
    PAR_MANFIEST
    pypi__36__attrs_20_3_0/...
    com_128technology_i95/...
```

`/lib/unpar/my_app` is created the first time the application is run, or could be created eagerly by running the executable with the environment variable `PAR_EXTRACT_ONLY=1`. The `parfile` had no compression applied to it, so that means **we keep two copies of every single python file and c-extension**.

---

The History: zipfiles created by bazel (via the commandline flag `--build_python_zip`) cannot be run outside of the build/ directory. We chose `par_binary` (`google/subpar`) because it creates `parfile`s which _can_ be copied anywhere and executed (the python interpreter is not bundled in the parfile.)

parfiles work best with `zip_safe = True`. That allows the python interpreter to find the python source files directly within the parfile without having to extract the archive first. c-extensions are **not** `zip_safe` so any `parfile` that includes one must be marked as `zip_safe = False`. Most of our python applications include a c-extension as a dependency (`lxml`, `pycrypto`, etc...) so _most_ are marked as `zip_safe = False`.

`google/subpar` extracts to a **unique** tempdir when `zip_safe = False`. This led to extremely large overhead for python "scripts" that were supposed to be invoked, do one thing, and exit. For example, we were running thousands of `salt` commands and each one was taking 5+ seconds to extract _before_ the beginning of `main()` was even called.

We forked and created `128technology/subpar` which added a new `extract_dir` argument. This allowed the `parfile` to extract to a Known Good Place and re-use the extracted version of the application as long as the parfile didn't change. The `parfile` used a hash of all the files in the archive to create the `PAR_MANFIEST` so as to not re-extract unless that file in the extracted dir didn't match the one embedded within the `parfile`. The `parfile` still contained the `__main__.py` so the `exec_wrapper` called the `parfile` which used all the files in `unpar/` _except_ the `__main__.py`. We then continued to add more patches on top of `subpar` to byte-compile eagerly, etc...

---

The Solution: `subpar` has outlived its usefulness. We've hacked it to bits so it's time to re-write the parts we use from the ground(ish) up. Meet `py_unzip`. Take the same `BUILD` file from before and add `use_py_unzip = True`

```py
# BUILD.bazel
pkg_python_app(
    name = "my_app",
    srcs = ["my_app.py"],
    bindir = "/usr/bin",
    libdir = "/lib",
    use_py_unzip = True,  # <--- new!
)
```

Now `my_app.tar` is created which extracts to,

```
/usr/bin/my_app      # adds environment variables and calls '/lib/unzip/my_app/__main__.py'
/lib/unzip/my_app/
  __main__.py
  runfiles/
    pypi__36__attrs_20_3_0/...
    com_128technology_i95/...
```

There is no duplication between the archive/unarchived version of the application.

---

The Implementation:

- Take the zipfile that bazel generates via `py_binary` (same as `--build_python_zip`.) This is done by `ctx.attr.src[OutputGroupInfo].python_zip_file` which is the same as

  ```
  filegroup(
      name = "foo_zip",
      srcs = ["//bar:foo"],
      output_group = "python_zip_file",
  )
  ```

  as mentioned in this github comment, bazelbuild/bazel#3530 (comment).
- Replace the `__main__.py` with one that works in the extracted directory. Bazel's main comes from `tools/python/python_bootstrap_template.txt` which templates in a half-dozen variables; our new `__main__.py` is based on that, but simplified to only work for our use-case (extracted outside of bazel.)
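The two implementation steps above (take the zip, then swap in a `__main__.py` that works once extracted) can be sketched with the standard-library `zipfile` module. This is not the `py_unzip` implementation, which the message does not show; the function name `extract_with_new_main`, its parameters, and the stand-in archive contents are all hypothetical.

```python
import pathlib
import tempfile
import zipfile


def extract_with_new_main(zip_path, dest_dir, new_main_source):
    """Extract zip_path into dest_dir, replacing the archived __main__.py."""
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Skip the archived entry point; a replacement is written below.
        members = [n for n in zf.namelist() if n != "__main__.py"]
        zf.extractall(dest, members=members)
    (dest / "__main__.py").write_text(new_main_source)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    tmp = pathlib.Path(tmp)

    # Stand-in for the zip Bazel would produce (contents are made up).
    zip_path = tmp / "my_app.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("__main__.py", "# bootstrap that expects to run from the zip\n")
        zf.writestr("runfiles/my_app.py", "print('hi')\n")

    out = extract_with_new_main(
        zip_path, tmp / "unzip" / "my_app", "# simplified entry point\n"
    )
    extracted = sorted(
        p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file()
    )
    main_text = (out / "__main__.py").read_text()
```

Because the extracted tree is the only copy that ships, this layout avoids keeping both an archive and its unpacked contents, which is the duplication the message set out to remove.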
I'd like to be able to depend on the zipped python binary for distribution, but not build the zipped binary usually. To do this, I'd like to specify `//path/to/my/py_binary_target.zip` as a dependency of my distribution artifact.

At the moment, it's possible to build with `--build_python_zip`, but I'd rather not have to use different flags for ordinary builds and distribution builds.

Environment info
Operating System:
Bazel version (output of `bazel info release`): release 0.5.3
Have you found anything relevant by searching the web?
Found some info in https://bazel.build/designs/2016/09/05/build-python-on-windows.html