
Hermetic toolchains incur significant overhead when running py_binary and py_test #1653

Open
martis42 opened this issue Dec 21, 2023 · 10 comments


@martis42
Contributor

🐞 bug report

Affected Rule

py_binary and py_test

Is this a regression?

No

Description

Hermetic toolchains add significant overhead to executing Python targets due to the large number of files added to the runfiles tree.
An analysis of the issue can be found here: https://github.com/martis42/show_hermetic_python_overhead

Bazel caching of course mitigates much of this issue in day-to-day work.
Still, whenever a test has to rerun, we pay this additional overhead.

I don't know if a "perfect" solution is possible given how Bazel sandboxing works.
However, it would be great if rules_python added a note to its documentation for hermetic toolchains recommending that users add --nolegacy_external_runfiles to their .bazelrc file.

Maybe the rules_python maintainers can even convince the Bazel maintainers to prioritize flipping --legacy_external_runfiles in Bazel 8?
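
For reference, a minimal `.bazelrc` entry implementing that recommendation (a sketch; `test` inherits `build` flags, so one line suffices):

```
# .bazelrc — avoid duplicating external-repository files (including the
# hermetic toolchain) under <workspace>/external/ in every runfiles tree.
build --nolegacy_external_runfiles
```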

🔬 Minimal Reproduction

You can reproduce this with this small example workspace: https://github.com/martis42/show_hermetic_python_overhead

🔥 Exception or Error

NA

🌍 Your Environment

Operating System:

Linux Mint 21.1

Output of bazel version:

Bazelisk version: v1.16.0
Build label: 7.0.0
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Dec 11 16:51:49 2023 (1702313509)
Build timestamp: 1702313509
Build timestamp as int: 1702313509

Rules_python version:

0.27.0

Anything else relevant?

#1624 might be related; it may, however, be Windows-specific.

@rickeylev
Contributor

Yeah, the caching can only do so well. The stdlib is a decent size. It's just a lot of files that have to be symlinked.

I wonder if we'd be able to zip up the stdlib. I think that can Just Work.

Otherwise, maybe declare_symlink or symlink(target_path=...) (which are now out of experimental in Bazel 7, I think) could be used to add one file to the runtime's set of files, with that file simply pointing back to the location of the runtime. I haven't had a chance to try those out, though, so I'm not sure about any downsides they have.

I really wish we had directories as a first-class concept, so Bazel knew to materialize a single symlink but monitor its contents recursively.
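
To illustrate the declare_symlink idea, a minimal sketch (the rule and attribute names are hypothetical, not anything in rules_python; it assumes the absolute path to the runtime could somehow be known, which is exactly the open question):

```starlark
# runtime_symlink.bzl — illustrative only; declare_symlink and
# symlink(target_path = ...) are real Bazel APIs, but this rule is made up.
def _runtime_symlink_impl(ctx):
    # One dangling-symlink artifact instead of thousands of stdlib files.
    link = ctx.actions.declare_symlink(ctx.label.name)
    ctx.actions.symlink(
        output = link,
        # Placeholder: a rule can't reliably compute this absolute path,
        # which is the main caveat discussed in the next comment.
        target_path = ctx.attr.runtime_path,
    )
    return [DefaultInfo(files = depset([link]))]

runtime_symlink = rule(
    implementation = _runtime_symlink_impl,
    attrs = {"runtime_path": attr.string(doc = "Absolute path to the runtime.")},
)
```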

@rickeylev
Contributor

A few of us were experimenting with symlinks and declare_directory (TreeArtifacts). The result is not really promising.

For local strategy actions, it works very well: TreeArtifacts create a single symlink to the directory; very cheap! Unfortunately, sandboxed actions don't do that; they copy the whole directory and/or symlink each individual file. We tried to trick Bazel into creating a symlink, but no luck.

The declare_symlink option has a similar problem: it can create an arbitrary symlink, but a rule can't really know the correct absolute path to the directory.

Some maintainers were talking, and zipping the stdlib is probably the best option available. That would greatly reduce the number of files. A few companies already do this. A PR implementing this would be welcome.
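
As a plausibility check for the zip approach (an illustrative sketch, not the proposed rules_python change): CPython's zipimport can load stdlib modules from a zip archive placed on the module search path.

```python
# Zip one stdlib package ('json') as a stand-in for the whole stdlib,
# then import it from the zip in a fresh interpreter.
import os
import subprocess
import sys
import tempfile
import zipfile

stdlib_dir = os.path.dirname(os.__file__)
zip_path = os.path.join(tempfile.mkdtemp(), "stdlib_subset.zip")

with zipfile.ZipFile(zip_path, "w") as zf:
    pkg_dir = os.path.join(stdlib_dir, "json")
    for root, _, files in os.walk(pkg_dir):
        for name in files:
            path = os.path.join(root, name)
            zf.write(path, os.path.relpath(path, stdlib_dir))

# PYTHONPATH entries precede the stdlib directories on sys.path, so the
# child interpreter resolves 'json' from inside the zip.
result = subprocess.run(
    [sys.executable, "-c", "import json; print(json.__file__)"],
    env=dict(os.environ, PYTHONPATH=zip_path),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # .../stdlib_subset.zip/json/__init__.py
```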

Digging around in the files of the Python install, I also think maybe 25-50% of them aren't necessary for running a program; e.g., the vendored pip dependencies and the interpreter headers don't need to be in the runfiles to run a binary.

On the Bazel side, there is bazelbuild/bazel#8230, which is roughly about the cost of creating large sets of files and is one of the better issues demonstrating that symlinking to a directory would be a big performance benefit; this issue also affects js/node.

github-actions bot

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

@github-actions github-actions bot added the Can Close? Will close in 30 days if there is no new activity label Jul 10, 2024
@martis42
Contributor Author

This is still relevant.

@github-actions github-actions bot removed the Can Close? Will close in 30 days if there is no new activity label Jul 11, 2024
@rickeylev
Contributor

Question: is using a locally installed platform runtime, instead of a remotely downloaded in-build runtime, an appealing option for this problem?

I ask because I've merged some of the basics for using a locally installed runtime (one equivalent to what our hermetic runtimes do) in PR #2000. But we're trying to figure out some of the use cases for it to better understand what public APIs it needs.

@martis42
Contributor Author

martis42 commented Jul 17, 2024

> ... an appealing option to this problem?

No, it is not. Technically it would solve the problem, at least on Linux (I do not work with other platforms). However, besides managing a polyglot workspace, Bazel's appeal and big selling point are, in my view, hermeticity and thus reproducibility of actions, which enable the excellent caching performance. A host interpreter toolchain, to my understanding, defeats hermeticity.

That said, I am still very much interested in rules_python offering better support for host interpreter toolchains. There are some known quirks with the hermetic toolchain, and switching to the host interpreter is an easy workaround for those (e.g. during debugging). While it is easy enough to set the host interpreter itself, what is missing are the toolchain files defined here, e.g. python_headers. Because of this, in the project I work on we have defined our own host toolchains and a build setting to opt into using the host toolchain instead of the hermetic one.

@rickeylev
Contributor

> what is missing are the toolchain definitions, headers ... we have our own host toolchains defined

Oh, that's perfect. That's exactly the sort of case I had in mind. PR #2000 added the basics to do all that, though the APIs are approximately private. I created #2070 to collect user feedback.

I'd be particularly interested in whether you're doing anything to automatically locate, e.g., headers and libraries to wire them into the toolchain. Inspecting the Python install to figure that out was quite a mystery, and I'm really uncertain what edge cases I missed.

@martis42
Contributor Author

We are not dynamically discovering headers and libraries. Currently we only support host toolchains for Linux (Ubuntu) and a subset of Python versions. For this use case it was easier to have a dedicated BUILD file per toolchain and let users select them explicitly.
In my experience, the BUILD files differ only in the version number in the paths to the toolchain files, not structurally. At least for Ubuntu hosts, a generic solution should be trivial.
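
For illustration, such a per-version BUILD file might look roughly like this (a sketch only: all names are illustrative, not rules_python APIs, and the paths assume Ubuntu's python3.10 and python3-dev packages attached via new_local_repository(name = "host_python_3_10", path = "/usr")):

```starlark
# host_python_3_10.BUILD — sketch of a per-version host toolchain file.
load("@rules_python//python:defs.bzl", "py_runtime")

py_runtime(
    name = "runtime",
    interpreter_path = "/usr/bin/python3.10",
    python_version = "PY3",
    visibility = ["//visibility:public"],
)

cc_library(
    name = "python_headers",
    hdrs = glob(["include/python3.10/**/*.h"]),
    includes = ["include/python3.10"],
    visibility = ["//visibility:public"],
)
```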

One caveat is that multiple Debian packages are involved in a full Python toolchain; not everybody might have python3-dev installed. To make sure users don't run into hard-to-understand errors, it might help to search for some key file in a repository rule and provide a verbose error if it is missing.
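
A sketch of that kind of check (hypothetical names; the key file is an assumption based on what Ubuntu's python3-dev package installs):

```starlark
def _host_python_check_impl(rctx):
    header = rctx.path("/usr/include/python3.10/Python.h")
    if not header.exists:
        fail(
            "Host Python toolchain is incomplete: {} not found. ".format(header) +
            "On Ubuntu, install the headers via `apt install python3-dev`.",
        )
    # The check itself is the point; generate an empty package.
    rctx.file("BUILD.bazel", "")

host_python_check = repository_rule(implementation = _host_python_check_impl)
```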

@rickeylev
Contributor

rickeylev commented Jul 17, 2024

(back to OP issue)
I was told that the 7.2 release of Bazel contained some performance improvements to sandboxing, which should help with the overhead of creating files. There's also a flag that might make things faster: --experimental_inmemory_sandbox_stashes
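
For anyone who wants to try it, a minimal `.bazelrc` sketch (the flag is experimental, so availability depends on your Bazel version):

```
# .bazelrc
build --experimental_inmemory_sandbox_stashes
```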

@martis42
Contributor Author

I used my artificial benchmark to look into this issue again.

--experimental_inmemory_sandbox_stashes does not improve performance for me. Maybe the flag only improves performance in specific use cases that do not match my setup. However, comparing Bazel versions shows that Bazel has indeed become more efficient at setting up and executing sandboxed tests. For my test setup I saw the following numbers:

| Bazel version | Elapsed time |
| --- | --- |
| 6.5.0 | 54.3 s |
| 7.0.0 | 48.0 s |
| 7.2.1 | 36.4 s |
| 8.0.0-pre.20240718.2 | 36.9 s |

This is great, as it is a free improvement for everybody who simply uses a recent Bazel version.
Still, compared to using a host interpreter, the overhead is significant. I still believe Bazel should support an optimized mechanism for linking toolchain artifacts into the runfiles tree.

martis42 added a commit to martis42/depend_on_what_you_use that referenced this issue Aug 5, 2024
We now use the lock file for local optimization, but do not check it in
yet. We will try out this feature a bit before doing so.

Bazel 7.2.1 also offers performance improvements, see:
bazelbuild/rules_python#1653 (comment)