linux-sandbox is not available occasionally since Bazel 6.0.0 #18071
Comments
The 1s timeout was introduced for #15373.
I am slightly confused here:
From the point above, I would imagine your current setup is something like this: one VM or bare-metal host running Bazel in multiple containers. After each build, you discard the Bazel container, killing the Bazel JVM inside. You are oversubscribing the host to saturate CPU usage, which results in a "noisy neighbor" problem: builds in a few busier containers consume too much CPU and prevent new containers from being spun up successfully.

If that's indeed the case, my advice would be to switch to a set of "persistent containers" and reuse the containers, and the Bazel JVM inside them, between runs. Each container should be assigned a fixed share of CPU and RAM, enforced by the container runtime, so that it cannot use more resources than it should.

I don't think the mitigation in #18151 will solve your issue at all. It only moves the noisy neighbor problem from Bazel's startup phase to the action execution phase. The 5% failure rate would remain, because scheduling of all your action executions would be delayed by the same wait time.
…ility A 1s timeout was introduced in checking whether LinuxSandbox is available, to prevent a complete hangup on broken systems. However, it turned out that it occasionally results in misjudging that linux-sandbox being not available. `local_termination_grace_seconds` defaults to 15s, which hopefully gives more headroom and configurability in various setups. Fixes bazelbuild#18071 Closes bazelbuild#18151. PiperOrigin-RevId: 536953768 Change-Id: I5d344ee5bf06cb9b13a2cba9d077f0981f4430a3
…ility (#18568) A 1s timeout was introduced in checking whether LinuxSandbox is available, to prevent a complete hangup on broken systems. However, it turned out that it occasionally results in misjudging that linux-sandbox being not available. `local_termination_grace_seconds` defaults to 15s, which hopefully gives more headroom and configurability in various setups. Fixes #18071 Closes #18151. PiperOrigin-RevId: 536953768 Change-Id: I5d344ee5bf06cb9b13a2cba9d077f0981f4430a3 Co-authored-by: Takeo Sawada <myc.monad@gmail.com>
Description of the bug:
We noticed that Bazel occasionally (about 5% in our env) fails due to `linux-sandbox` not being available.

It seems that recently a 1s timeout was introduced in checking if `linux-sandbox` is available (#15414), which might be too tight under load.

In our setup, we disable all other weaker sandboxes for hermeticity, which makes this fail reliably and easy to notice. I suspect this is happening in more environments, but people haven't noticed because of the `processwrapper-sandbox` fallback.

CC @meisterT
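To illustrate the suspected failure mode, here is a minimal Python sketch of an availability probe guarded by a timeout. It is not Bazel's implementation: `probe_available` and `SLOW_PROBE` are hypothetical names, and the slow probe is simulated with a sleeping child process rather than a real `linux-sandbox` invocation. It only shows how a healthy but slow probe gets reported as "not available" when the timeout is too tight.

```python
import subprocess
import sys


def probe_available(cmd, timeout_s):
    """Return True if cmd exits with status 0 within timeout_s seconds.

    Hypothetical helper for illustration; not part of Bazel.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The probed tool may be perfectly fine; it just did not finish in time.
        return False
    except OSError:
        # The binary could not be started at all.
        return False
    return result.returncode == 0


# Assumption: stand-in for a probe that is slow only because the host is busy.
SLOW_PROBE = [sys.executable, "-c", "import time; time.sleep(0.5)"]

if __name__ == "__main__":
    print("tight timeout:", probe_available(SLOW_PROBE, timeout_s=0.1))
    print("generous timeout:", probe_available(SLOW_PROBE, timeout_s=10))
```

With the tight timeout the sketch cannot distinguish "broken" from "slow", which matches the behavior described above when no weaker sandbox is allowed as a fallback.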
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Run Bazel >=6.0.0 with `--spawn_strategy=worker,linux-sandbox` under heavy load many times.
Which operating system are you running Bazel on?
Linux
What is the output of `bazel info release`?
No response
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
No response
What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD`?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response