macOS: Cannot delete sandbox directory after action execution #2371
Comments
@benol : Thanks for the bug report! I'm trying to collect some hopefully useful data:
|
In our case we spawn a new JVM to work, inside the sandbox, on files in the TEST_TMPDIR directory. After Bazel fails to clean up the sandbox, I can see this directory left there, with a lock file used by the child process. So it's possible that Bazel fails to kill all child processes before deleting the sandbox. It would seem the child process was either still alive and holding the lock, or was killed forcefully and Bazel then failed to delete the locked file. After Bazel prints the warning and leaves the sandbox directory on disk, I can manually delete it with a simple "rm -rf /private/var/.../bazel-sandbox/hashhash". We don't have this problem on Linux. |
Thanks, that's great info. So it sounds like the sandbox cleanup is broken -- it doesn't kill all child processes and fails to clean up all directories. @hermione521 : does that sound like a plausible root cause? I think we could repro this with an action that spawns a process which holds on to a file descriptor and doesn't terminate. |
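A rough sketch of the kind of repro suggested above, written in Python rather than as a real Bazel action. Everything here is hypothetical: `CHILD_SOURCE`, `delete_sandbox`, and the `tick-N` file names are made up for illustration, and the child only imitates a test subprocess that keeps touching files in its temp directory. Deleting while the child is still running is a race and may or may not fail; killing the child and waiting for it first should always succeed.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time

# Hypothetical stand-in for a test's child process: it keeps rewriting a
# bounded set of files in the directory passed as its first argument.
CHILD_SOURCE = """
import os, sys, time
target = sys.argv[1]
i = 0
while True:
    with open(os.path.join(target, "tick-%d" % (i % 100)), "w") as f:
        f.write("x")
    i += 1
    time.sleep(0.001)
"""


def _wait_for_files(directory, timeout=10.0):
    """Block until the child has created at least one file (or time out)."""
    deadline = time.monotonic() + timeout
    while not os.listdir(directory) and time.monotonic() < deadline:
        time.sleep(0.01)


def delete_sandbox(sandbox_dir, kill_first):
    """Try to delete sandbox_dir while a child writes into it.

    Returns True if the first deletion attempt succeeded.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, sandbox_dir],
        stderr=subprocess.DEVNULL,
    )
    _wait_for_files(sandbox_dir)
    if kill_first:
        # Safe order: stop the writer and wait for it before deleting.
        child.kill()
        child.wait()
    try:
        shutil.rmtree(sandbox_dir)
        deleted = True
    except OSError:
        # Typically "Directory not empty": the child recreated a file
        # between the directory scan and the final rmdir.
        deleted = False
    finally:
        # Make sure the child is gone either way (no-op if already reaped).
        child.kill()
        child.wait()
    if not deleted:
        shutil.rmtree(sandbox_dir, ignore_errors=True)
    return deleted


if __name__ == "__main__":
    print("kill first:", delete_sandbox(tempfile.mkdtemp(), kill_first=True))
    print("delete while running:", delete_sandbox(tempfile.mkdtemp(), kill_first=False))
```

The second call is the interesting one for a repro: its result can differ from run to run, which would match the non-deterministic reports in this thread.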
I tried several things but still can't reproduce it. It would be very helpful if you could provide a minimal example that reproduces it. Thank you! |
I actually got a similar message but I'm not sure it was deterministic. |
@ittaiz that would be very helpful! Thank you in advance! |
@hermione521 It happened to me again today, but since I'm generating a big Bazel codebase (or rather, generating hundreds of build files via a migration tool from Maven), I don't think I can produce a minimal example, mainly because I don't know why this happens. |
@hermione521 ping? It happened to me again. A small repro isn't likely, but maybe I can dig up some more details if you point me in the right direction. |
Hmm... I don't have any ideas other than cleaning them up manually. Let's ping @philwo to see if that helps. |
I'm currently doing a big round of bug-fixes and will try to fix whatever might cause this. I'll specifically look into whether there are any race conditions in Bazel's process management on macOS. I'll follow up on this, but if I don't, please ping this bug in ~1 week and I'll send a status update. :) |
Ping :)
On Tue, 21 Mar 2017 at 15:57, Philipp Wollermann wrote:
I'm currently doing a big round of bug-fixes and will try to fix whatever might cause this. I'll specifically look into whether there are any race conditions in Bazel's process management on macOS. I'll follow up on this, but if I don't, please ping this bug in ~1 week and I'll send a status update. :)
|
Hi @ittaiz, I've been working on a fix for this over the last few days and hope to get it submitted tomorrow. I will ping this bug then! Philipp |
This uses Linux's PR_SET_CHILD_SUBREAPER and FreeBSD's PROC_REAP_ACQUIRE features to become an init-like process for all (grand)children spawned by process-wrapper, which allows us to a) kill them reliably and then b) wait for them reliably. Before this change, we only killed the main child, waited for it, then fired off a kill -9 on the process group, without waiting for it. This led to a race condition where Bazel would try to use or delete files that were still held open by children of the main child, and thus to bugs like #2371. This means we now have reliable process management on Linux, FreeBSD and Windows. Unfortunately I couldn't find any feature like this on macOS, so this is the only OS that will still have this race condition. PiperOrigin-RevId: 153817210
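To illustrate the subreaper idea from the commit message, here is a minimal Linux-only sketch in Python. It is not Bazel's process-wrapper code (which is not shown in this thread); `become_subreaper` and `run_and_reap` are hypothetical names, and the constant 36 is the value of `PR_SET_CHILD_SUBREAPER` in `<linux/prctl.h>`. The sketch assumes grandchildren stay in the main child's process group; anything that moves to another group would not be killed here, and the final wait loop would then block until it exits on its own.

```python
import ctypes
import os
import signal
import subprocess
import sys

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>


def become_subreaper():
    """Ask the kernel to reparent orphaned descendants to this process."""
    if not sys.platform.startswith("linux"):
        raise RuntimeError("PR_SET_CHILD_SUBREAPER is Linux-only")
    libc = ctypes.CDLL(None, use_errno=True)
    result = libc.prctl(
        ctypes.c_int(PR_SET_CHILD_SUBREAPER),
        ctypes.c_ulong(1),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if result != 0:
        errno = ctypes.get_errno()
        raise OSError(errno, os.strerror(errno))


def run_and_reap(argv):
    """Run argv, then kill and wait for everything it left behind.

    Returns (exit code of the main child, number of extra processes reaped).
    """
    become_subreaper()
    # start_new_session makes the child a process group leader, so its
    # process group id equals its pid.
    child = subprocess.Popen(argv, start_new_session=True)
    child.wait()
    try:
        # a) kill whatever is still alive in the child's process group
        os.killpg(child.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # nothing left in the group
    reaped = 0
    while True:
        try:
            # b) wait for every remaining descendant; orphans were
            # reparented to us because we are a subreaper
            os.waitpid(-1, 0)
            reaped += 1
        except ChildProcessError:
            break  # no children left at all
    return child.returncode, reaped


if __name__ == "__main__":
    # The shell exits immediately but leaves a background sleep behind.
    print(run_and_reap(["sh", "-c", "sleep 60 & exit 0"]))
```

Only once the wait loop has finished is it safe to assume that no descendant still holds files in the sandbox. Without the subreaper call, the orphaned `sleep` would be reparented elsewhere (normally to init) and could not be waited for, which is the situation the commit message describes for macOS.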
@philwo can we close this now? |
The fix has been rolled back, so maybe we should keep it open. On the other hand... did this happen to someone in the last month? If not, we can also close it, I don't mind. |
We used to get this from time to time. Don't remember if recently. But if the fix has been rolled back then why not keep it open?
On Mon, Jul 10, 2017 at 6:04 PM, Philipp Wollermann wrote:
The fix has been rolled back, so maybe we should keep it open. On the other hand... did this happen to someone in the last month? If not, we can also close it, I don't mind.
|
Closing. Please re-open if someone sees this again. |
Description of the problem / feature request / question:
After running tests, I see this warning:
WARNING: Cannot delete sandbox directory after action execution: /private/var/tmp/_bazel_user/81e2f173210dcd47c26dbf4a42147a8b/bazel-sandbox/2551a3b4-abe3-4c34-a39d-37b44153e291-0 (java.io.IOException: /private/var/tmp/_bazel_user/81e2f173210dcd47c26dbf4a42147a8b/bazel-sandbox/2551a3b4-abe3-4c34-a39d-37b44153e291-0/execroot/master/_tmp/tests_2 (Directory not empty)).
Please let me know how I can provide more debugging information.
Environment info
Operating System: macOS Sierra 10.12.2
Bazel version (output of bazel info release): release 0.4.3-homebrew