macOS: Cannot delete sandbox directory after action execution #2371

bendowski · 2017-01-18T15:53:06Z

Description of the problem / feature request / question:

After running tests, I see this warning:

WARNING: Cannot delete sandbox directory after action execution: /private/var/tmp/_bazel_user/81e2f173210dcd47c26dbf4a42147a8b/bazel-sandbox/2551a3b4-abe3-4c34-a39d-37b44153e291-0 (java.io.IOException: /private/var/tmp/_bazel_user/81e2f173210dcd47c26dbf4a42147a8b/bazel-sandbox/2551a3b4-abe3-4c34-a39d-37b44153e291-0/execroot/master/_tmp/tests_2 (Directory not empty)).

Please let me know how I can provide more debugging information.

Environment info

Operating System:
macOS Sierra 10.12.2
Bazel version (output of bazel info release):
release 0.4.3-homebrew


hermione521 · 2017-01-19T10:22:03Z

We used to have this problem although I don't remember the reason. I remember @philwo had a fix 95b16a8. I'm not sure if this should happen now...

laszlocsomor · 2017-01-23T12:16:08Z

@benol : Thanks for the bug report! I'm trying to collect some hopefully useful data:

Do you see this warning consistently or just intermittently?
Have you seen it with just this target or with other tests too?
Is the directory left behind after Bazel has finished? Are you able to list its contents, and does it indeed look non-empty?
Are you aware of any workaround?

bendowski · 2017-01-24T13:42:37Z

I can see it consistently for one target that creates a lot of files in TEST_TMPDIR and runs other processes from within the test.
Only in this target.
It's left on disk and it's non-empty.

In our case we spawn a new JVM to work, inside the sandbox, on files in the TEST_TMPDIR directory. After Bazel fails to clean up the sandbox, I can see this directory left there, with a lock file used by the child process.

So it's possible that Bazel fails to kill all child processes before deleting the sandbox. It would seem the child process was either still alive and holding the lock, or it was killed forcefully and Bazel then failed to delete the locked file.

After Bazel prints the warning and leaves the sandbox directory on disk, I can manually delete it with simple "rm -rf /private/var/.../bazel-sandbox/hashhash"

We don't have this problem on Linux.
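For anyone who wants to script the manual cleanup described above, here is a minimal sketch. It assumes the leftover directories sit directly under `<output_base>/bazel-sandbox`, as in the path from the warning, and that no Bazel build is running while it executes. The helper name `remove_stale_sandboxes` is made up for illustration; it is not part of Bazel.

```python
import os
import shutil


def remove_stale_sandboxes(output_base):
    """Delete every entry under <output_base>/bazel-sandbox.

    Returns the names of the entries that were removed. Hypothetical helper:
    only run it while Bazel is idle, since it does not check for live actions.
    """
    sandbox_root = os.path.join(output_base, "bazel-sandbox")
    removed = []
    if not os.path.isdir(sandbox_root):
        return removed
    for name in sorted(os.listdir(sandbox_root)):
        path = os.path.join(sandbox_root, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # same effect as `rm -rf` on the directory
        else:
            os.remove(path)
        removed.append(name)
    return removed
```

The output base can be looked up with `bazel info output_base` and passed in as the argument.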

laszlocsomor · 2017-01-25T09:19:54Z

Thanks, that's great info.

So it sounds like the sandbox cleanup is broken -- it doesn't kill all child processes and fails to clean up all directories.

@hermione521 : does that sound like a plausible root cause? I think we could repro this with an action that spawns a process which holds on to a file descriptor and doesn't terminate.
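A rough sketch of what such a repro could look like, in case it helps. Everything here is an assumption rather than a confirmed reproduction: the helper `spawn_lingering_child`, the file names `child.lock` and `file_N`, and the choice to start the child in its own session so that it survives the test process.

```python
import os
import subprocess
import sys
import tempfile
import time

# Code run by the child: hold a lock file open and keep adding files to the
# directory until the lifetime expires.
CHILD_SOURCE = """
import os, sys, time
tmpdir, lifetime = sys.argv[1], float(sys.argv[2])
deadline = time.time() + lifetime
with open(os.path.join(tmpdir, "child.lock"), "w") as lock:
    lock.write(str(os.getpid()))
    lock.flush()
    n = 0
    while time.time() < deadline:
        with open(os.path.join(tmpdir, "file_%d" % n), "w"):
            pass
        n += 1
        time.sleep(0.05)
"""


def spawn_lingering_child(tmpdir, lifetime_s=30.0):
    """Start a child that keeps tmpdir busy and can outlive the caller."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, tmpdir, str(lifetime_s)],
        start_new_session=True,  # assumption: detach from the test's process group
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

A test body would call `spawn_lingering_child(os.environ["TEST_TMPDIR"])` and then return without waiting for the child, so that the child is still creating files when the sandbox is cleaned up.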

hermione521 · 2017-01-27T13:09:10Z

I tried several things but still can't reproduce it. It would be very helpful if you could provide a minimal example that reproduces it. Thank you!

ittaiz · 2017-01-27T13:17:26Z

I actually got a similar message but I'm not sure it was deterministic.
I was wading through several issues so ignored it for the time being but if it will surface again I'll try and triage and generate a minimal example.

hermione521 · 2017-01-27T13:41:52Z

@ittaiz that would be very helpful! Thank you in advance!

ittaiz · 2017-03-12T14:21:39Z

@hermione521 this happened to me again today, but since I'm generating a big Bazel codebase (or rather generating hundreds of BUILD files via a migration tool from Maven) I don't think I can produce a minimal example, mainly because I don't know why this happens.
Any chance you can give me a few concrete steps you'd like me to take when this occurs to capture the state of the workspace?

ittaiz · 2017-03-20T15:23:26Z

@hermione521 ping? happened to me again. a small repro isn't likely but maybe I can dig some more details if you point me to the right direction

hermione521 · 2017-03-21T12:59:55Z

Hmm.. I don't have any idea except clean them manually.. Let's ping @philwo to see if it helps.

philwo · 2017-03-21T13:57:24Z

I'm currently doing a big round of bug-fixes and will try to fix whatever might cause this. I'll specifically look into whether there are any race conditions in Bazel's process management on macOS.

I'll follow up on this, but if I don't, please ping this bug in ~1 week and I'll send a status update. :)

ittaiz · 2017-03-29T07:51:06Z

Ping :)


philwo · 2017-04-06T14:48:52Z

Hi @ittaiz,

I've been working on a fix for this over the last days, hope to get it submitted tomorrow. Will ping this bug then!

Philipp

This uses Linux's PR_SET_CHILD_SUBREAPER and FreeBSD's PROC_REAP_ACQUIRE features to become an init-like process for all (grand)children spawned by process-wrapper, which allows us to a) kill them reliably and then b) wait for them reliably.

Before this change, we only killed the main child, waited for it, then fired off a kill -9 on the process group, without waiting for it. This led to a race condition where Bazel would try to use or delete files that were still held open by children of the main child and thus to bugs like #2371.

This means we now have reliable process management on Linux, FreeBSD and Windows. Unfortunately I couldn't find any feature like this on macOS, so this is the only OS that will still have this race condition.

PiperOrigin-RevId: 153817210
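To illustrate the subreaper idea from that commit message, here is a small Linux-only sketch. It is not the process-wrapper code: the helpers `become_subreaper` and `run_and_reap` are hypothetical, and the sketch assumes that all descendants stay in the main child's process group.

```python
import ctypes
import os
import signal
import subprocess
import sys

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def become_subreaper():
    """Ask the kernel to reparent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.prctl(
        PR_SET_CHILD_SUBREAPER,
        ctypes.c_ulong(1),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def run_and_reap(argv):
    """Run argv, then kill and wait for every process it left behind.

    Returns (exit status of the main child, number of orphans reaped).
    Call become_subreaper() first, otherwise orphans go to init instead.
    """
    child = subprocess.Popen(argv, start_new_session=True)
    status = child.wait()
    try:
        # The main child led its own process group, so its pid is the group id.
        os.killpg(child.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # nothing left in the group
    reaped = 0
    while True:
        try:
            os.waitpid(-1, 0)  # blocks until some remaining child has exited
            reaped += 1
        except ChildProcessError:
            break  # no children left, so nothing can still hold files open
    return status, reaped
```

Because `os.waitpid(-1, 0)` collects any child of the calling process, this only suits a dedicated wrapper process that has no other children of its own. Only after the loop ends is it safe to delete the files the action was using.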

ulfjack · 2017-06-29T19:51:16Z

@philwo can we close this now?

philwo · 2017-07-10T15:04:21Z

The fix has been rolled back, so maybe we should keep it open. On the other hand... did this happen to someone in the last month? If not, we can also close it, I don't mind.

ittaiz · 2017-07-10T15:37:25Z

We used to get this from time to time. Don't remember if recently. But if the fix has been rolled back then why not keep it open?


philwo · 2018-07-19T12:39:21Z

Closing. Please re-open if someone sees this again.

hermione521 added under investigation category: sandboxing labels Jan 19, 2017

laszlocsomor assigned hermione521 Jan 23, 2017

philwo self-assigned this Feb 22, 2017

hermione521 removed their assignment Mar 27, 2017

philwo closed this as completed Jul 19, 2018
