Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that sandboxed processes exit before their sandboxes are cleaned up #18632

Merged
merged 6 commits into from
Mar 31, 2023

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Mar 31, 2023

As @jsirois discovered and described in #16778 (comment), because the local::CommandRunner does not wait for its child process to have exited, it might be possible for a SIGKILL to not have taken effect on the child process before its sandbox has been torn down. And that could result in the process seeing a partial sandbox (and in the case of #16778, potentially cause named cache corruption).

This change ensures that we block to call wait when spawning local processes by wrapping them in the ManagedChild helper that we were already using for interactive processes. It then attempts to clarify the contract of the CapturedWorkdir trait (but in order to improve cherry-pickability does not attempt to adjust it), to make it clear that child processes should fully exit before the run_and_capture_workdir method has returned.

Fixes #16778 🤞.

@stuhood stuhood added needs-cherrypick category:bugfix Bug fixes for released features labels Mar 31, 2023
@stuhood stuhood added this to the 2.14.x milestone Mar 31, 2023
@stuhood
Copy link
Member Author

stuhood commented Mar 31, 2023

Commits are useful to review independently.

It then attempts to clarify the contract of the CapturedWorkdir trait (but in order to improve cherry-pickability does not attempt to adjust it), to make it clear that child processes should fully exit before the run_and_capture_workdir method has returned.

I'll likely open a ticket about this when I have more time, because it's a fairly awkward interface. And one implementation re-uses a single sandbox, two use dedicated sandboxes, etc.

@stuhood stuhood enabled auto-merge (squash) March 31, 2023 18:20
@jsirois jsirois disabled auto-merge March 31, 2023 18:52
@jsirois jsirois merged commit 1462bef into pantsbuild:main Mar 31, 2023
jsirois pushed a commit to jsirois/pants that referenced this pull request Mar 31, 2023
…ed up (Cherry-pick of pantsbuild#18632)

As @jsirois discovered and described in
pantsbuild#16778 (comment),
because the `local::CommandRunner` does not `wait` for its child process
to have exited, it might be possible for a SIGKILL to not have taken
effect on the child process before its sandbox has been torn down. And
that could result in the process seeing a partial sandbox (and in the
case of pantsbuild#16778, potentially cause named cache corruption).

This change ensures that we block to call `wait` when spawning local
processes by wrapping them in the `ManagedChild` helper that we were
already using for interactive processes. It then attempts to clarify the
contract of the `CapturedWorkdir` trait (but in order to improve
cherry-pickability does not attempt to adjust it), to make it clear that
child processes should fully exit before the `run_and_capture_workdir`
method has returned.

Fixes pantsbuild#16778 🤞.

---------

Co-authored-by: John Sirois <john.sirois@gmail.com>
(cherry picked from commit 1462bef)
jsirois pushed a commit to jsirois/pants that referenced this pull request Mar 31, 2023
…ed up (Cherry-pick of pantsbuild#18632)

As @jsirois discovered and described in
pantsbuild#16778 (comment),
because the `local::CommandRunner` does not `wait` for its child process
to have exited, it might be possible for a SIGKILL to not have taken
effect on the child process before its sandbox has been torn down. And
that could result in the process seeing a partial sandbox (and in the
case of pantsbuild#16778, potentially cause named cache corruption).

This change ensures that we block to call `wait` when spawning local
processes by wrapping them in the `ManagedChild` helper that we were
already using for interactive processes. It then attempts to clarify the
contract of the `CapturedWorkdir` trait (but in order to improve
cherry-pickability does not attempt to adjust it), to make it clear that
child processes should fully exit before the `run_and_capture_workdir`
method has returned.

Fixes pantsbuild#16778 🤞.

---------

Co-authored-by: John Sirois <john.sirois@gmail.com>
(cherry picked from commit 1462bef)
jsirois pushed a commit to jsirois/pants that referenced this pull request Mar 31, 2023
…ed up (Cherry-pick of pantsbuild#18632)

As @jsirois discovered and described in
pantsbuild#16778 (comment),
because the `local::CommandRunner` does not `wait` for its child process
to have exited, it might be possible for a SIGKILL to not have taken
effect on the child process before its sandbox has been torn down. And
that could result in the process seeing a partial sandbox (and in the
case of pantsbuild#16778, potentially cause named cache corruption).

This change ensures that we block to call `wait` when spawning local
processes by wrapping them in the `ManagedChild` helper that we were
already using for interactive processes. It then attempts to clarify the
contract of the `CapturedWorkdir` trait (but in order to improve
cherry-pickability does not attempt to adjust it), to make it clear that
child processes should fully exit before the `run_and_capture_workdir`
method has returned.

Fixes pantsbuild#16778 🤞.

---------

Co-authored-by: John Sirois <john.sirois@gmail.com>
(cherry picked from commit 1462bef)
@stuhood stuhood deleted the stuhood/sandboxed-process-drop branch March 31, 2023 22:26
jsirois added a commit that referenced this pull request Mar 31, 2023
#18641)

…ed up (Cherry-pick of #18632)

As @jsirois discovered and described in

#16778 (comment),
because the `local::CommandRunner` does not `wait` for its child process
to have exited, it might be possible for a SIGKILL to not have taken
effect on the child process before its sandbox has been torn down. And
that could result in the process seeing a partial sandbox (and in the
case of #16778, potentially cause named cache corruption).

This change ensures that we block to call `wait` when spawning local
processes by wrapping them in the `ManagedChild` helper that we were
already using for interactive processes. It then attempts to clarify the
contract of the `CapturedWorkdir` trait (but in order to improve
cherry-pickability does not attempt to adjust it), to make it clear that
child processes should fully exit before the `run_and_capture_workdir`
method has returned.

Fixes #16778 🤞.

---------

Co-authored-by: John Sirois <john.sirois@gmail.com>
(cherry picked from commit 1462bef)

Co-authored-by: Stu Hood <stuhood@gmail.com>
jsirois added a commit that referenced this pull request Mar 31, 2023
#18640)

…ed up (Cherry-pick of #18632)

As @jsirois discovered and described in

#16778 (comment),
because the `local::CommandRunner` does not `wait` for its child process
to have exited, it might be possible for a SIGKILL to not have taken
effect on the child process before its sandbox has been torn down. And
that could result in the process seeing a partial sandbox (and in the
case of #16778, potentially cause named cache corruption).

This change ensures that we block to call `wait` when spawning local
processes by wrapping them in the `ManagedChild` helper that we were
already using for interactive processes. It then attempts to clarify the
contract of the `CapturedWorkdir` trait (but in order to improve
cherry-pickability does not attempt to adjust it), to make it clear that
child processes should fully exit before the `run_and_capture_workdir`
method has returned.

Fixes #16778 🤞.

---------

Co-authored-by: John Sirois <john.sirois@gmail.com>
(cherry picked from commit 1462bef)

Co-authored-by: Stu Hood <stuhood@gmail.com>
jsirois added a commit that referenced this pull request Mar 31, 2023
#18642)

…ed up (Cherry-pick of #18632)

As @jsirois discovered and described in

#16778 (comment),
because the `local::CommandRunner` does not `wait` for its child process
to have exited, it might be possible for a SIGKILL to not have taken
effect on the child process before its sandbox has been torn down. And
that could result in the process seeing a partial sandbox (and in the
case of #16778, potentially cause named cache corruption).

This change ensures that we block to call `wait` when spawning local
processes by wrapping them in the `ManagedChild` helper that we were
already using for interactive processes. It then attempts to clarify the
contract of the `CapturedWorkdir` trait (but in order to improve
cherry-pickability does not attempt to adjust it), to make it clear that
child processes should fully exit before the `run_and_capture_workdir`
method has returned.

Fixes #16778 🤞.

---------

Co-authored-by: John Sirois <john.sirois@gmail.com>
(cherry picked from commit 1462bef)

Co-authored-by: Stu Hood <stuhood@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:bugfix Bug fixes for released features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nondeterministic ImportError: No module named __pants_df_parser while computing changed targets
5 participants