Fix test_shell_command_interruption flake #10649
Merged
Human summary
Sandboxing (specifically `LandlockRestrict`) means that, for example, `sleep 10` fails immediately and therefore cannot be interrupted.

In `suite::interrupt::test_shell_command_interruption`, `sleep 10` is issued at 17:28:16.554 (`ToolCall: shell_command {"command":"sleep 10"...}`), then fails at 17:28:16.589 with `duration_ms=34`, `success=false`, `exit_code=101`, and `Sandbox(LandlockRestrict)`.
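The failure mode described above can be reproduced in isolation. The following is a minimal sketch using only the Rust standard library, not the Codex test harness: `interrupt_after` and `InterruptOutcome` are hypothetical names introduced for illustration, and `sh -c "exit 101"` is only a stand-in for a command that the sandbox rejects at startup. It assumes a Unix-like system with `sleep` and `sh` on the `PATH`.

```rust
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::Duration;

/// Outcome of trying to interrupt a child process after a short delay.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum InterruptOutcome {
    /// The child was still running, so the interrupt had something to stop.
    Interrupted,
    /// The child had already exited on its own; there was nothing to interrupt.
    AlreadyExited { success: bool },
}

/// Waits `delay`, then kills the child if it is still running.
fn interrupt_after(mut child: Child, delay: Duration) -> std::io::Result<InterruptOutcome> {
    sleep(delay);
    match child.try_wait()? {
        // The process finished before the interrupt arrived.
        Some(status) => Ok(InterruptOutcome::AlreadyExited {
            success: status.success(),
        }),
        // Still running: kill it and reap the exit status.
        None => {
            child.kill()?;
            child.wait()?;
            Ok(InterruptOutcome::Interrupted)
        }
    }
}

fn main() -> std::io::Result<()> {
    // Healthy case: `sleep 10` is still running when the interrupt arrives.
    let long_running = Command::new("sleep").arg("10").spawn()?;
    println!("{:?}", interrupt_after(long_running, Duration::from_millis(200))?);

    // Stand-in for the sandbox failure: the command exits with code 101
    // almost immediately, so the later interrupt finds nothing to stop.
    let fails_fast = Command::new("sh").args(["-c", "exit 101"]).spawn()?;
    println!("{:?}", interrupt_after(fails_fast, Duration::from_millis(200))?);
    Ok(())
}
```

A test that expects the first outcome will see the second whenever the command dies within milliseconds, which matches the 34 ms failure in the log above.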
Codex summary
- `sandbox_mode = "danger-full-access"` in `interrupt` and `v2/turn_interrupt` integration tests
- `sandbox: Some(SandboxMode::DangerFullAccess)` in `test_codex_jsonrpc_conversation_flow`
- `sandbox_policy: Some(SandboxPolicy::DangerFullAccess)` in `command_execution_notifications_include_process_id`

Why
On some Linux CI environments, command execution fails immediately with `LandlockRestrict` when sandboxed. These tests are intended to validate JSON-RPC/task lifecycle behavior (interrupt semantics, command notification shape/process id, request flow), but early sandbox startup failure changes turn flow and can trigger extra follow-up requests, causing flakes.

This change removes the environment-specific sandbox startup dependency from these tests while preserving their primary intent.
Testing