Contributor

@gt-oai gt-oai commented Feb 4, 2026

Human summary

Under sandboxing (specifically LandlockRestrict), commands such as sleep 10 fail immediately. By the time the test sends the interrupt, nothing is still running, so the command cannot be interrupted.

In suite::interrupt::test_shell_command_interruption, sleep 10 is issued at 17:28:16.554 (ToolCall: shell_command {"command":"sleep 10"...}), then fails at 17:28:16.589 with duration_ms=34, success=false, exit_code=101, and Sandbox(LandlockRestrict).
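
To illustrate why an immediate failure defeats an interruption test, here is a minimal sketch using only the Rust standard library. It is not the test harness from this PR: `InterruptOutcome` and `interrupt_after` are hypothetical names, and the sketch assumes a Unix-like environment where `sleep` and `sh` are on the PATH.

```rust
use std::process::Child;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical outcome type: what happened when we tried to interrupt a child.
#[derive(Debug, PartialEq, Eq)]
enum InterruptOutcome {
    /// The child was still running, so we killed it.
    Interrupted,
    /// The child had already exited; there was nothing left to interrupt.
    AlreadyExited { success: bool },
}

/// Waits for `grace`, then tries to interrupt `child` (hypothetical helper).
fn interrupt_after(mut child: Child, grace: Duration) -> std::io::Result<InterruptOutcome> {
    sleep(grace);
    // try_wait() returns Ok(Some(status)) if the child has already exited.
    if let Some(status) = child.try_wait()? {
        return Ok(InterruptOutcome::AlreadyExited {
            success: status.success(),
        });
    }
    // Still running: kill it and reap it so no zombie process is left behind.
    child.kill()?;
    child.wait()?;
    Ok(InterruptOutcome::Interrupted)
}
```

With a healthy `sleep 10` the helper would report `Interrupted`. A command that exits at once with a failure, as in the log above, would report `AlreadyExited { success: false }`, which is the situation the interrupt test cannot meaningfully exercise.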

Codex summary

  • set sandbox_mode = "danger-full-access" in interrupt and v2/turn_interrupt integration tests
  • set sandbox: Some(SandboxMode::DangerFullAccess) in test_codex_jsonrpc_conversation_flow
  • set sandbox_policy: Some(SandboxPolicy::DangerFullAccess) in command_execution_notifications_include_process_id
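
As a rough illustration of the first two bullets, the sketch below renders the `sandbox_mode` config line from an enum value. The enum here is a hypothetical stand-in, not the real type from the codebase. Only `DangerFullAccess` and the string `"danger-full-access"` come from this PR; the `ReadOnly` variant, its spelling, and the helper names are assumptions added for contrast.

```rust
/// Hypothetical stand-in for the sandbox mode enum named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxMode {
    /// Assumed variant, included only for contrast.
    ReadOnly,
    /// Named in the PR description.
    DangerFullAccess,
}

impl SandboxMode {
    /// Returns the string form used in the config file.
    fn as_config_str(self) -> &'static str {
        match self {
            SandboxMode::ReadOnly => "read-only", // assumed spelling
            SandboxMode::DangerFullAccess => "danger-full-access",
        }
    }
}

/// Renders the single config line the affected tests set (hypothetical helper).
fn test_config_toml(mode: SandboxMode) -> String {
    format!("sandbox_mode = \"{}\"\n", mode.as_config_str())
}
```

Calling `test_config_toml(SandboxMode::DangerFullAccess)` yields the `sandbox_mode = "danger-full-access"` line from the first bullet, so the test no longer depends on whether the CI host can start the sandbox.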

Why

On some Linux CI environments, command execution fails immediately with LandlockRestrict when sandboxed. These tests are intended to validate JSON-RPC/task lifecycle behavior (interrupt semantics, command notification shape/process id, request flow), but an early sandbox startup failure changes the turn flow and can trigger extra follow-up requests, which makes the tests flaky.

This change removes the environment-specific dependency on sandbox startup from these tests while preserving their primary intent.

Testing

  • not run in this environment (per request)

@gt-oai gt-oai changed the title from "Stabilize app-server command/interrupt tests under Linux sandbox flake" to "Fix test_shell_command_interruption flake" Feb 4, 2026
Collaborator

@sayan-oai sayan-oai left a comment

ty

@gt-oai gt-oai merged commit 7c6d21a into main Feb 4, 2026
32 checks passed
@gt-oai gt-oai deleted the gt/flake1 branch February 4, 2026 22:19
@github-actions github-actions bot locked and limited conversation to collaborators Feb 4, 2026