Timeouts #325

xingdi-eric-yuan · 2025-12-19T21:40:36Z

This PR adds timeout handling to prevent agents from hanging:

Azure OpenAI timeout - Set 5-minute timeout on AOAI client to prevent CLOSE_WAIT connection hangs (now raises and will be retried)
Command timeout - Add configurable command_timeout (default: 300s) to all terminal implementations (Docker, Kubernetes, Local) to kill blocking commands like serve_forever() or infinite loops

E.g.,

terminal:
  type: docker
  command_timeout: 300  # 5 min per command

Also:
Fix Docker client and shell process resource leaks

DockerTerminal.close(): Close docker_client to release connection pool
resources. Previously only the container was stopped but the client
connection was never closed, causing pool exhaustion after ~10 iterations.
ShellSession.close(): Add process.wait() after terminate() to avoid
zombie processes. If wait times out, forcefully kill the process.

Use shlex.quote() instead of Python repr for proper shell escaping when wrapping commands with timeout. This fixes syntax errors when commands contain special characters like parentheses (e.g., in git apply patches). Also update test_eval_timeout to expect the new timeout message format. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- DockerTerminal.close(): Close docker_client to release connection pool resources. Previously only the container was stopped but the client connection was never closed, causing pool exhaustion after ~10 iterations. - ShellSession.close(): Add process.wait() after terminate() to avoid zombie processes. If wait times out, forcefully kill the process. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The previous fix closed the Docker client in close() to prevent connection leaks, but setting docker_client = None broke terminal reuse since setup_container() would fail with 'NoneType' has no attribute 'containers'. This fix converts docker_client to a lazy-initialized property that recreates the client on demand if it was previously closed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The close() method was only stopping the listener thread but not closing FileHandler instances, leaving file handles open. This caused processes to hang indefinitely when many loggers were created (e.g., per-agent loggers in parallel evaluation). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When nohup commands with background execution (&) are run through the timeout wrapper in Docker/Kubernetes terminals, the timeout command doesn't return immediately because background processes inherit the shell's stdout/stderr file descriptors. This fix adds proper output redirection (> /dev/null 2>&1) to the gunicorn nohup commands in SWE-Bench setup, ensuring the timeout wrapper returns immediately after the shell exits instead of waiting for the full timeout period. Also adds comprehensive tests for both Docker and Kubernetes terminals to verify nohup commands with proper redirection return immediately. Fixes issue reported in #325 where requests package setup would hang for 300 seconds when starting background gunicorn servers. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix nohup commands not returning immediately with timeout wrapper When nohup commands with background execution (&) are run through the timeout wrapper in Docker/Kubernetes terminals, the timeout command doesn't return immediately because background processes inherit the shell's stdout/stderr file descriptors. This fix adds proper output redirection (> /dev/null 2>&1) to the gunicorn nohup commands in SWE-Bench setup, ensuring the timeout wrapper returns immediately after the shell exits instead of waiting for the full timeout period. Also adds comprehensive tests for both Docker and Kubernetes terminals to verify nohup commands with proper redirection return immediately. Fixes issue reported in #325 where requests package setup would hang for 300 seconds when starting background gunicorn servers. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Use setsid to properly detach background processes in non-TTY mode The previous fix using only output redirection was insufficient because in non-TTY mode, the timeout command monitors the entire process group, not just file descriptors. Even with > /dev/null 2>&1, backgrounded processes remain in the same process group as the shell. Using setsid creates a new session, completely detaching the process from timeout's process group. This ensures the timeout-wrapped command returns immediately after the shell exits, even in non-TTY execution contexts like docker exec and kubectl exec. Changes: - Added setsid before nohup gunicorn commands - Updated test names and documentation to reflect setsid usage - Tests verify processes detach properly in non-TTY mode Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * minor * nested shell * subshell * Run eval (if available) before applying the gold patch in the solution agent. --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Marc-Alexandre Côté <marc.cote.19@gmail.com>

xingdi-eric-yuan and others added 4 commits December 19, 2025 12:45

Update azure_openai.py

4887c6a

command timeout (for bash/eval)

44a1ae8

Merge branch 'main' into aoai_timeout

3534f2d

xingdi-eric-yuan requested a review from matheper December 19, 2025 22:00

xingdi-eric-yuan and others added 3 commits December 20, 2025 14:47

update json log viewer to allow cross-origin requests

83fbeb9

xingdi-eric-yuan mentioned this pull request Jan 2, 2026

Security and Code Quality Improvements #327

Merged

MarcCote approved these changes Jan 7, 2026

View reviewed changes

MarcCote merged commit 9fcfb86 into main Jan 7, 2026
11 checks passed

MarcCote deleted the aoai_timeout branch January 7, 2026 14:53

xingdi-eric-yuan mentioned this pull request Jan 8, 2026

Fix nohup commands not returning immediately with timeout wrapper #329

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Timeouts #325

Timeouts #325

Uh oh!

xingdi-eric-yuan commented Dec 19, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Timeouts #325

Timeouts #325

Uh oh!

Conversation

xingdi-eric-yuan commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

xingdi-eric-yuan commented Dec 19, 2025 •

edited

Loading