Summary
Polling for a bash command's result stops after 2 attempts, so the agent loop hangs until the runtime is killed for being idle. In SWE-bench evaluations this shows up as 20-minute idle timeouts followed by 404 errors.
Environment
- Model: litellm_proxy/gpt-5-mini-2025-08-07
- SDK: Remote workspace (eval-runtime cluster)
- Job: eval-eval-20848009420-gpt-5-mini
Root Cause
After executing POST /api/bash/start_bash_command, the SDK polls GET /api/bash/bash_events/search only 2 times (~100ms total). If the command has not completed, polling stops and the SDK switches to conversation polling only. The bash result is never retrieved, causing the agent loop to hang.
Evidence: 4 Failed Runtimes Show Identical Pattern
Runtime 1: byqchmjqpdgxhdkl (django__django-11095)
10:29:23.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200
10:29:23.xxx | GET /api/bash/bash_events/search?command_id__eq=9039afd2... 200 ← 1st check
10:29:23.xxx | GET /api/bash/bash_events/search?command_id__eq=9039afd2... 200 ← 2nd check
10:29:23.xxx | GET /api/conversations/1860b2a8-... 200 ← STOPS checking bash_events
10:29:24.xxx | GET /api/conversations/1860b2a8-... 200
10:29:25.xxx | GET /api/conversations/1860b2a8-... 200
... (continues for 20 minutes) ...
10:50:08 | [KILLED - idle for 1244 seconds]
Runtime 2: hkbjctmbjbbgrycx (django__django-13670)
10:27:37.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200
10:27:37.xxx | GET /api/bash/bash_events/search?command_id__eq=17d36de8... 200 ← 1st check
10:27:37.xxx | GET /api/bash/bash_events/search?command_id__eq=17d36de8... 200 ← 2nd check
10:27:38.xxx | GET /api/conversations/3a6cefbd-... 200 ← STOPS checking bash_events
... (continues for 22 minutes) ...
10:50:08 | [KILLED - idle for 1351 seconds]
Runtime 3: shgiheepkuhjmnjp (pydata__xarray-6992)
10:34:03.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200
10:34:03.xxx | GET /api/bash/bash_events/search?command_id__eq=9c7a0443... 200 ← 1st check
10:34:03.xxx | GET /api/bash/bash_events/search?command_id__eq=9c7a0443... 200 ← 2nd check
10:34:03.xxx | GET /api/conversations/18ab24cc-... 200 ← STOPS checking bash_events
... (continues for 21 minutes) ...
10:55:08 | [KILLED - idle for 1264 seconds]
Runtime 4: jtvryxnvddglunzx (django__django-15957)
10:31:38.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200
10:31:38.xxx | GET /api/bash/bash_events/search?command_id__eq=df0a5810... 200 ← 1st check
10:31:39.xxx | GET /api/bash/bash_events/search?command_id__eq=df0a5810... 200 ← 2nd check
10:31:39.xxx | GET /api/conversations/3527bac8-... 200 ← STOPS checking bash_events
... (continues for 23 minutes) ...
10:55:08 | [KILLED - idle for 1408 seconds]
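The pattern in these excerpts (a start request, at most two bash_events polls, then only conversation polls) can be checked mechanically. The following is a minimal sketch, assuming the log line layout matches the excerpts above; the function names are hypothetical and not part of the SDK. A match only marks a candidate, since a command that genuinely finished within two polls produces the same request sequence.

```python
import re

# Request markers copied from the log excerpts; everything else is illustrative.
BASH_START = "POST /api/bash/start_bash_command"
BASH_POLL = re.compile(r"GET /api/bash/bash_events/search\?command_id__eq=([0-9a-f]+)")
CONV_POLL = "GET /api/conversations/"


def count_polls_after_last_start(log_lines):
    """Count bash_events polls and conversation polls seen after the last start request."""
    bash_polls = 0
    conv_polls = 0
    for line in log_lines:
        if BASH_START in line:
            # A new command was started: reset both counters.
            bash_polls = 0
            conv_polls = 0
        elif BASH_POLL.search(line):
            bash_polls += 1
        elif CONV_POLL in line:
            conv_polls += 1
    return bash_polls, conv_polls


def looks_stalled(log_lines, max_bash_polls=2):
    """Heuristic: few bash_events polls, then conversation polling only."""
    bash_polls, conv_polls = count_polls_after_last_start(log_lines)
    return 0 < bash_polls <= max_bash_polls and conv_polls > 0


sample = [
    '10:29:23.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200',
    "10:29:23.xxx | GET /api/bash/bash_events/search?command_id__eq=9039afd2... 200",
    "10:29:23.xxx | GET /api/bash/bash_events/search?command_id__eq=9039afd2... 200",
    "10:29:23.xxx | GET /api/conversations/1860b2a8-... 200",
    "10:29:24.xxx | GET /api/conversations/1860b2a8-... 200",
    "10:29:25.xxx | GET /api/conversations/1860b2a8-... 200",
]
print(count_polls_after_last_start(sample), looks_stalled(sample))
```

To separate real stalls from fast commands, the heuristic would also need the poll response bodies or the absence of a later ObservationEvent, neither of which appears in these access logs.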
All 8 Affected Instances
| Instance | Runtime ID | Last POST | Killed At | Idle Time |
|---|---|---|---|---|
| django__django-11095 | byqchmjqpdgxhdkl | 10:29:23 | 10:50:08 | 1244s |
| django__django-13670 | hkbjctmbjbbgrycx | 10:27:37 | 10:50:08 | 1351s |
| sympy__sympy-23534 | mrjilxopivitvlvb | N/A | 10:50:09 | 1494s |
| pydata__xarray-6992 | shgiheepkuhjmnjp | 10:34:03 | 10:55:08 | 1264s |
| django__django-15957 | jtvryxnvddglunzx | 10:31:38 | 10:55:08 | 1408s |
| django__django-10097 | qqttditbjzxtwbcq | N/A | 11:00:08 | 1277s |
| django__django-15499 | aafaqbawqxlhuocb | N/A | 11:05:08 | 1317s |
| matplotlib__matplotlib-22871 | mmgirxwhisdulrrn | N/A | 11:05:08 | 1493s |
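As a quick consistency check on the table, the gap between "Last POST" and "Killed At" can be compared with the reported idle time for the four rows where the last POST is known. This is a small sketch using only values from the table; it assumes both timestamps fall on the same day, and the helper name is hypothetical.

```python
from datetime import datetime

# (instance, last POST, killed at, reported idle seconds), copied from the table.
rows = [
    ("django__django-11095", "10:29:23", "10:50:08", 1244),
    ("django__django-13670", "10:27:37", "10:50:08", 1351),
    ("pydata__xarray-6992", "10:34:03", "10:55:08", 1264),
    ("django__django-15957", "10:31:38", "10:55:08", 1408),
]

IDLE_THRESHOLD_S = 1200  # idle-kill threshold stated in the failure sequence


def seconds_between(start, end):
    """Seconds from start to end, both HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


for instance, last_post, killed_at, reported in rows:
    gap = seconds_between(last_post, killed_at)
    print(f"{instance}: gap={gap}s reported={reported}s over_threshold={gap > IDLE_THRESHOLD_S}")
```

For these four rows the computed gap is within 2 seconds of the reported idle time, which is consistent with no tool activity after the last start_bash_command request.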
Failure Sequence
1. LLM generates ToolCallAction (bash command) ✓ Works
2. SDK sends POST /api/bash/start_bash_command ✓ Works (200 OK)
3. SDK polls GET /api/bash/bash_events/search ✓ Works (1st check)
4. SDK polls GET /api/bash/bash_events/search ✓ Works (2nd check)
5. SDK stops polling bash_events ✗ BUG - should continue
6. SDK only polls GET /api/conversations/... ✗ Wrong - waiting for nothing
7. No ObservationEvent recorded ✗ Agent loop stuck
8. 20 minutes pass with no tool executions
9. Runtime killed for idle (1200s threshold)
10. Evaluator gets 404 → retry → resource pressure
Expected Behavior
The SDK should continue polling bash_events/search until:
- The command completes (exit event received), OR
- A configurable timeout is reached (then emit ErrorObservation)
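The two exit conditions above can be sketched as a deadline-bounded polling loop. This is an illustration under stated assumptions, not the SDK's implementation: `PollResult`, `wait_for_bash_result`, and `make_fake_poller` are hypothetical names, the poller is assumed to be an async callable, and the returned `error` field stands in for the ErrorObservation the SDK would emit.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollResult:
    """Hypothetical stand-in for one bash_events/search poll outcome."""
    completed: bool
    exit_code: Optional[int] = None
    error: Optional[str] = None


async def wait_for_bash_result(poll, command_id, timeout_s=300.0, interval_s=0.1):
    """Poll until the command completes or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = await poll(command_id)
        if result.completed:
            return result
        await asyncio.sleep(interval_s)
    # Timeout: return an error result instead of silently giving up.
    return PollResult(
        completed=False,
        error=f"Bash command {command_id} did not complete within {timeout_s}s",
    )


def make_fake_poller(complete_after):
    """Fake poller that reports completion on the Nth call (for demonstration only)."""
    calls = {"n": 0}

    async def poll(command_id):
        calls["n"] += 1
        done = calls["n"] >= complete_after
        return PollResult(completed=done, exit_code=0 if done else None)

    return poll, calls


if __name__ == "__main__":
    poll, calls = make_fake_poller(complete_after=5)
    result = asyncio.run(wait_for_bash_result(poll, "demo", timeout_s=2.0, interval_s=0.01))
    print(result.completed, calls["n"])
```

With a loop of this shape, a command that needs more than two polls still yields a result, and a command that never finishes produces an explicit error rather than an idle runtime.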
Suggested Fix
In the bash command execution code, replace the current polling logic:
# Current (broken): only 2 attempts, then the result is abandoned
for _ in range(2):
    result = poll_bash_events(command_id)
    if result.completed:
        return result

# Fixed: poll until completion or timeout
import asyncio
import time

start = time.monotonic()  # monotonic clock is unaffected by wall-clock adjustments
while time.monotonic() - start < BASH_TIMEOUT:
    result = poll_bash_events(command_id)
    if result.completed:
        return result
    await asyncio.sleep(0.1)  # back off briefly between polls
raise TimeoutError(f"Bash command {command_id} did not complete within {BASH_TIMEOUT}s")
References
- Track 500-image SWE-bench eval (eval-20699625809-gpt-5-mini) benchmarks#239 - Original SWE-bench failure report
- Persist conversation events to datadog benchmarks#285 - Event persistence PR (enabled this analysis)