Bug: Bash command polling stops after 2 attempts, causing agent loop to hang #1633

## Summary

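The SDK stops polling for a bash command's result after 2 attempts. If the command has not finished by then, the result is never retrieved and the agent loop hangs until the runtime is killed for being idle (1200s threshold). In SWE-bench evaluations this shows up as idle timeouts after roughly 20 minutes, followed by 404 errors on the evaluator side.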

## Environment

- **Model**: `litellm_proxy/gpt-5-mini-2025-08-07`
- **SDK**: Remote workspace (eval-runtime cluster)
- **Job**: `eval-eval-20848009420-gpt-5-mini`

## Root Cause

After executing `POST /api/bash/start_bash_command`, the SDK polls `GET /api/bash/bash_events/search` **only 2 times** (~100ms total). If the command has not completed, polling stops and the SDK switches to conversation polling only. The bash result is never retrieved, causing the agent loop to hang.

## Evidence: 4 Failed Runtimes Show Identical Pattern

### Runtime 1: `byqchmjqpdgxhdkl` (django__django-11095)

```
10:29:23.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200
10:29:23.xxx | GET /api/bash/bash_events/search?command_id__eq=9039afd2... 200  ← 1st check
10:29:23.xxx | GET /api/bash/bash_events/search?command_id__eq=9039afd2... 200  ← 2nd check
10:29:23.xxx | GET /api/conversations/1860b2a8-... 200  ← STOPS checking bash_events
10:29:24.xxx | GET /api/conversations/1860b2a8-... 200
10:29:25.xxx | GET /api/conversations/1860b2a8-... 200
... (continues for 20 minutes) ...
10:50:08 | [KILLED - idle for 1244 seconds]
```

### Runtime 2: `hkbjctmbjbbgrycx` (django__django-13670)

```
10:27:37.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200
10:27:37.xxx | GET /api/bash/bash_events/search?command_id__eq=17d36de8... 200  ← 1st check
10:27:37.xxx | GET /api/bash/bash_events/search?command_id__eq=17d36de8... 200  ← 2nd check
10:27:38.xxx | GET /api/conversations/3a6cefbd-... 200  ← STOPS checking bash_events
... (continues for 22 minutes) ...
10:50:08 | [KILLED - idle for 1351 seconds]
```

### Runtime 3: `shgiheepkuhjmnjp` (pydata__xarray-6992)

```
10:34:03.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200
10:34:03.xxx | GET /api/bash/bash_events/search?command_id__eq=9c7a0443... 200  ← 1st check
10:34:03.xxx | GET /api/bash/bash_events/search?command_id__eq=9c7a0443... 200  ← 2nd check
10:34:03.xxx | GET /api/conversations/18ab24cc-... 200  ← STOPS checking bash_events
... (continues for 21 minutes) ...
10:55:08 | [KILLED - idle for 1264 seconds]
```

### Runtime 4: `jtvryxnvddglunzx` (django__django-15957)

```
10:31:38.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200
10:31:38.xxx | GET /api/bash/bash_events/search?command_id__eq=df0a5810... 200  ← 1st check
10:31:39.xxx | GET /api/bash/bash_events/search?command_id__eq=df0a5810... 200  ← 2nd check
10:31:39.xxx | GET /api/conversations/3527bac8-... 200  ← STOPS checking bash_events
... (continues for 23 minutes) ...
10:55:08 | [KILLED - idle for 1408 seconds]
```
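
The pattern in the four excerpts above can be checked mechanically. The following is a minimal sketch, assuming access-log lines shaped like the excerpts; `count_bash_polls` is a hypothetical helper, not part of the SDK, and real log lines may carry extra fields. It counts how many `bash_events/search` requests follow each `start_bash_command` before conversation polling takes over.

```python
import re

# Path fragments are taken from the log excerpts; the overall line format
# of real access logs is an assumption.
START = "POST /api/bash/start_bash_command"
POLL = re.compile(r"GET /api/bash/bash_events/search\?command_id__eq=([0-9a-f]+)")
CONV = "GET /api/conversations/"


def count_bash_polls(lines):
    """Return {command_id: poll count} for every command polled at least once.

    Counting stops at the first conversation poll after a start request,
    which is where the excerpts show the SDK giving up on bash_events.
    """
    counts = {}
    counting = False
    for line in lines:
        if START in line:
            counting = True
            continue
        if not counting:
            continue
        match = POLL.search(line)
        if match:
            counts[match.group(1)] = counts.get(match.group(1), 0) + 1
        elif CONV in line:
            counting = False
    return counts


# Lines copied from the Runtime 1 excerpt (annotations removed).
sample = [
    '10:29:23.xxx | POST /api/bash/start_bash_command HTTP/1.1" 200',
    "10:29:23.xxx | GET /api/bash/bash_events/search?command_id__eq=9039afd2... 200",
    "10:29:23.xxx | GET /api/bash/bash_events/search?command_id__eq=9039afd2... 200",
    "10:29:23.xxx | GET /api/conversations/1860b2a8-... 200",
    "10:29:24.xxx | GET /api/conversations/1860b2a8-... 200",
]
print(count_bash_polls(sample))  # {'9039afd2': 2}
```

A count of exactly 2 for a command that never produced an observation would match the failure described here.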

## All 8 Affected Instances

| Instance | Runtime ID | Last POST | Killed At | Idle Time |
|----------|------------|-----------|-----------|-----------|
| django__django-11095 | byqchmjqpdgxhdkl | 10:29:23 | 10:50:08 | 1244s |
| django__django-13670 | hkbjctmbjbbgrycx | 10:27:37 | 10:50:08 | 1351s |
| sympy__sympy-23534 | mrjilxopivitvlvb | N/A | 10:50:09 | 1494s |
| pydata__xarray-6992 | shgiheepkuhjmnjp | 10:34:03 | 10:55:08 | 1264s |
| django__django-15957 | jtvryxnvddglunzx | 10:31:38 | 10:55:08 | 1408s |
| django__django-10097 | qqttditbjzxtwbcq | N/A | 11:00:08 | 1277s |
| django__django-15499 | aafaqbawqxlhuocb | N/A | 11:05:08 | 1317s |
| matplotlib__matplotlib-22871 | mmgirxwhisdulrrn | N/A | 11:05:08 | 1493s |
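
As a sanity check on the table, the reported idle time can be compared with the gap between the last POST and the kill time. This is a small sketch using only the four rows that have a last-POST timestamp; it assumes both timestamps fall on the same day and are accurate to the second.

```python
from datetime import datetime

# (instance, last POST, killed at, reported idle seconds), copied from the table.
# Rows whose last POST is N/A are omitted because the gap cannot be computed.
ROWS = [
    ("django__django-11095", "10:29:23", "10:50:08", 1244),
    ("django__django-13670", "10:27:37", "10:50:08", 1351),
    ("pydata__xarray-6992", "10:34:03", "10:55:08", 1264),
    ("django__django-15957", "10:31:38", "10:55:08", 1408),
]


def seconds_between(start, end):
    """Whole seconds from start to end, both given as HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


gaps = {name: seconds_between(post, killed) for name, post, killed, _ in ROWS}
for name, _, _, idle in ROWS:
    print(f"{name}: POST-to-kill {gaps[name]}s, reported idle {idle}s")
```

For these four rows the computed gap is within 2 seconds of the reported idle time, which is consistent with no tool activity occurring after the last `start_bash_command` request.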

## Failure Sequence

```
1. LLM generates ToolCallAction (bash command)           ✓ Works
2. SDK sends POST /api/bash/start_bash_command           ✓ Works (200 OK)
3. SDK polls GET /api/bash/bash_events/search            ✓ Works (1st check)
4. SDK polls GET /api/bash/bash_events/search            ✓ Works (2nd check)
5. SDK stops polling bash_events                         ✗ BUG - should continue
6. SDK only polls GET /api/conversations/...             ✗ Wrong - waiting for nothing
7. No ObservationEvent recorded                          ✗ Agent loop stuck
8. 20 minutes pass with no tool executions
9. Runtime killed for idle (1200s threshold)
10. Evaluator gets 404 → retry → resource pressure
```

## Expected Behavior

The SDK should continue polling `bash_events/search` until:
- The command completes (exit event received), OR
- A configurable timeout is reached (then emit ErrorObservation)
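
The two conditions above can be sketched as a single loop. This is an illustration under stated assumptions, not the SDK's code: `PollResult`, `ErrorObservation`, and `wait_for_command` are hypothetical stand-ins, and the `poll` callable represents whatever function wraps `GET /api/bash/bash_events/search`.

```python
import time
from dataclasses import dataclass


@dataclass
class PollResult:
    """Hypothetical stand-in for one bash_events/search response."""
    completed: bool
    output: str = ""


@dataclass
class ErrorObservation:
    """Hypothetical stand-in; the SDK's real class may differ."""
    message: str


def wait_for_command(poll, command_id, timeout_s=300.0, interval_s=0.1):
    """Poll until the command completes, or return an ErrorObservation on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = poll(command_id)
        if result.completed:
            return result
        if time.monotonic() >= deadline:
            return ErrorObservation(
                f"Bash command {command_id} did not complete within {timeout_s}s"
            )
        time.sleep(interval_s)


# Usage with fake pollers: one completes on its 5th poll, one never completes.
calls = {"n": 0}


def fake_poll(command_id):
    calls["n"] += 1
    return PollResult(completed=calls["n"] >= 5, output="done")


ok = wait_for_command(fake_poll, "cmd-1", timeout_s=5.0, interval_s=0.001)
stuck = wait_for_command(
    lambda _id: PollResult(completed=False), "cmd-2", timeout_s=0.05, interval_s=0.001
)
print(type(ok).__name__, calls["n"])  # PollResult 5
print(type(stuck).__name__)           # ErrorObservation
```

Returning an error value instead of raising keeps the agent loop moving in both outcomes, so a slow command yields an observation either way.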

## Suggested Fix

In the bash command execution code, replace the current polling logic:

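```python
import asyncio
import time

BASH_TIMEOUT = 300.0  # seconds; placeholder value, should be configurable
POLL_INTERVAL = 0.1   # seconds to wait between polls

# Current (broken): only 2 attempts, then falls through without a result
#   for _ in range(2):
#       result = poll_bash_events(command_id)
#       if result.completed:
#           return result

# Fixed: poll until completion or timeout.
# poll_bash_events stands in for the SDK's existing bash_events/search call.
async def wait_for_bash_result(command_id):
    # Use a monotonic clock so wall-clock adjustments cannot skew the deadline.
    deadline = time.monotonic() + BASH_TIMEOUT
    while time.monotonic() < deadline:
        result = poll_bash_events(command_id)
        if result.completed:
            return result
        await asyncio.sleep(POLL_INTERVAL)
    # The caller should convert this into an ErrorObservation for the agent.
    raise TimeoutError(f"Bash command {command_id} did not complete within {BASH_TIMEOUT}s")
```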

## References

- OpenHands/benchmarks#239 - Original SWE-bench failure report
- OpenHands/benchmarks#285 - Event persistence PR (enabled this analysis)
