Description
What version of Codex is running?
codex-cli 0.39.0
Which model were you using?
gpt-5-2025-08-07
What platform is your computer?
Microsoft Windows NT 10.0.22631.0 x64
What steps can reproduce the bug?
Background:
- The streaming pipeline only respected OpenAI-style phrasing “Please try again in Ns/ms” (when code == rate_limit_exceeded).
- When providers used other natural-language phrasings (“Try again in 35 seconds”, “1.5 minutes”, “2 hours”), the delay was not parsed. The retry path fell back to generic exponential backoff, which can retry too soon or too late.
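
To illustrate the gap, here is a rough std-only sketch (not the actual `rate_limit_regex` from `client.rs`; names are illustrative) of a parser anchored on the OpenAI phrasing. It recognizes "Please try again in 1.5s" but returns `None` for the other wordings:

```rust
use std::time::Duration;

// Crude stand-in for the old behavior: only the exact OpenAI phrasing
// "Please try again in <n>s" / "<n>ms" yields a delay.
fn old_style_retry_after(msg: &str) -> Option<Duration> {
    let rest = msg.split("Please try again in ").nth(1)?;
    let num: String = rest
        .chars()
        .take_while(|c| c.is_ascii_digit() || *c == '.')
        .collect();
    let value: f64 = num.parse().ok()?;
    let unit = &rest[num.len()..];
    if unit.starts_with("ms") {
        Some(Duration::from_millis(value as u64))
    } else if unit.starts_with('s') {
        Some(Duration::from_millis((value * 1000.0) as u64))
    } else {
        None
    }
}

fn main() {
    // Matches the OpenAI wording...
    assert_eq!(
        old_style_retry_after("Please try again in 1.5s"),
        Some(Duration::from_millis(1500))
    );
    // ...but misses the natural-language phrasings from this report.
    assert_eq!(old_style_retry_after("Try again in 35 seconds"), None);
    assert_eq!(old_style_retry_after("try again in 1.5 minutes"), None);
    println!("ok");
}
```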
Repro Steps:
- Simulate an SSE response.failed event with a message containing a non-OpenAI-style delay, e.g., "Try again in 35 seconds." A minimal simulation is to feed this SSE into the client-side event processor.
- Observe that the resulting CodexErr::Stream carries delay=None and that the caller (run_turn) falls back to exponential backoff rather than waiting the provided 35 seconds.
- This reproduces with any of the following message patterns:
- “Try again in 35 seconds”
- “try again in 1.5 minutes”
- “Try again in 2 hours”
- “Try again in 28ms”
Code snippet (unit test you can add to codex-rs/core/src/client.rs tests) that demonstrates the issue prior to the fix:
```rust
#[tokio::test]
async fn error_delay_not_parsed_for_try_again_in_seconds_prior_to_fix() {
    // This reproduces the issue where the delay in natural-language phrasing
    // ("Try again in 35 seconds") is not parsed; delay remains None.
    let raw_error = r#"{
        "type":"response.failed",
        "sequence_number":3,
        "response":{
            "id":"resp_x",
            "object":"response",
            "created_at":1755041560,
            "status":"failed",
            "background":false,
            "error":{
                "code":"rate_limit_exceeded",
                "message":"Rate limit exceeded. Try again in 35 seconds."
            },
            "usage":null,
            "user":null,
            "metadata":{}
        }
    }"#;

    // Build the SSE payload for the failure event.
    let sse = format!("event: response.failed\ndata: {raw_error}\n\n");

    // Minimal provider stub for tests.
    let provider = ModelProviderInfo {
        name: "test".to_string(),
        base_url: Some("https://test.com".to_string()),
        env_key: Some("TEST_API_KEY".to_string()),
        env_key_instructions: None,
        wire_api: WireApi::Responses,
        query_params: None,
        http_headers: None,
        env_http_headers: None,
        request_max_retries: Some(0),
        stream_max_retries: Some(0),
        stream_idle_timeout_ms: Some(1000),
        requires_openai_auth: false,
    };

    // Use the existing helper to run the SSE through the parser.
    let events = collect_events(&[sse.as_bytes()], provider).await;
    assert_eq!(events.len(), 1);
    match &events[0] {
        // Prior to the fix, delay is None (i.e., not parsed), causing exponential backoff.
        Err(CodexErr::Stream(msg, delay)) => {
            assert!(msg.contains("Try again in 35 seconds"));
            assert_eq!(*delay, None, "bug: delay was not parsed from message prior to fix");
        }
        other => panic!("unexpected event: {other:?}"),
    }
}
```

What is the expected behavior?
- When an error message contains a retry hint in common natural-language formats (seconds/minutes/hours/ms), the agent should parse it and sleep for that duration before retrying.
- Fall back to exponential backoff only when no usable delay can be parsed.
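
The intended delay selection can be sketched as follows (function names and backoff constants are hypothetical, not the actual run_turn code):

```rust
use std::time::Duration;

// Illustrative exponential backoff: 500ms * 2^retries, exponent capped.
fn backoff(retries: u32) -> Duration {
    Duration::from_millis(500u64.saturating_mul(1u64 << retries.min(6)))
}

// Prefer the delay parsed from the error message; otherwise back off.
fn retry_delay(parsed: Option<Duration>, retries: u32) -> Duration {
    parsed.unwrap_or_else(|| backoff(retries))
}

fn main() {
    // A parsed "Try again in 35 seconds" hint wins over backoff.
    assert_eq!(
        retry_delay(Some(Duration::from_secs(35)), 3),
        Duration::from_secs(35)
    );
    // With no parsed hint, fall back to exponential backoff (500ms * 2^3).
    assert_eq!(retry_delay(None, 3), Duration::from_millis(4000));
    println!("ok");
}
```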
What do you see instead?
For non-OpenAI phrasing (“Try again in …”), the delay is not parsed and the agent immediately uses exponential backoff (backoff(retries)). This can cause premature retries or overly long delays, degrading UX and reliability under rate-limited or transient-error conditions.
Additional information
- Affected areas:
  - core/src/client.rs: previously only parsed "Please try again in Ns/ms" via rate_limit_regex in try_parse_retry_after(err).
  - core/src/codex.rs: the run_turn retry path used exponential backoff when CodexErr::Stream carried delay=None.
- Impact:
  - Retrying too soon can trigger additional failures (e.g., further rate limits).
  - Retrying too late wastes time and makes the UI feel slow or frozen.
- Tests:
  - Add unit tests covering "seconds", "minutes" (including decimals), "hours", "ms", and the existing OpenAI decimal-seconds format.
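
As a starting point for those tests, a std-only sketch of the broader parser (hypothetical; the real fix in client.rs may well use a regex instead) that covers all four message patterns above plus the existing OpenAI format:

```rust
use std::time::Duration;

// Hypothetical sketch: parse "try again in <n> <unit>" with support for
// ms, seconds (including decimals), minutes, and hours.
fn parse_retry_hint(msg: &str) -> Option<Duration> {
    let lower = msg.to_ascii_lowercase();
    let rest = lower.split("try again in ").nth(1)?;
    // Take the leading numeric token (digits and dots).
    let num_len = rest
        .find(|c: char| !c.is_ascii_digit() && c != '.')
        .unwrap_or(rest.len());
    let value: f64 = rest[..num_len].parse().ok()?;
    let unit = rest[num_len..].trim_start();
    let millis = if unit.starts_with("ms") {
        value
    } else if unit.starts_with('s') {
        value * 1_000.0 // "s", "sec", "seconds"
    } else if unit.starts_with('m') {
        value * 60_000.0 // "min", "minutes" (checked after "ms")
    } else if unit.starts_with('h') {
        value * 3_600_000.0 // "h", "hours"
    } else {
        return None;
    };
    Some(Duration::from_millis(millis as u64))
}

fn main() {
    assert_eq!(
        parse_retry_hint("Rate limit exceeded. Try again in 35 seconds."),
        Some(Duration::from_secs(35))
    );
    assert_eq!(
        parse_retry_hint("try again in 1.5 minutes"),
        Some(Duration::from_secs(90))
    );
    assert_eq!(
        parse_retry_hint("Try again in 2 hours"),
        Some(Duration::from_secs(7200))
    );
    assert_eq!(
        parse_retry_hint("Try again in 28ms"),
        Some(Duration::from_millis(28))
    );
    // The existing OpenAI decimal-seconds format still parses.
    assert_eq!(
        parse_retry_hint("Please try again in 1.5s"),
        Some(Duration::from_millis(1500))
    );
    // No usable hint: caller should fall back to exponential backoff.
    assert_eq!(parse_retry_hint("internal server error"), None);
    println!("ok");
}
```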