fix(proxy): detect SSE error block in HTTP 200 response and trigger retry #649

Merged
ding113 merged 1 commit into dev from fix/sse-error-detection-retry
Jan 23, 2026

@ding113 ding113 commented Jan 23, 2026

Summary

  • Detect an SSE error block when upstream returns HTTP 200 + text/event-stream but the first event is an error
  • Trigger retry logic and record failure in circuit breaker for such cases
  • Return proper 502 status code to client instead of treating as success

Problem

When upstream returns HTTP 200 with SSE content like:

event: error
data: {"error":{"code":"1302","message":"High concurrency..."}}
data: [DONE]

The system incorrectly:

  • Treated it as a successful request
  • Did not trigger a retry
  • Did not count it as a failure in the circuit breaker
  • Logged it as a success with empty token counts
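
To make the failure mode concrete, here is a minimal sketch of checking whether the first SSE block of a body signals an error even though the status is 200. The helper name firstBlockIsError is hypothetical and this is not the PR's detectSSEFirstBlockError(); it only assumes that blocks are separated by a blank line and that error payloads use either event: error, type: "error", or a top-level error object.

```typescript
// Hypothetical helper for illustration; not the PR's implementation.
function firstBlockIsError(sseText: string): boolean {
  // The first SSE block ends at the first blank line.
  const block = sseText.split(/\r?\n\r?\n/)[0] ?? "";
  for (const line of block.split(/\r?\n/)) {
    // Explicit error event name.
    if (/^event:\s*error\s*$/.test(line)) return true;
    if (line.startsWith("data:")) {
      try {
        const data: unknown = JSON.parse(line.slice(5));
        if (typeof data === "object" && data !== null) {
          const obj = data as Record<string, unknown>;
          const hasErrorObject = typeof obj.error === "object" && obj.error !== null;
          if (obj.type === "error" || hasErrorObject) return true;
        }
      } catch {
        // Non-JSON payloads such as "[DONE]" are not treated as errors here.
      }
    }
  }
  return false;
}

// The upstream status says success, but the body says otherwise.
const upstreamStatus = 200;
const upstreamBody =
  'event: error\ndata: {"error":{"code":"1302","message":"High concurrency..."}}\ndata: [DONE]\n\n';
console.log(upstreamStatus === 200 && firstBlockIsError(upstreamBody)); // true
```

A check based only on the status code would report this response as a success, which is the behavior the list above describes.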

Related Issues:

Solution

  1. New error type: SSEErrorResponseError in errors.ts
  2. Detection function: detectSSEFirstBlockError() in sse.ts
  3. Integration: Use response.clone() to check first SSE block without consuming original stream
  4. Error handling: Added support in error-handler.ts for proper 502 response
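
Step 3 can be sketched roughly as follows. This is an illustration under assumptions, not the code in forwarder.ts: the helper name peekFirstChunk is hypothetical, and it reads only one chunk with no timeout.

```typescript
// Hypothetical helper: peek at the first chunk of an SSE response through a clone,
// leaving the original body unread for the normal streaming path.
async function peekFirstChunk(response: Response): Promise<string> {
  const contentType = response.headers.get("content-type") ?? "";
  if (!contentType.includes("text/event-stream")) return "";

  const body = response.clone().body;
  if (!body) return "";

  const reader = body.getReader();
  try {
    const { value } = await reader.read();
    return value ? new TextDecoder().decode(value) : "";
  } finally {
    // Deliberately not awaited: on a cloned (teed) body the cancel promise
    // may not settle until the other branch is finished as well.
    reader.cancel().catch(() => {});
  }
}
```

Because the clone is read instead of the original, the caller can still pipe response.body to the client when the first chunk turns out to be a normal event.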

Changes

Core Changes

  • src/lib/utils/sse.ts: Added detectSSEFirstBlockError() function to parse and detect error events in SSE streams
  • src/app/v1/_lib/proxy/errors.ts: Added SSEErrorResponseError class and categorization as PROVIDER_ERROR
  • src/app/v1/_lib/proxy/forwarder.ts: Integrated SSE error detection in streaming response path with 5s timeout protection
  • src/app/v1/_lib/proxy/error-handler.ts: Added handler for SSEErrorResponseError returning 502 status
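
As a rough sketch of how a dedicated error class, its type guard, and the 502 mapping fit together: the class and guard names come from this PR, but the fields, the toClientResponse helper, and the response body shape below are assumptions made for illustration and may differ from errors.ts and error-handler.ts.

```typescript
// Sketch only: field names and the response body shape are assumptions,
// not the PR's actual definitions.
class SSEErrorResponseError extends Error {
  readonly errorCode: string | undefined;
  readonly rawData: string | undefined;

  constructor(message: string, errorCode?: string, rawData?: string) {
    super(message);
    this.name = "SSEErrorResponseError";
    this.errorCode = errorCode;
    this.rawData = rawData;
  }
}

// Type guard so callers can narrow an unknown caught value.
function isSSEErrorResponseError(error: unknown): error is SSEErrorResponseError {
  return error instanceof SSEErrorResponseError;
}

// Hypothetical handler: upstream said 200 but streamed an error, so the
// client gets 502 Bad Gateway instead of a misleading success.
function toClientResponse(error: unknown): Response {
  const status = isSSEErrorResponseError(error) ? 502 : 500;
  const message = error instanceof Error ? error.message : "Unknown error";
  return new Response(JSON.stringify({ error: { message } }), {
    status,
    headers: { "content-type": "application/json" },
  });
}
```

Keeping the SSE case as its own class lets the retry and circuit-breaker code branch on the type guard rather than on message strings.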

Testing

  • tests/unit/proxy/sse-error-detection.test.ts: Comprehensive test suite with 27 test cases covering various SSE error formats

Test Coverage

  • Unit tests for detectSSEFirstBlockError() - 16 test cases
  • Unit tests for SSEErrorResponseError class - 6 test cases
  • Unit tests for error categorization - 5 test cases
  • bun run typecheck passes
  • bun run build passes

Description enhanced by Claude Code

…etry

When upstream returns HTTP 200 + text/event-stream but the first SSE
event is an error block, the system now correctly:
- Detects the error using response.clone() to preserve the original stream
- Throws SSEErrorResponseError which is classified as PROVIDER_ERROR
- Triggers retry logic and records failure in circuit breaker
- Returns proper 502 status code to client

This fixes the issue where high-concurrency errors like:
  event: error
  data: {"error":{"code":"1302","message":"High concurrency..."}}
were incorrectly treated as successful requests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello @ding113, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where the proxy incorrectly treated upstream Server-Sent Event (SSE) responses as successful, even when the initial event in the stream indicated an error. The changes introduce a specialized error type and detection mechanism to identify these 'soft' errors. By doing so, the system can now correctly classify these situations as upstream provider failures, enabling proper retry mechanisms, circuit breaker activation, and returning an accurate 502 HTTP status to the client, thereby improving the reliability and fault tolerance of the proxy.

Highlights

  • New Error Type for SSE: Introduced SSEErrorResponseError to specifically represent cases where an upstream service returns an HTTP 200 status but the initial Server-Sent Event (SSE) stream contains an error block.
  • SSE Error Detection Logic: Implemented detectSSEFirstBlockError to parse the first chunk of an SSE stream and identify common error patterns, such as event: error or JSON data blocks containing an error object.
  • Proxy Integration and Handling: Integrated the SSE error detection into the proxy forwarder. The system now clones the response to safely inspect the initial SSE chunk without consuming the original stream. If an error is detected, it throws the new SSEErrorResponseError.
  • Enhanced Error Handling and Retry: Updated the error handler to recognize SSEErrorResponseError, ensuring that these specific SSE errors are correctly categorized as PROVIDER_ERROR, trigger retry logic, record failures in the circuit breaker, and return a proper 502 status code to the client.
  • Comprehensive Unit Tests: Added extensive unit tests for the new detectSSEFirstBlockError function (16 cases), SSEErrorResponseError class (6 cases), and error categorization (5 cases) to ensure robust error detection and handling.
coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

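Solution Evolution

This change adds first-block error detection to SSE stream handling. When the upstream returns HTTP 200 but the SSE stream contains an error event, the system can now identify and handle the error. The error is represented by the new SSEErrorResponseError class, and the related logic is integrated into the proxy forwarding and error-handling flows. New tests cover detection, categorization, and serialization behavior.

Changes

Cohort / File(s) | Change Summary
SSE error detection utility: src/lib/utils/sse.ts | Adds the SSEFirstBlockError interface and the detectSSEFirstBlockError function, which analyzes the first SSE event and detects an explicit event: error block or an embedded error object, returning a structured error or null
Error definition and categorization: src/app/v1/_lib/proxy/errors.ts | Adds the SSEErrorResponseError class and its type guard isSSEErrorResponseError; updates categorizeErrorAsync to classify SSEErrorResponseError as a provider error (priority 3.3)
Error handling integration: src/app/v1/_lib/proxy/error-handler.ts, src/app/v1/_lib/proxy/forwarder.ts | error-handler adds an SSEErrorResponseError branch that returns a 502 status code; forwarder integrates the first-block detection logic, reads the first chunk, calls detectSSEFirstBlockError, and throws SSEErrorResponseError on error
Test coverage: tests/unit/proxy/sse-error-detection.test.ts | Adds 394 lines of unit tests covering multiple SSE formats and edge cases for detectSSEFirstBlockError, SSEErrorResponseError behavior, and categorizeErrorAsync integration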

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
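Title check ✅ Passed The title clearly and concisely describes the main change: detecting SSE error blocks in HTTP 200 responses and triggering a retry. It is fully relevant to the core goal of this change.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description details the problem, the solution, and all implemented changes, and is fully relevant to the code changes.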

@github-actions github-actions bot added bug Something isn't working area:core area:Error Rule labels Jan 23, 2026
@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines +422 to +427
const firstChunk = await Promise.race([
  reader.read(),
  new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("SSE first chunk read timeout")), 5000)
  ),
]);
Only reads first network chunk - if SSE error event is split across chunks, detection may fail. Consider reading until first complete SSE event (detect \n\n boundary) or setting a reasonable byte limit (e.g., 4KB) instead of just reader.read() once.
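
One way to act on this suggestion is sketched below: keep reading chunks until a blank-line boundary appears, a byte cap is reached, or a timeout fires. The function name readFirstSSEBlock and its defaults (4096 bytes, 5000 ms) are hypothetical; the 4KB figure follows the comment above and the 5s figure follows the existing timeout. It recognizes \n\n and \r\n\r\n boundaries only.

```typescript
// Hypothetical sketch: read until the first complete SSE block, a byte cap,
// or a timeout, whichever comes first.
async function readFirstSSEBlock(
  stream: ReadableStream<Uint8Array>,
  maxBytes = 4096,
  timeoutMs = 5000,
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = "";
  let bytes = 0;
  let timer: ReturnType<typeof setTimeout> | undefined;
  // One timeout shared by all reads, so slow trickling cannot extend the wait.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("SSE first block read timeout")), timeoutMs);
  });

  try {
    while (bytes < maxBytes) {
      const { done, value } = await Promise.race([reader.read(), timeout]);
      if (done || !value) break;
      bytes += value.byteLength;
      // stream: true keeps multi-byte characters split across chunks intact.
      text += decoder.decode(value, { stream: true });
      const end = text.search(/\r?\n\r?\n/);
      if (end !== -1) return text.slice(0, end);
    }
    // No boundary found: return whatever was read (possibly a partial block).
    return text;
  } finally {
    clearTimeout(timer);
    // Not awaited, in case the stream is one branch of a cloned body.
    reader.cancel().catch(() => {});
  }
}
```

The returned text could then be passed to the detection function in place of the single-chunk read, at the cost of buffering up to maxBytes before the response is forwarded.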

@github-actions github-actions bot added the size/M Medium PR (< 500 lines) label Jan 23, 2026

@github-actions github-actions bot left a comment

Code Review Summary

This PR implements SSE error detection for HTTP 200 responses containing error events. The implementation is well-structured with comprehensive test coverage (27 test cases). After thorough analysis through 6 specialized review perspectives, no significant issues were identified.

PR Size: M

  • Lines changed: 597 (597 additions, 0 deletions)
  • Files changed: 5

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 0 0
Simplification 0 0 0 0

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Excellent (27 test cases covering edge cases)
  • Code clarity - Good

Key Strengths

  1. Comprehensive Error Detection: The detectSSEFirstBlockError() function handles multiple error formats (explicit event:error, type:error, embedded error objects)
  2. Proper Resource Management: Uses response.clone() to inspect SSE stream without consuming the original, with proper cleanup via reader.cancel()
  3. Excellent Test Coverage: 27 test cases covering:
    • Multiple error formats (Claude, OpenAI, Gemini styles)
    • Edge cases (malformed JSON, empty streams, long messages)
    • Integration with error categorization system
  4. Type Safety: New SSEErrorResponseError class properly integrated into error hierarchy with type guards
  5. Timeout Protection: 5-second timeout on first chunk read prevents indefinite blocking

Technical Implementation Notes

  • The SSE detection runs only for streaming responses (content-type: text/event-stream)
  • Error detection triggers retry logic and circuit breaker recording as intended
  • Proper 502 status code returned to client for SSE errors
  • No breaking changes to existing error handling flow

Automated review by Claude AI

@ding113 ding113 merged this pull request into dev Jan 23, 2026
15 of 19 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Jan 23, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

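This pull request significantly strengthens the proxy's error handling by introducing a robust mechanism for detecting SSE error blocks in HTTP 200 responses. The new SSEErrorResponseError class and its integration with the existing error-handling and circuit-breaker logic are well designed. The comprehensive unit tests for detectSSEFirstBlockError() cover a range of error formats and edge cases, which supports the feature's correctness and reliability. Overall, this is a well-considered and well-executed improvement that will make the system more resilient.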

Comment on lines +119 to +176
export function detectSSEFirstBlockError(sseText: string): SSEFirstBlockError | null {
  const events = parseSSEData(sseText);

  if (events.length === 0) {
    return null;
  }

  const firstEvent = events[0];

  // Case 1: explicit event: error
  if (firstEvent.event === "error") {
    const data = firstEvent.data;
    if (typeof data === "object" && data !== null) {
      const errorObj = (data as Record<string, unknown>).error as
        | Record<string, unknown>
        | undefined;
      return {
        errorCode: (errorObj?.code as string | undefined) ?? (errorObj?.type as string | undefined),
        errorMessage:
          (errorObj?.message as string) ||
          ((data as Record<string, unknown>).message as string) ||
          "Unknown SSE error",
        rawData: sseText.slice(0, 500),
      };
    }
    return {
      errorMessage: typeof data === "string" ? data : "Unknown SSE error",
      rawData: sseText.slice(0, 500),
    };
  }

  // Case 2: the first data block is an error (e.g. Claude's type: "error")
  if (typeof firstEvent.data === "object" && firstEvent.data !== null) {
    const data = firstEvent.data as Record<string, unknown>;

    // 2.1: type: "error" format (Claude API error format)
    if (data.type === "error") {
      const errorObj = data.error as Record<string, unknown> | undefined;
      return {
        errorCode: (errorObj?.type as string | undefined) ?? (data.code as string | undefined),
        errorMessage: (errorObj?.message as string) || (data.message as string) || "Unknown error",
        rawData: sseText.slice(0, 500),
      };
    }

    // 2.2: top-level error field (some services return data: {"error": {...}} directly)
    if (data.error && typeof data.error === "object") {
      const errorObj = data.error as Record<string, unknown>;
      return {
        errorCode: (errorObj.code as string | undefined) ?? (errorObj.type as string | undefined),
        errorMessage: (errorObj.message as string) || "Unknown SSE error",
        rawData: sseText.slice(0, 500),
      };
    }
  }

  return null;
}
medium

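detectSSEFirstBlockError parses several SSE error formats, which makes its logic fairly complex and leads to duplicated code (for example, rawData: sseText.slice(0, 500)). To improve readability, maintainability, and testability, consider extracting each detection path into its own private helper, for example _extractFromExplicitErrorEvent(data), _extractFromClaudeStyleError(data), and _extractFromTopLevelErrorField(data), with each function handling one specific error structure.

This would make the main function more concise and allow finer-grained unit tests for each error-parsing path.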

export function detectSSEFirstBlockError(sseText: string): SSEFirstBlockError | null {
  const events = parseSSEData(sseText);

  if (events.length === 0) {
    return null;
  }

  const firstEvent = events[0];
  const rawDataSnippet = sseText.slice(0, 500);

  // Helper to extract error details from a data object
  const extractErrorDetails = (data: Record<string, unknown>): SSEFirstBlockError => {
    const errorObj = data.error as Record<string, unknown> | undefined;
    return {
      errorCode: (errorObj?.code as string | undefined) ?? (errorObj?.type as string | undefined) ?? (data.code as string | undefined),
      errorMessage:
        (errorObj?.message as string) || (data.message as string) || "Unknown SSE error",
      rawData: rawDataSnippet,
    };
  };

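  // Case 1: explicit event: error
  if (firstEvent.event === "error") {
    const data = firstEvent.data;
    if (typeof data === "object" && data !== null) {
      // Cast as in the original code so the narrowed value matches the helper's parameter type.
      return extractErrorDetails(data as Record<string, unknown>);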
    }
    return {
      errorMessage: typeof data === "string" ? data : "Unknown SSE error",
      rawData: rawDataSnippet,
    };
  }

  // Case 2: the first data block is an error (e.g. Claude's type: "error")
  if (typeof firstEvent.data === "object" && firstEvent.data !== null) {
    const data = firstEvent.data as Record<string, unknown>;

    // 2.1: type: "error" format (Claude API error format)
    if (data.type === "error") {
      return extractErrorDetails(data);
    }

    // 2.2: top-level error field (some services return data: {"error": {...}} directly)
    if (data.error && typeof data.error === "object") {
      return extractErrorDetails(data);
    }
  }

  return null;
}
