Bug fix #179

Merged
ding113 merged 13 commits into main from dev
Nov 22, 2025
Conversation

@ding113 (Owner) commented Nov 22, 2025

No description provided.

Silentely and others added 6 commits November 22, 2025 07:04
Added IF NOT EXISTS to prevent errors from duplicate creation
- Removed scope and period from the useEffect dependency array so state updates no longer trigger a loop
- Added an eslint-disable comment explaining why
- Fixes a bug introduced by PR #168 (commit 281ed80)
fix: repeated errors when database columns already exist

fix: resolve infinite loop when switching leaderboard tabs, close #177
@gemini-code-assist (Contributor)

Summary of Changes

Hello @ding113, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a mix of bug fixes and performance enhancements across the application. It includes a crucial database schema correction for daily cost limits, a frontend fix to prevent infinite re-renders in the leaderboard component, and a significant improvement to the proxy's response handling for Gemini models. The proxy enhancement ensures that usage and cost metrics are accurately tracked for passthrough requests without introducing latency for the end-user.

Highlights

  • Database Schema Correction: Corrected the daily_reset_mode column type in the keys and providers tables from varchar to a proper PostgreSQL ENUM type within the Drizzle migration, ensuring data integrity and consistency.
  • Frontend Performance Fix: Resolved a potential infinite re-render loop in the LeaderboardView component by optimizing the useEffect hook's dependency array, removing scope and period to prevent unnecessary updates.
  • Asynchronous Gemini Passthrough Statistics: Implemented asynchronous processing for usage and cost tracking of Gemini passthrough responses (both streaming and non-streaming) in the proxy. This allows the original response to be sent immediately to the client while statistics are gathered in the background, improving performance without sacrificing data accuracy (see the sketch below).
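A minimal sketch of that passthrough pattern, using hypothetical names (handleGeminiPassthrough, finalizeRequestStats) rather than the project's actual helpers:

// Clone the upstream response so the original body streams to the client
// untouched, then read the clone in a fire-and-forget background task.
declare function finalizeRequestStats(
  body: string,
  statusCode: number,
  durationMs: number
): Promise<void>; // assumed stats helper, not the project's real API

function handleGeminiPassthrough(
  upstream: Response,
  session: { startTime: number }
): Response {
  const forStats = upstream.clone();

  void (async () => {
    try {
      const text = await forStats.text();
      await finalizeRequestStats(text, upstream.status, Date.now() - session.startTime);
    } catch (err) {
      console.error("passthrough stats task failed:", err);
    }
  })();

  return upstream; // handed to the client immediately, no added latency
}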

@gemini-code-assist bot (Contributor) left a comment

Code Review

This PR primarily fixes a bug where the Gemini API did not track cost, token usage, and other statistics in passthrough mode. The fix introduces background tasks that process the response and record statistics without blocking the client, covering both streaming and non-streaming scenarios; this is a solid improvement. The SQL migration script has also been hardened by using IF NOT EXISTS and converting the daily_reset_mode column to an ENUM type, improving robustness and data integrity.

However, the changes in response-handler.ts introduce a large amount of duplicated statistics logic. I suggest refactoring it into a shared helper method to improve maintainability. I also found that the streaming passthrough logic is missing the call that stores the response body. In the SQL migration script, I further suggest using the time data type instead of varchar for daily_reset_time to strengthen data integrity.

ALTER TABLE "providers" ADD COLUMN "daily_reset_time" varchar(5) DEFAULT '00:00';--> statement-breakpoint
ALTER TABLE "providers" ADD COLUMN "daily_reset_mode" varchar(10) DEFAULT 'fixed' NOT NULL;--> statement-breakpoint
ALTER TABLE "keys" ADD COLUMN IF NOT EXISTS "limit_daily_usd" numeric(10, 2);--> statement-breakpoint
ALTER TABLE "keys" ADD COLUMN IF NOT EXISTS "daily_reset_time" varchar(5) DEFAULT '00:00';--> statement-breakpoint
medium

To strengthen data integrity, consider changing the daily_reset_time column type from varchar(5) to time(0) without time zone. A time type guarantees the column only stores valid time values (such as '00:00') and lets you use the database's built-in time functions for operations and validation, preventing invalid values like 'abcde' from being stored.

ALTER TABLE "keys" ADD COLUMN IF NOT EXISTS "daily_reset_time" time(0) without time zone DEFAULT '00:00';--> statement-breakpoint

ALTER TABLE "keys" ADD COLUMN IF NOT EXISTS "daily_reset_mode" "daily_reset_mode" DEFAULT 'fixed' NOT NULL;--> statement-breakpoint

ALTER TABLE "providers" ADD COLUMN IF NOT EXISTS "limit_daily_usd" numeric(10, 2);--> statement-breakpoint
ALTER TABLE "providers" ADD COLUMN IF NOT EXISTS "daily_reset_time" varchar(5) DEFAULT '00:00';--> statement-breakpoint
medium

As above, to strengthen data integrity, consider changing the daily_reset_time column type from varchar(5) to time(0) without time zone. This guarantees correctly formatted data and takes advantage of the database's native support for time types.

ALTER TABLE "providers" ADD COLUMN IF NOT EXISTS "daily_reset_time" time(0) without time zone DEFAULT '00:00';--> statement-breakpoint

Comment on lines 107 to 172
medium

This statistics logic (computing duration, updating cost, tracking tokens, etc.) is duplicated across the Gemini passthrough path in handleNonStream, the Gemini passthrough path in handleStream, and the non-Gemini path. The duplication makes the code hard to maintain: any future change to the statistics logic has to be applied in several places at once.

Consider extracting this shared logic into a private static helper method, for example _finalizeRequestStats:

private static async _finalizeRequestStats(
  session: ProxySession,
  responseText: string,
  statusCode: number,
  duration: number
): Promise<void> {
  const { messageContext, provider } = session;
  if (!provider || !messageContext) {
    return;
  }

  // End request status tracking
  ProxyStatusTracker.getInstance().endRequest(messageContext.user.id, messageContext.id);
  // Update request duration
  await updateMessageRequestDuration(messageContext.id, duration);

  const { usageMetrics } = parseUsageFromResponseText(
    responseText,
    provider.providerType
  );

  if (!usageMetrics) {
    // Even without usageMetrics, the status code and provider chain still need updating
    await updateMessageRequestDetails(messageContext.id, {
      statusCode: statusCode,
      providerChain: session.getProviderChain(),
    });
    return;
  }

  // Update cost and tokens
  await updateRequestCostFromUsage(
    messageContext.id,
    session.getOriginalModel(),
    session.getCurrentModel(),
    usageMetrics,
    provider.costMultiplier
  );

  await trackCostToRedis(session, usageMetrics);

  // Update session usage
  if (session.sessionId) {
    let costUsdStr: string | undefined;
    if (session.request.model) {
      const priceData = await findLatestPriceByModel(session.request.model);
      if (priceData?.priceData) {
        const cost = calculateRequestCost(
          usageMetrics,
          priceData.priceData,
          provider.costMultiplier
        );
        if (cost.gt(0)) {
          costUsdStr = cost.toString();
        }
      }
    }

    void SessionManager.updateSessionUsage(session.sessionId, {
      inputTokens: usageMetrics.input_tokens,
      outputTokens: usageMetrics.output_tokens,
      cacheCreationInputTokens: usageMetrics.cache_creation_input_tokens,
      cacheReadInputTokens: usageMetrics.cache_read_input_tokens,
      costUsd: costUsdStr,
      status: statusCode >= 200 && statusCode < 300 ? "completed" : "error",
      statusCode: statusCode,
    }).catch((error: unknown) => {
      logger.error("[ResponseHandler] Failed to update session usage:", error);
    });
  }

  // Update request details
  await updateMessageRequestDetails(messageContext.id, {
    statusCode: statusCode,
    inputTokens: usageMetrics?.input_tokens,
    outputTokens: usageMetrics?.output_tokens,
    cacheCreationInputTokens: usageMetrics?.cache_creation_input_tokens,
    cacheReadInputTokens: usageMetrics?.cache_read_input_tokens,
    providerChain: session.getProviderChain(),
  });
}

Each handling branch can then be reduced to a call to this method, eliminating the duplication and improving maintainability.


const flushed = decoder.decode();
if (flushed) chunks.push(flushed);
const allContent = chunks.join("");
medium

In streaming passthrough mode, the statistics task is missing the step that stores the full response body allContent. For consistency with the other handling paths, and to support later log tracing, consider adding a SessionManager.storeSessionResponse call here to persist the full response content to Redis.

Suggested change
const allContent = chunks.join("");
const allContent = chunks.join("");
// Store the response body in Redis (expires after 5 minutes)
if (session.sessionId) {
  void SessionManager.storeSessionResponse(session.sessionId, allContent).catch(
    (err) => {
      logger.error("[ResponseHandler] Failed to store stream passthrough response:", err);
    }
  );
}

@ding113 ding113 added the size/M Medium PR (< 500 lines) label Nov 22, 2025
@ding113 ding113 linked an issue Nov 22, 2025 that may be closed by this pull request
@ding113 ding113 added the bug Something isn't working label Nov 22, 2025
@ding113 (Owner, Author) commented Nov 22, 2025

🔒 Security Scan Results

No security vulnerabilities detected

This PR has been scanned against OWASP Top 10, CWE Top 25, and common security anti-patterns. No security issues were identified in the code changes.

📋 Changes Reviewed

1. Database Migration (drizzle/0021_daily_cost_limits.sql)

  • ✅ Safe DDL operations with IF NOT EXISTS guards
  • ✅ No SQL injection risks (static SQL only)
  • ✅ Proper enum type handling with error recovery
  • ✅ Safe column type conversions with explicit casting

2. Leaderboard Component (leaderboard-view.tsx)

  • ✅ React hooks dependency fix (infinite loop prevention)
  • ✅ No XSS vulnerabilities (React auto-escaping in effect)
  • ✅ Query parameters properly validated on backend
  • ✅ No client-side injection risks

3. Response Handler (response-handler.ts)

  • ✅ Response cloning for async statistics (proper pattern)
  • ✅ Error handling without information disclosure
  • ✅ No unsafe deserialization
  • ✅ Proper async task cleanup

4. API Security (/api/leaderboard)

  • ✅ Authentication required (getSession())
  • ✅ Role-based access control (admin vs. user)
  • ✅ Input validation (period, scope parameters)
  • ✅ Permission checks (allowGlobalUsageView)

🛡️ Scanned Categories

  • A01: Injection - No SQL, NoSQL, Command, or other injection vulnerabilities
  • A02: Broken Authentication - Proper session management and authorization
  • A03: Sensitive Data Exposure - No credentials, secrets, or PII leakage
  • A04: XML External Entities - N/A (no XML parsing)
  • A05: Broken Access Control - Role checks and ownership validation present
  • A06: Security Misconfiguration - No debug info or verbose errors exposed
  • A07: Cross-Site Scripting (XSS) - React auto-escaping, no dangerouslySetInnerHTML
  • A08: Insecure Deserialization - No unsafe JSON.parse() on untrusted data
  • A09: Known Vulnerabilities - No deprecated or vulnerable patterns
  • A10: Logging & Monitoring - Appropriate error logging without data leakage

🔍 Additional Security Checks

  • SSRF Prevention - No user-controlled URLs in network calls
  • Path Traversal - N/A (no file operations)
  • Race Conditions - Async cleanup properly managed
  • Cryptographic Issues - N/A (no crypto operations)
  • Information Disclosure - Errors logged server-side, generic messages to client

✨ Security Best Practices Observed

  1. Defense in Depth: Multiple layers of validation (client + server)
  2. Principle of Least Privilege: Role-based access controls enforced
  3. Fail Secure: Error handlers return generic messages, log details server-side
  4. Input Validation: All query parameters validated with allowlist approach
  5. Secure Defaults: Database constraints and type safety

📊 Security Posture

✅ STRONG - This PR introduces bug fixes without introducing security regressions. The codebase demonstrates solid security practices including authentication, authorization, input validation, and secure error handling.


🤖 Automated security scan by Claude AI - OWASP Top 10 & CWE coverage completed
📅 Scan Date: 2025-11-22
🔬 Scanned Files: 3 | Lines Analyzed: ~500

@ding113 (Owner, Author) left a comment

📝 Documentation Review

📊 Issues Summary

  • Critical (🔴): 1 - Must fix before merge
  • High (🟠): 0
  • Medium (🟡): 0
  • Low (🟢): 0

⚡ Priority Fixes

  1. SQL Migration File Comment: The comment header in drizzle/0021_daily_cost_limits.sql is misleading/outdated after the changes in this PR

📋 Review Coverage

  • Technical accuracy - 1 issue
  • Completeness - No issues
  • Code examples - No issues (no code examples in docs)
  • Links and references - No issues
  • Clarity and organization - No issues

💡 General Observations

The CHANGELOG.md entry is correct and references the right PR (#178). The SQL migration changes look technically sound, but the comment needs updating to reflect the new structure.


🤖 Automated docs review by Claude AI

@ding113 (Owner, Author) commented Nov 22, 2025

🔒 Security Scan Results

No security vulnerabilities detected

This PR has been scanned against OWASP Top 10, CWE Top 25, and common security anti-patterns. No security issues were identified in the code changes.

📋 Files Analyzed

File | Security Status
CHANGELOG.md | ✅ Documentation only
drizzle/0021_daily_cost_limits.sql | ✅ Safe DDL operations
src/app/[locale]/dashboard/leaderboard/_components/leaderboard-view.tsx | ✅ UI state fix only
src/app/v1/_lib/proxy/response-handler.ts | ✅ Secure refactoring

📋 OWASP Top 10 Coverage

  • A01: Injection - Clean: No SQL injection, no command injection. Database migration uses safe DDL statements with IF NOT EXISTS guards.
  • A02: Broken Authentication - N/A: No authentication changes in this PR.
  • A03: Sensitive Data Exposure - Clean: No sensitive data logging or exposure changes.
  • A04: XML External Entities - N/A: No XML parsing involved.
  • A05: Broken Access Control - Clean: Response handler changes preserve existing authorization patterns.
  • A06: Security Misconfiguration - Clean: Migration uses proper enum type creation with exception handling.
  • A07: XSS - Clean: No user input rendered to DOM without sanitization.
  • A08: Insecure Deserialization - Clean: JSON parsing uses existing patterns with type validation.
  • A09: Known Vulnerabilities - N/A: No dependency changes.
  • A10: Logging & Monitoring - Clean: Proper logging patterns maintained.

🔍 Additional Security Checks

  • SSRF - N/A: No new URL handling or external requests.
  • Path Traversal - N/A: No file path operations modified.
  • Race Conditions - Clean: Uses AsyncTaskManager for proper async coordination.
  • Cryptographic Issues - N/A: No cryptographic operations modified.
  • Input Validation - Clean: Leaderboard scope and period are validated against fixed enum values (see the sketch after this list).
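A minimal sketch of that kind of allowlist check; the concrete value sets ("global"/"personal", "daily"/"weekly"/"monthly") and the parsing shape are assumptions, not the route's actual code:

// Hypothetical allowlist validation for /api/leaderboard query parameters.
const SCOPES = ["global", "personal"] as const;
const PERIODS = ["daily", "weekly", "monthly"] as const;

type Scope = (typeof SCOPES)[number];
type Period = (typeof PERIODS)[number];

function parseLeaderboardParams(
  url: URL
): { scope: Scope; period: Period } | null {
  const scope = url.searchParams.get("scope") ?? "";
  const period = url.searchParams.get("period") ?? "";
  if (
    !(SCOPES as readonly string[]).includes(scope) ||
    !(PERIODS as readonly string[]).includes(period)
  ) {
    return null; // reject anything outside the allowlist
  }
  return { scope: scope as Scope, period: period as Period };
}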

🔎 Detailed Analysis

Database Migration (0021_daily_cost_limits.sql)

  • Uses DO $$ ... EXCEPTION ... END $$ for safe enum creation (sketched after this list)
  • All ALTER TABLE statements use IF NOT EXISTS pattern
  • No dynamic SQL or user-controlled inputs
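A minimal sketch of that guarded enum creation, assuming the type name and values used elsewhere in this PR; PostgreSQL's CREATE TYPE has no IF NOT EXISTS, so the duplicate_object error is caught instead:

DO $$
BEGIN
  -- Create the enum backing keys.daily_reset_mode / providers.daily_reset_mode
  CREATE TYPE "daily_reset_mode" AS ENUM ('fixed', 'rolling');
EXCEPTION
  WHEN duplicate_object THEN NULL; -- type already exists: treat as a no-op
END $$;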

Response Handler Changes

  • response.clone() pattern is secure for parallel stats collection
  • All session data flows through existing validated ProxySession object
  • finalizeRequestStats() helper consolidates existing code without introducing new attack surfaces
  • Error handling preserves secure patterns from original code

Leaderboard Fix

  • The eslint-disable-next-line react-hooks/exhaustive-deps comment is a legitimate fix for preventing infinite re-renders (see the sketch after this list)
  • Values are properly initialized and validated before use
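A minimal sketch of the hook shape this refers to, with hypothetical state names and response types; the real LeaderboardView differs in detail:

import { useEffect, useState } from "react";

// Hypothetical reduction of the fix: the effect reads scope/period but
// deliberately keeps them out of its dependency array, because the effect's
// own state updates were re-triggering it on every tab switch and feeding
// the render loop.
export function LeaderboardSketch() {
  const [scope] = useState("global");   // assumed tab state
  const [period] = useState("daily");   // assumed tab state
  const [, setRows] = useState<unknown[]>([]);

  useEffect(() => {
    let cancelled = false;
    void fetch(`/api/leaderboard?scope=${scope}&period=${period}`)
      .then((res) => res.json())
      .then((data: unknown[]) => {
        if (!cancelled) setRows(data);
      });
    return () => {
      cancelled = true;
    };
    // scope and period intentionally omitted, as explained above
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, []);

  return null; // row rendering and tab controls elided
}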

🛡️ Security Posture

Strong - This PR maintains the existing security posture of the codebase. The changes are primarily:

  1. A database migration with proper idempotent DDL operations
  2. A React hooks fix for infinite loop prevention
  3. Response handler refactoring for Gemini passthrough support

No new attack surfaces or security regressions were identified.


🤖 Automated security scan by Claude AI - OWASP Top 10 & CWE coverage

- Use HTTP 524 (Timeout) for response/idle timeouts instead of 500
- Use HTTP 502 (Bad Gateway) for upstream stream aborts
- Add detection for upstream abort vs client abort scenarios
- Refactor AbortError detection logic for better clarity
- Ensure valid HTTP status codes (>= 400) are preserved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ding113 (Owner, Author) left a comment

🟠 Resource leak: clearResponseTimeout not called in error path

Why this is a problem: In the Gemini passthrough logic (lines 71-112), clearResponseTimeout() is called inside the try block. If responseForStats.text() fails or any error occurs before line 101, the timeout is never cleared, causing a resource leak and potential memory issues.

Suggested fix:

const statsPromise = (async () => {
  try {
    const responseText = await responseForStats.text();
    
    if (session.sessionId) {
      void SessionManager.storeSessionResponse(session.sessionId, responseText).catch(
        (err) => logger.error("[ResponseHandler] Failed to store response:", err)
      );
    }
    
    const duration = Date.now() - session.startTime;
    await finalizeRequestStats(session, responseText, statusCode, duration);
  } catch (error) {
    if (!isClientAbortError(error as Error)) {
      logger.error("[ResponseHandler] Gemini non-stream passthrough stats task failed:", error);
    }
  } finally {
    // Move cleanup to finally block to ensure it always runs
    const sessionWithCleanup = session as typeof session & {
      clearResponseTimeout?: () => void;
    };
    if (sessionWithCleanup.clearResponseTimeout) {
      sessionWithCleanup.clearResponseTimeout();
    }
    AsyncTaskManager.cleanup(taskId);
  }
})();

@ding113 (Owner, Author) left a comment

🟡 Semantically incorrect status code for upstream abort

Why this is a problem: At line 959, when detecting upstream abort (!clientAborted), the code uses status 524 (Cloudflare-specific "A Timeout Occurred"). However, this scenario represents an upstream server closing the connection unexpectedly, not a timeout. The correct HTTP status code should be 502 (Bad Gateway) for upstream failures.

Why this matters:

  • Status 524 is Cloudflare-specific and not a standard HTTP code
  • It semantically indicates a timeout, not an upstream abort
  • Using 502 provides better clarity for monitoring and debugging

Suggested fix:

} else if (!clientAborted) {
  // 上游在流式过程中意外中断:视为供应商/网络错误
  logger.error("ResponseHandler: Upstream stream aborted unexpectedly", {
    taskId,
    providerId: provider.id,
    providerName: provider.name,
    messageId: messageContext.id,
    chunksCollected: chunks.length,
    errorName: err.name,
    errorMessage: err.message || "(empty message)",
  });

  await persistRequestFailure({
    session,
    messageContext,
    statusCode: statusCode && statusCode >= 400 ? statusCode : 502,  // Changed from 524 to 502
    error: err,
    taskId,
    phase: "stream",
  });
}

@ding113 (Owner, Author) left a comment

🟡 SQL migration may fail with invalid existing data

Why this is a problem: The type conversion at lines 36-43 uses a USING clause to cast varchar to enum, but assumes all existing values are valid ('fixed' or 'rolling'). If any row has an invalid value (e.g., from manual insertion or corruption), the migration will fail with a cryptic error, potentially blocking deployments.

Suggested fix:

-- Step 3.0: Validate existing data before conversion
DO $$ 
DECLARE
  invalid_count INTEGER;
BEGIN
  SELECT COUNT(*) INTO invalid_count
  FROM "keys"
  WHERE "daily_reset_mode" NOT IN ('fixed', 'rolling');
  
  IF invalid_count > 0 THEN
    RAISE EXCEPTION 'Found % invalid daily_reset_mode values in keys table. Valid values are: fixed, rolling', invalid_count;
  END IF;
  
  SELECT COUNT(*) INTO invalid_count
  FROM "providers"
  WHERE "daily_reset_mode" NOT IN ('fixed', 'rolling');
  
  IF invalid_count > 0 THEN
    RAISE EXCEPTION 'Found % invalid daily_reset_mode values in providers table. Valid values are: fixed, rolling', invalid_count;
  END IF;
END $$;
--> statement-breakpoint

-- Step 3.1: Safe type conversion (only if validation passed)
ALTER TABLE "keys" 
  ALTER COLUMN "daily_reset_mode" TYPE "daily_reset_mode" 
  USING "daily_reset_mode"::"daily_reset_mode";
--> statement-breakpoint

ALTER TABLE "providers" 
  ALTER COLUMN "daily_reset_mode" TYPE "daily_reset_mode" 
  USING "daily_reset_mode"::"daily_reset_mode";

@ding113 (Owner, Author) commented Nov 22, 2025

📝 Documentation Review

Documentation looks good

No issues found. The CHANGELOG.md entry accurately documents the infinite-loop fix in leaderboard tab switching.


🤖 Automated docs review by Claude Code

@ding113 (Owner, Author) left a comment

🟡 Insufficient error differentiation in abort detection

Why this is a problem: The new upstream abort detection (lines 877-968) catches all AbortErrors when !clientAborted, but doesn't differentiate between:

  1. Upstream server closing connection unexpectedly
  2. Network errors (DNS failure, connection refused, etc.)
  3. Other abort sources

This makes debugging and monitoring harder, as all these different failure modes are logged identically as "Upstream stream aborted unexpectedly".

Suggested fix:

const sessionWithController = session as typeof session & {
  responseController?: AbortController;
};
const clientAborted = session.clientAbortSignal?.aborted ?? false;
const isResponseControllerAborted =
  sessionWithController.responseController?.signal.aborted ?? false;

if (isClientAbortError(err)) {
  const isResponseTimeout = isResponseControllerAborted && !clientAborted;
  const isIdleTimeout = err.message?.includes("streaming_idle");

  if (isResponseTimeout && !isIdleTimeout) {
    // ... existing timeout handling ...
  } else if (isIdleTimeout) {
    // ... existing idle timeout handling ...
  } else if (!clientAborted) {
    // More granular error detection
    const isNetworkError = err.message?.includes('fetch failed') || 
                           err.message?.includes('ECONNREFUSED') ||
                           err.message?.includes('ENOTFOUND');
    const isUpstreamClose = err.name === 'AbortError' && !isResponseControllerAborted;
    
    if (isNetworkError) {
      logger.error("ResponseHandler: Network error during streaming", {
        taskId, providerId: provider.id, messageId: messageContext.id,
        errorMessage: err.message, errorType: "network"
      });
    } else if (isUpstreamClose) {
      logger.error("ResponseHandler: Upstream closed connection", {
        taskId, providerId: provider.id, messageId: messageContext.id,
        errorType: "upstream_close"
      });
    } else {
      logger.error("ResponseHandler: Unknown abort reason", {
        taskId, providerId: provider.id, messageId: messageContext.id,
        errorName: err.name, errorMessage: err.message, errorType: "unknown"
      });
    }

    await persistRequestFailure({
      session, messageContext,
      statusCode: 502,
      error: err, taskId, phase: "stream",
    });
  }
}

@ding113 (Owner, Author) left a comment

📋 Code Review Summary

This PR fixes an infinite loop in leaderboard tab switching (#178) and improves the database migration for daily cost limits. The leaderboard fix is correct, but the response handler changes introduce potential resource leaks and use semantically incorrect HTTP status codes.

🔍 Issues Found

  • Critical (🔴): 0 issues
  • High (🟠): 1 issue
  • Medium (🟡): 3 issues
  • Low (🟢): 0 issues

🎯 Priority Actions

  1. Fix resource leak in Gemini passthrough (🟠 High): Move clearResponseTimeout() to finally block to ensure cleanup happens even when errors occur (lines 71-112 and similar pattern in stream handler)

  2. Correct HTTP status code for upstream aborts (🟡 Medium): Change status 524 to 502 for upstream connection failures at line 959 - 524 is Cloudflare-specific and semantically incorrect

  3. Add SQL migration validation (🟡 Medium): Validate existing data before enum type conversion to prevent migration failures from invalid varchar values

  4. Improve error differentiation (🟡 Medium): Add more granular detection for network errors vs upstream closes vs other abort types for better monitoring

💡 General Observations

Positive changes:

  • Leaderboard infinite loop fix is clean and well-documented
  • SQL migration improvements with IF NOT EXISTS clauses increase idempotency
  • Enum type creation with exception handling is a good defensive pattern

Areas of concern:

  • The Gemini passthrough implementation has the same resource leak pattern in both stream and non-stream paths
  • Status code inconsistencies across error paths (524 vs 502 vs 500) make monitoring and debugging harder
  • Error handling consolidation is good, but loses some granularity in distinguishing failure modes

🤖 Automated review by Claude AI - focused on identifying issues for improvement

ding113 and others added 3 commits November 22, 2025 15:25
- Changed status code from 524 (Cloudflare-specific) to 502 (RFC 7231) for three scenarios:
  1. Non-stream response timeout during body read (line 383)
  2. Stream response timeout during body read (line 926)
  3. Streaming idle timeout (line 962)

- Rationale:
  * 524 is Cloudflare-specific and semantically incorrect (indicates timeout waiting for response headers)
  * 502 Bad Gateway correctly represents incomplete/interrupted upstream responses
  * Aligns with existing code at line 982 which already uses 502 for upstream abort
  * Improves HTTP RFC compliance and monitoring clarity

- Impact: Better monitoring, clearer error semantics, consistent with standard HTTP codes
- Remove the height limit on the user list so all users are shown in full
- Move the add-user button to the top of the list for better visibility
- Add a carefully designed empty state to help new users get started quickly
- Complete internationalization support for 5 languages (en, ja, ru, zh-CN, zh-TW)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ding113 ding113 merged commit 1c173e7 into main Nov 22, 2025
ding113 pushed a commit that referenced this pull request Nov 22, 2025
@ding113 ding113 mentioned this pull request Nov 25, 2025
2 tasks
@ding113 ding113 mentioned this pull request Nov 28, 2025
3 tasks
Development

Successfully merging this pull request may close these issues.

Leaderboard tabs cannot be switched
