release#216

Merged
ding113 merged 20 commits into main from dev
Nov 27, 2025
Conversation


@ding113 ding113 commented Nov 27, 2025

Summary

This release introduces a comprehensive provider availability monitoring system, smart circuit breaker probing for faster provider recovery, and enhanced provider testing with three-tier validation.

Problem

  • No real-time visibility into provider health or availability trends
  • Recovering a circuit breaker from the OPEN state required manual intervention or a fixed wait time
  • Provider testing lacked configurable validation and preset templates for verifying relay services
  • The provider pages suffered from N+1 queries and SQL full-table scans, degrading performance

Solution

🆕 New Features

Provider Availability Monitoring (/dashboard/availability)

  • Real-time dashboard showing provider health (green/red/unknown)
  • Availability metrics: success rate, average/P50/P95/P99 latency, request count (see the sketch below)
  • Time-range filtering (15 minutes, 1 hour, 6 hours, 24 hours, 7 days)
  • Heatmap visualization with configurable bucket sizes
  • Auto-refresh
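
A minimal TypeScript sketch of how success rate and the latency percentiles above can be derived from raw request samples; the RequestSample shape is assumed for illustration and this is not the project's actual implementation:

interface RequestSample {
  statusCode: number;
  durationMs: number;
}

// Nearest-rank percentile over a pre-sorted latency array
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

function summarize(samples: RequestSample[]) {
  const ok = samples.filter((s) => s.statusCode >= 200 && s.statusCode < 400);
  const latencies = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  return {
    requests: samples.length,
    successRate: samples.length ? ok.length / samples.length : 0,
    avgMs: latencies.reduce((a, b) => a + b, 0) / (latencies.length || 1),
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
  };
}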

Smart Circuit Breaker Probing

  • Periodic health probes while a circuit breaker is in the OPEN state
  • Configurable via environment variables (see the sketch below):
    • ENABLE_SMART_PROBING (default: false)
    • PROBE_INTERVAL_MS (default: 30000 ms)
    • PROBE_TIMEOUT_MS (default: 5000 ms)
  • A successful probe can move the circuit breaker to HALF_OPEN early
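
A sketch of how these variables might be parsed at startup; the defaults match the values documented above, though the actual parsing in src/lib/circuit-breaker-probe.ts may differ:

// Read smart-probing configuration from the environment, with documented defaults
const ENABLE_SMART_PROBING = process.env.ENABLE_SMART_PROBING === "true"; // default: false
const PROBE_INTERVAL_MS = Number(process.env.PROBE_INTERVAL_MS ?? 30_000); // default: 30s
const PROBE_TIMEOUT_MS = Number(process.env.PROBE_TIMEOUT_MS ?? 5_000); // default: 5s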

Enhanced Provider Testing

  • Three-tier validation: HTTP status → latency threshold → content keyword matching (sketched below)
  • Preset templates with realistic CLI request patterns for relay service verification
  • Custom JSON payload support for advanced test scenarios
  • Detailed result cards showing the validation breakdown, timing, and token usage
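
A minimal sketch of the three-tier order; the real validators live in src/lib/provider-testing/ and are more elaborate, and the TestOutcome shape here is an assumption:

interface TestOutcome {
  status: number;
  latencyMs: number;
  body: string;
}

// Tiers run in order; the first failing tier is reported.
function validate(o: TestOutcome, maxLatencyMs: number, keyword?: string) {
  if (o.status < 200 || o.status >= 300) return { ok: false, failedTier: "http-status" };
  if (o.latencyMs > maxLatencyMs) return { ok: false, failedTier: "latency" };
  if (keyword && !o.body.includes(keyword)) return { ok: false, failedTier: "content" };
  return { ok: true };
}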

🔧 Improvements

  • Performance: fixed N+1 queries and SQL full-table scans on the provider pages
  • Error parsing: better support for nested error structures from relay services
  • Streaming timeout: adjusted the idle timeout range from 1-120 seconds to 60-600 seconds (0 disables it)

🐛 Bug Fixes

  • Fixed an incorrect JSONB field name in provider daily usage statistics
  • Fixed response content validation failures in provider testing
  • Fixed login redirects showing a duplicated locale prefix
  • Fixed 8 missing apiTest translation keys in zh-TW
  • Removed the hardcoded sm:max-w-lg width limit from DialogContent

📚 Documentation

Changes

  • New availability monitoring module (src/lib/availability/)
  • New provider testing library with parsers and validators (src/lib/provider-testing/)
  • New smart circuit breaker probing service (src/lib/circuit-breaker-probe.ts)
  • New availability dashboard page and components
  • Updated i18n translations for EN, JA, RU, ZH-CN, ZH-TW
  • Optimized provider repository queries

Testing

  • Manual testing of the provider availability monitoring dashboard
  • Provider tests run with a variety of preset configurations
  • Verified the circuit breaker probing behavior
  • Confirmed no breaking changes to existing APIs

Screenshots (if applicable)

N/A - UI components follow existing design patterns

Silentely and others added 18 commits November 26, 2025 23:19
- Prefer extracting error information from upstream_error
- Fixes error display for relay services such as cubence
- Fixes the frontend showing "you appear to be making a request using a non-Claude code cli client, which is prohibited"
- Keeps the error extraction logic backward compatible

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Reduce nesting to improve readability
- Use optional chaining to simplify property access
- Keep the functional logic unchanged

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
fix: enhance error parsing to support nested error structures from relay services
- Add a provider availability monitoring page with heatmap visualization
- Add sorting (by availability/name/request count), defaulting to availability
- Simplify status logic: drop the "fluctuating" state; successful requests = green, failed requests = red
- Responsive heatmap width that adapts to the time span
- i18n translations for 5 languages (zh-CN, en, ja, ru, zh-TW)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Problem analysis:
- Provider quota page: each provider triggered 1 DB + 5 Redis queries (N+1 problem)
- Provider management page: the getProviderStatistics SQL lacked date filtering (full-table scan)

Fixes:
1. New batch query methods to avoid N+1 (a pipeline sketch follows this commit):
   - RateLimitService.getCurrentCostBatch() - fetches quota usage in bulk via a Redis pipeline
   - SessionTracker.getProviderSessionCountBatch() - fetches session counts in bulk
   - getProviderLimitUsageBatch() - unified batch query entry point

2. Optimized getProviderStatistics SQL:
   - provider_stats CTE: added a created_at >= CURRENT_DATE filter to the LEFT JOIN
   - latest_call CTE: added a 7-day time-range limit

Performance gains (50 providers):
- Quota page: 52 DB + 250 Redis → 2 DB + 2 Redis pipelines
- Management page: full-table scan → only today's/7-day data scanned

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
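
A sketch of the pipeline idea behind getCurrentCostBatch(); the Redis key names are illustrative, not the project's actual schema:

import Redis from "ioredis";

async function getCurrentCostBatch(redis: Redis, providerIds: number[]): Promise<Map<number, number>> {
  const pipeline = redis.pipeline();
  for (const id of providerIds) {
    pipeline.get(`provider:${id}:cost:daily`); // all GETs share a single round-trip
  }
  const results = await pipeline.exec();
  const costs = new Map<number, number>();
  results?.forEach(([err, value], i) => {
    costs.set(providerIds[i], err ? 0 : Number(value ?? 0));
  });
  return costs;
}
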
Improved content extraction, based on the relay-pulse implementation:

- Improved isSSEResponse detection: requires both event: and data: markers
- Added an aggregateResponseText function with an SSE → JSON → raw-response fallback chain (sketched after this commit)
- test-service now uses aggregateResponseText for content validation

Fixed scenario: even when SSE parsing yields empty content, keywords can still be extracted via the JSON or raw-response fallback

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
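
A rough sketch of such a fallback chain; the actual aggregateResponseText parses SSE frames more carefully, and the JSON field paths below are assumptions:

function aggregateResponseText(raw: string): string {
  // 1) SSE: join the payloads of data: lines
  if (raw.includes("event:") && raw.includes("data:")) {
    const text = raw
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trim())
      .filter((d) => d && d !== "[DONE]")
      .join("");
    if (text) return text;
  }
  // 2) JSON: try common content fields
  try {
    const parsed = JSON.parse(raw);
    const candidate = parsed?.content?.[0]?.text ?? parsed?.choices?.[0]?.message?.content;
    if (typeof candidate === "string" && candidate) return candidate;
  } catch {
    // not JSON; fall through to the raw body
  }
  // 3) Raw response as the last resort
  return raw;
}
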
The SQL query used the key name 'providerId', while ProviderChainItem actually defines 'id'. When a request involved retry switching, the JSONB query returned NULL, so today's usage displayed as 0.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Changed the provider availability status labels from Badge variants to custom classNames, adopting the same light-background color scheme as the request log table for UI consistency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The frontend hint said "range 1-120 seconds, default 10 seconds", but the backend validation rule is actually 60-600 seconds. Updated the streamingIdle description in every language to the correct range, noting that 0 disables the timeout.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Use pathWithoutLocale instead of pathname as the from parameter, so the locale-aware router does not prepend the locale again and produce broken URLs like /zh-CN/zh-CN/dashboard (an illustrative sketch follows this commit).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
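
An illustrative sketch of the idea; SUPPORTED_LOCALES and the exact implementation are assumptions for this example:

const SUPPORTED_LOCALES = ["en", "ja", "ru", "zh-CN", "zh-TW"];

// Strip a leading locale segment so the redirect target is locale-neutral
function pathWithoutLocale(pathname: string): string {
  const [, first = "", ...rest] = pathname.split("/");
  return SUPPORTED_LOCALES.includes(first) ? `/${rest.join("/")}` : pathname;
}

// pathWithoutLocale("/zh-CN/dashboard") === "/dashboard"
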
Core changes:
- Removed the complex SSE/JSON parsing logic; keyword matching now uses responseBody.includes() directly
- Added a rawResponse field (5000 characters) exposing the full response body for inspection
- Frontend shows Raw Response Body instead of Stream Info so users can diagnose issues themselves
- Added rawResponse translations for 5 languages

Benefit: whatever the response format (SSE/JSON/plain text), validation passes as long as the keyword is present

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add Traditional Chinese translations for the provider testing feature:
- streamInfo, chunksReceived, streamFormat, streamResponse
- chunksCount, truncatedPreview, truncatedBrief, copyFormat

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The provider testing feature draws on the implementation of the prehisle/relay-pulse project.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Allow callers to fully control dialog width via className, fixing dialogs (such as the provider editor) that could not use max-w-6xl.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @ding113, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a comprehensive upgrade to provider management and monitoring capabilities. It introduces a dedicated dashboard for real-time provider availability, streamlines API testing with a unified and robust validation framework, and enhances system resilience through smart probing for circuit breakers. Additionally, it optimizes backend data retrieval for improved performance and ensures all new features are fully localized.

Highlights

  • Provider Availability Monitoring: A new 'Provider Availability Monitor' dashboard has been introduced, offering real-time tracking of provider performance and availability metrics through a visual heatmap. This feature is accessible to administrators.
  • Unified Provider Testing: The API testing functionality has been unified under a new testProviderUnified action, implementing a robust three-tier validation system (HTTP status, latency, and content matching) inspired by the relay-pulse project. This includes support for custom payloads and predefined test presets.
  • Smart Probing for Circuit Breakers: New configuration options enable 'smart probing' for circuit breakers. When a provider's circuit is in an 'OPEN' state, the system will periodically probe it. A successful probe can transition the circuit to 'HALF_OPEN' faster, improving recovery times.
  • Optimized Data Fetching: Performance for fetching provider quota usage and active session counts has been significantly improved by implementing batch queries using Redis pipelines, effectively addressing N+1 query issues.
  • Internationalization Updates: Extensive new translation keys have been added across multiple languages (English, Japanese, Russian, Simplified Chinese, Traditional Chinese) to support the new availability monitoring and unified provider testing features.

@ding113 ding113 added the size/XL Extra Large PR (> 1000 lines) label Nov 27, 2025

ding113 commented Nov 27, 2025

📊 PR Size Analysis

This PR is XL with 6,920 lines changed across 51 files.

Large PRs are harder to review and more likely to introduce bugs.

🔀 Suggested Split:

Based on the changes, this PR could be split into:

  1. PR 1: Provider Testing Framework (~13 files)

    • src/lib/provider-testing/* (parsers, validators, presets, types, utils)
    • src/actions/providers.ts
    • src/app/[locale]/settings/providers/_components/forms/api-test-button.tsx
    • src/app/[locale]/settings/providers/_components/forms/test-result-card.tsx
  2. PR 2: Availability Service (~7 files)

    • src/lib/availability/*
    • src/app/api/availability/*
    • src/app/[locale]/dashboard/availability/*
    • src/app/[locale]/dashboard/quotas/providers/page.tsx
  3. PR 3: i18n Updates (~12 files)

    • messages/*/dashboard.json
    • messages/*/settings.json
  4. PR 4: Infrastructure & Misc (~9 files)

    • src/lib/circuit-breaker-probe.ts
    • src/lib/rate-limit/service.ts
    • src/lib/session-tracker.ts
    • src/middleware.ts
    • src/instrumentation.ts
    • src/repository/provider.ts
    • Documentation updates

Why Split?

  • Easier to review each feature independently
  • Faster CI feedback per change
  • Easier to revert specific features if needed
  • Better git history for tracking feature additions
  • Allows parallel review of different features

🤖 Automated analysis by Claude AI

@ding113 ding113 added the enhancement New feature or request label Nov 27, 2025

ding113 commented Nov 27, 2025

🔒 Security Scan Results

No security vulnerabilities detected

This PR has been scanned against OWASP Top 10, CWE Top 25, and common security anti-patterns. No security issues were identified in the code changes.

Scanned Categories

  • Injection attacks (SQL, NoSQL, Command, LDAP, etc.) - The code uses parameterized queries via Drizzle ORM. Raw SQL queries in provider.ts use parameterized placeholders (${timezone}) safely. No string concatenation for SQL queries detected.
  • Authentication and session management - All server actions properly check admin session via getSession() before executing sensitive operations. Session validation is consistently applied.
  • Sensitive data exposure - API keys are properly masked in logs using sanitizeErrorTextForLogging(). No hardcoded credentials detected. Sensitive data is properly redacted in error messages.
  • Access control and authorization - Admin-only routes are properly protected. The new availability monitoring endpoints properly validate user roles.
  • Security misconfiguration - No debug mode or verbose error exposure detected in production code paths.
  • Cross-site scripting (XSS) - React components use proper escaping. No dangerouslySetInnerHTML or direct DOM manipulation with user input detected.
  • Insecure deserialization - JSON parsing includes proper error handling. No unsafe deserialization patterns detected.
  • SSRF and path traversal - The validateProviderUrlForConnectivity() function properly blocks internal network addresses (localhost, 127.x.x.x, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, etc.) and dangerous ports (22, 23, 25, 3306, 5432, 6379, 27017, 9200).
  • Cryptographic weaknesses - No weak cryptographic algorithms detected. No predictable token generation.

Security-Positive Patterns Observed

  1. SSRF Prevention (providers.ts:1553-1593)

    • Comprehensive blocklist for internal IP ranges
    • Protocol validation (HTTP/HTTPS only)
    • Dangerous port blocking
  2. Rate Limiting & DoS Protection (providers.ts:73-77)

    • Stream buffer size limits (10MB max)
    • Chunk count limits (1000 max)
    • Iteration limits (10000 max)
  3. Authentication Consistency

    • All new server actions (testProviderUnified, getProviderTestPresets, getProviderLimitUsageBatch) properly check admin session
  4. Error Message Sanitization (providers.ts:1033-1053)

    • API keys redacted from logs
    • Email addresses sanitized
    • Bearer tokens removed
    • Paths and sensitive config files sanitized
  5. Input Validation

    • Zod schemas used for provider creation/updates
    • URL validation before making external requests
    • JSON payload validation in test service

Files Reviewed

File | Changes | Security Notes
src/actions/providers.ts | +393 lines | SSRF protection in place, proper auth checks
src/lib/provider-testing/test-service.ts | +271 lines | Clean request handling, no injection risks
src/lib/rate-limit/service.ts | +136 lines | Redis pipeline queries, safe batch operations
src/lib/session-tracker.ts | +109 lines | Safe Redis operations with proper error handling
src/repository/provider.ts | +13/-5 lines | SQL field name fix (providerId→id), uses parameterized queries
src/middleware.ts | +2/-2 lines | Fixes double-locale prefix issue
src/app/api/availability/*.ts | New routes | Would benefit from admin auth check (see recommendations)

Recommendations (Non-blocking)

  1. API Route Authentication (Low priority)

    • The new /api/availability routes should verify that the routes are properly protected by middleware or add explicit admin session checks similar to server actions.
  2. Request Timeout Consistency

    • Consider standardizing timeout values across different test scenarios for consistency.

🤖 Automated security scan by Claude AI - OWASP Top 10 & CWE coverage

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces several major features, including a provider availability monitoring dashboard, a smart probing mechanism for circuit breakers, and a unified provider testing framework. It also includes significant performance optimizations by batching database and Redis queries. My review identified one high-severity issue in the new smart probing logic. A successful probe incorrectly resets the circuit breaker to a fully CLOSED state instead of the safer HALF_OPEN state. This could reduce system stability when dealing with flaky providers.

Comment on lines 122 to 124
// Reset circuit to closed (will transition through half-open on next real request)
// Using resetCircuit here since we want to give the provider another chance
resetCircuit(providerId);

Severity: high

Resetting the circuit directly to CLOSED upon a single successful probe is too aggressive. It bypasses the HALF_OPEN state, which is a crucial part of the circuit breaker pattern for safely verifying a provider's recovery. A single successful lightweight probe might not guarantee the provider can handle production load, and immediately opening it to full traffic could cause it to fail again, leading to a flapping state where the circuit rapidly opens and closes.

The log message on line 115 ("transitioning to half-open") and the comment on line 122 are also misleading, as the current implementation with resetCircuit sets the state directly to CLOSED.

A better approach would be to transition the circuit to the HALF_OPEN state. This could be done by creating a new function, e.g., tripToHalfOpen(providerId), that correctly sets the circuit state, or by adjusting the circuitOpenUntil timestamp to allow the natural transition to HALF_OPEN on the next request.

@ding113 ding113 left a comment

📋 Code Review Summary

This PR adds multiple new features including provider availability monitoring, unified provider testing with three-tier validation, smart circuit breaker probing, and batch operations for provider limits. The implementation is comprehensive but has several issues that should be addressed.

🔍 Issues Found

  • Critical (🔴): 0 issues
  • High (🟠): 3 issues
  • Medium (🟡): 2 issues
  • Low (🟢): 1 issue

🎯 Priority Actions

See inline comments for specific issues identified:

  1. src/lib/provider-testing/test-service.ts: Missing timeout cleanup in finally block - potential memory leak on fetch abort
  2. src/lib/circuit-breaker-probe.ts: Infinite setInterval without proper cleanup mechanism
  3. src/lib/availability/availability-service.ts: Unbounded data loading - potential memory exhaustion
  4. src/lib/session-tracker.ts: N+1 Redis pipeline pattern with unnecessary second phase
  5. src/app/api/availability/current/route.ts: Inconsistent authentication mechanism

💡 General Observations

  • The new provider testing infrastructure is well-structured with good separation of concerns
  • The availability monitoring adds useful observability features
  • Consider adding rate limiting to the availability API endpoints to prevent abuse
  • The three-tier validation (HTTP status, latency, content) follows industry best practices

🤖 Automated review by Claude AI - focused on identifying issues for improvement


const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeoutMs);

// Execute request

🟠 High: Missing timeout cleanup in error path leading to potential memory leak

Why this is a problem: When the fetch request throws an error (abort, network error, etc.), the clearTimeout(timeoutId) on line 77 is never executed because the error is thrown before reaching it. The timeout continues running, potentially causing issues if many requests fail.

Suggested fix:

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeoutMs);

try {
  const response = await fetch(url, {
    method: "POST",
    headers,
    body: JSON.stringify(body),
    signal: controller.signal,
  });
  // ... success handling
} catch (error) {
  // ... error handling
} finally {
  clearTimeout(timeoutId);
}

if (health.circuitState === "open") {
  openCircuits.push(parseInt(providerId, 10));
}
}

🟠 High: Infinite setInterval without proper cleanup mechanism on graceful shutdown

Why this is a problem: The probeIntervalId is stored but there's no integration with process shutdown handlers. When the Node.js process receives SIGTERM/SIGINT (common in containerized deployments), the interval keeps running until forceful termination, which could leave probes in an inconsistent state.

Suggested fix:

export function startProbeScheduler(): void {
  if (!ENABLE_SMART_PROBING) return;
  if (probeIntervalId) return;

  logger.info("[SmartProbe] Starting probe scheduler", { intervalMs: PROBE_INTERVAL_MS });
  probeIntervalId = setInterval(runProbeCycle, PROBE_INTERVAL_MS);
  runProbeCycle();

  // Register cleanup handlers for graceful shutdown
  const cleanup = () => {
    stopProbeScheduler();
    process.removeListener('SIGTERM', cleanup);
    process.removeListener('SIGINT', cleanup);
  };
  process.on('SIGTERM', cleanup);
  process.on('SIGINT', cleanup);
}

endTime: endDate.toISOString(),
bucketSizeMinutes: explicitBucketSize ?? 60,
providers: [],
systemAvailability: 0,

🟠 High: Unbounded query loading all requests into memory could cause OOM

Why this is a problem: This query loads ALL requests within the time range into memory without pagination. For production systems with high traffic, querying 7 days of data could return millions of rows, causing memory exhaustion and application crashes.

Suggested fix:

// Add LIMIT to prevent unbounded loading
const MAX_REQUESTS_PER_QUERY = 100000; // Reasonable limit

const requests = await db
  .select({
    id: messageRequest.id,
    providerId: messageRequest.providerId,
    statusCode: messageRequest.statusCode,
    durationMs: messageRequest.durationMs,
    errorMessage: messageRequest.errorMessage,
    createdAt: messageRequest.createdAt,
  })
  .from(messageRequest)
  .where(and(...requestConditions))
  .orderBy(messageRequest.createdAt)
  .limit(MAX_REQUESTS_PER_QUERY);

// Consider logging a warning if limit is reached
if (requests.length === MAX_REQUESTS_PER_QUERY) {
  logger.warn('Availability query hit max limit, results may be incomplete');
}

Alternatively, consider using SQL aggregation instead of in-memory processing for better performance.

providerSessionMap.set(providerId, sessionIds);
allSessionIds.push(...sessionIds);
} else {
providerSessionMap.set(providerId, []);

🟡 Medium: Redundant pipeline phase - second EXISTS check is unnecessary

Why this is a problem: The second pipeline verifies session existence by checking session:{sessionId}:info, but the zrange already returned valid session IDs from the sorted set. If the session was removed, zremrangebyscore would have already cleaned it. This adds extra Redis round-trips without benefit.

Suggested fix:

// Skip the second verification phase - just count what zrange returned
for (const providerId of providerIds) {
  const sessionIds = providerSessionMap.get(providerId) || [];
  result.set(providerId, sessionIds.length);
}

return result;

If the verification IS needed (e.g., session:info can expire independently), document why in a comment.

// Verify admin authentication
const authHeader = request.headers.get("Authorization");
const token = authHeader?.replace("Bearer ", "");


🟡 Medium: Inconsistent authentication mechanism between availability endpoints

Why this is a problem: This endpoint uses Bearer token + validateKey() while /api/availability/route.ts uses session cookies via getSession(). This inconsistency could confuse API consumers and make authorization harder to maintain.

Suggested fix: Use the same authentication mechanism as the main availability route:

import { getSession } from "@/lib/auth";

export async function GET(request: NextRequest) {
  const session = await getSession();
  if (!session || session.user.role !== "admin") {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }
  // ...
}

Or document why Bearer token auth is intentionally used here (e.g., for API access from external services).

(mr.provider_chain IS NULL OR jsonb_array_length(mr.provider_chain) = 0) AND mr.provider_id = p.id
OR
-- Case 2: retries occurred, use the providerId of the last providerChain item
-- Case 2: retries occurred, use the id of the last providerChain item

🟢 Low: SQL field name change from 'providerId' to 'id' may break existing data

Why this is a problem: The change from provider_chain->-1->>'providerId' to provider_chain->-1->>'id' assumes all provider_chain JSON objects use the 'id' field. If there's existing data with 'providerId', statistics will be incorrect.

Suggested fix: Add COALESCE to handle both field names during migration:

COALESCE(
  (mr.provider_chain->-1->>'id')::int,
  (mr.provider_chain->-1->>'providerId')::int
) = p.id

Or confirm via migration that all existing data has been updated.


ding113 commented Nov 27, 2025

📝 Documentation Review

📊 Issues Summary

  • Critical (🔴): 0 - Must fix before merge
  • High (🟠): 0 - Should fix before merge
  • Medium (🟡): 1 - Consider addressing
  • Low (🟢): 0 - Optional improvements

⚡ Priority Fixes

  1. Terminology inconsistency in acknowledgments - Minor wording improvement suggested

📋 Review Coverage

  • Technical accuracy - 1 issue
  • Completeness - 0 issues
  • Code examples - N/A (no code examples in docs changes)
  • Links and references - 0 issues (all links verified: 200 OK)
  • Clarity and organization - 0 issues

🔍 Detailed Findings

🟡 Terminology: Description may not accurately reflect functionality

Files: README.md (line 285), README.en.md (line 285)

Problem: The acknowledgments describe relay-pulse as providing "provider detection functionality", but the actual relay-pulse project and the implemented code provide "provider availability monitoring" / "provider testing" functionality with three-tier validation.

Current text (README.en.md):

...and [prehisle/relay-pulse](https://github.com/prehisle/relay-pulse) for provider detection functionality.

Suggested text:

...and [prehisle/relay-pulse](https://github.com/prehisle/relay-pulse) for provider availability monitoring.

Or alternatively:

...and [prehisle/relay-pulse](https://github.com/prehisle/relay-pulse) for provider testing functionality.

Impact: Minor - readers who visit relay-pulse expecting "detection" features may be confused to find availability monitoring instead.

Reference: relay-pulse README describes itself as "LLM 服务可用性监控" (LLM service availability monitoring). The implemented code in this PR (src/lib/provider-testing/, src/lib/availability/) aligns with monitoring/testing, not detection.


✅ Verified Items

  1. External links all return HTTP 200:

  2. Consistent bilingual updates: Both README.md (Chinese) and README.en.md (English) were updated with matching acknowledgments

  3. Localization files: All locale message files (en, ja, ru, zh-CN, zh-TW) contain consistent translations for new features


🤖 Automated docs review by Claude AI

ding113 and others added 2 commits November 27, 2025 11:44
1. circuit-breaker: add a tripToHalfOpen function; a successful smart probe now transitions the circuit to HALF_OPEN instead of directly to CLOSED
   - Fixes the mismatch between the "transitioning to half-open" log message and the actual behavior
   - Safer recovery that validates the provider through the HALF_OPEN phase

2. availability-service: add a MAX_REQUESTS_PER_QUERY limit (100000)
   - Prevents out-of-memory errors on large time-range queries
   - Logs a warning when the limit is reached

3. test-service: use a finally block to guarantee timeout cleanup
   - clearTimeout now runs even if the request throws

4. current/route.ts: unify the authentication mechanism
   - Switched from Bearer token + validateKey to getSession
   - Consistent with /api/availability

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ding113 ding113 added the bug Something isn't working label Nov 27, 2025

ding113 commented Nov 27, 2025

📊 PR Size Analysis

This PR is XL with 6,968 lines changed across 52 files (6,555 additions, 413 deletions).

Large PRs are harder to review and more likely to introduce bugs.

🔀 Suggested Split:

Based on the changes, this PR could be split into 5 focused PRs:

1. Provider Availability Monitoring (~2,500 lines)

Core availability feature with dashboard:

  • src/lib/availability/* (service, types, index)
  • src/app/api/availability/* (API routes)
  • src/app/[locale]/dashboard/availability/* (UI components)
  • Related i18n keys in messages/*/dashboard.json

2. Smart Circuit Breaker Probing (~800 lines)

Circuit breaker enhancements:

  • src/lib/circuit-breaker-probe.ts
  • src/lib/circuit-breaker.ts (modifications)
  • src/instrumentation.ts
  • .env.example (new env vars)

3. Enhanced Provider Testing (~2,500 lines)

Provider testing library with parsers and validators:

  • src/lib/provider-testing/* (test-service, types, presets)
  • src/lib/provider-testing/parsers/*
  • src/lib/provider-testing/validators/*
  • src/lib/provider-testing/utils/*
  • src/lib/provider-testing/data/* (test fixtures)
  • src/app/[locale]/settings/providers/_components/forms/*
  • Related i18n keys in messages/*/settings.json

4. Performance & Query Optimizations (~300 lines)

Database and query improvements:

  • src/repository/provider.ts
  • src/actions/providers.ts
  • src/app/[locale]/dashboard/quotas/providers/page.tsx

5. Bug Fixes & Minor Improvements (~400 lines)

Independent fixes:

  • src/lib/session-tracker.ts
  • src/lib/rate-limit/service.ts
  • src/middleware.ts (login redirect fix)
  • src/components/ui/dialog.tsx (width fix)
  • README.md, README.en.md (docs)

Why Split?

  • ✅ Easier to review - Reviewers can focus on one concern at a time
  • ✅ Faster CI feedback - Smaller PRs run tests faster
  • ✅ Easier to revert - If one feature has issues, it can be reverted independently
  • ✅ Better git history - Clear commits showing feature evolution
  • ✅ Reduced merge conflicts - Smaller surface area for conflicts

Recommended Merge Order:

  1. Bug Fixes & Minor Improvements (foundation)
  2. Performance & Query Optimizations (needed for new features)
  3. Smart Circuit Breaker Probing (standalone feature)
  4. Provider Availability Monitoring (depends on optimizations)
  5. Enhanced Provider Testing (depends on availability service)

🤖 Automated analysis by Claude AI

@ding113 ding113 left a comment

📋 Code Review Summary

This PR introduces a comprehensive provider availability monitoring system, intelligent circuit breaker probing, and enhanced provider testing functionality. The implementation adds significant monitoring capabilities but has several issues that should be addressed.

🔍 Issues Found

  • Critical (🔴): 1 issue
  • High (🟠): 2 issues
  • Medium (🟡): 4 issues
  • Low (🟢): 0 issues

🎯 Priority Actions

  1. 🔴 Critical - API Key Exposure in URL (src/lib/provider-testing/utils/test-prompts.ts:291): The Gemini API key is appended as a URL query parameter (?key=${apiKey}). This is a security risk as URLs with query parameters are commonly logged in server access logs, proxy logs, and browser history. Consider using header-based authentication (x-goog-api-key header) instead.

  2. 🟠 High - Missing Error Handling in Probe Scheduler (src/lib/circuit-breaker-probe.ts:232-234): The runProbeCycle() is called without await in the interval callback, but errors are only caught at the .catch() level. If loadProviderConfigs() throws synchronously before the async operations, it could crash the scheduler.

  3. 🟠 High - Unbounded Memory Growth in Provider Config Cache (src/lib/circuit-breaker-probe.ts:31-33): The providerConfigCache Map grows indefinitely as providers are added but never cleaned up when providers are deleted. Consider adding cache invalidation when providers are removed or implementing a maximum cache size.

  4. 🟡 Medium - Missing Input Validation for Custom JSON Payload (src/lib/provider-testing/test-service.ts:67-71): The custom payload parsing catches JSON parse errors but doesn't validate the structure of the parsed object. Malformed payloads could cause unexpected behavior downstream.

  5. 🟡 Medium - Potential SQL Performance Issue with Window Queries (src/repository/provider.ts:408-416): While the 7-day limit on latest_call CTE is a good optimization, the window function ROW_NUMBER() still needs to process all rows within that window. Consider adding an index on (final_provider_id, created_at DESC) if not already present.

  6. 🟡 Medium - Race Condition in Batch Redis Pipeline (src/lib/rate-limit/service.ts:888-970): The batch cost query assumes all Redis operations will return in order. If Redis has issues with specific keys, the queryMeta mapping could become misaligned with results.

  7. 🟡 Medium - Missing Timeout for Provider Config Loading (src/lib/circuit-breaker-probe.ts:69-87): The database query in loadProviderConfigs() has no timeout. If the database is slow or unresponsive, it could block the probe scheduler indefinitely.

💡 General Observations

  • The N+1 query fix using Redis batch operations (getCurrentCostBatch and getProviderSessionCountBatch) is a good performance improvement.
  • The availability monitoring system is well-structured with proper separation of concerns.
  • The three-tier validation system (HTTP status → latency → content) follows good patterns from the relay-pulse reference implementation.
  • Consider adding rate limiting to the /api/availability endpoint to prevent abuse.

🤖 Automated review by Claude AI - focused on identifying issues for improvement


ding113 commented Nov 27, 2025

🔒 Security Scan Results

No critical security vulnerabilities detected

This PR has been scanned against OWASP Top 10, CWE Top 25, and common security anti-patterns. No security issues requiring immediate attention were identified in the code changes.


📋 OWASP Top 10 Coverage

Category | Status | Notes
A01: Injection (SQL, Command) | ✅ Clean | Drizzle ORM with parameterized queries used throughout. No raw SQL concatenation.
A02: Broken Authentication | ✅ Clean | Admin session validation via getSession() consistently applied to new API endpoints.
A03: Sensitive Data Exposure | ✅ Clean | API keys handled appropriately, no logging of secrets detected.
A04: XML External Entities | ✅ N/A | No XML parsing in changes.
A05: Broken Access Control | ✅ Clean | All new endpoints (/api/availability, /api/availability/current) properly check admin role.
A06: Security Misconfiguration | ✅ Clean | No debug modes enabled, proper error handling.
A07: Cross-Site Scripting | ✅ Clean | React auto-escaping used, no dangerouslySetInnerHTML.
A08: Insecure Deserialization | ✅ Clean | JSON parsing has proper error handling with try-catch.
A09: Known Vulnerabilities | ⚪ N/A | No new dependencies added.
A10: Insufficient Logging | ✅ Clean | Proper logging added for new features.

🛡️ Security Highlights

Positive Security Patterns Found:

  1. SSRF Protection (src/actions/providers.ts:2355-2365)

    • Provider URL validation via validateProviderUrlForConnectivity() before making outbound requests
    • Proper URL scheme validation (http/https only)
  2. SQL Injection Prevention (src/lib/availability/availability-service.ts, src/repository/provider.ts)

    • Drizzle ORM with proper parameterized queries
    • inArray(), eq(), gte(), lte() operators prevent SQL injection
    • Timezone parameter properly escaped in template literal
  3. Authentication Consistency

    • All new API routes check getSession() and verify role === 'admin'
    • Consistent authorization pattern across:
      • /api/availability/route.ts
      • /api/availability/current/route.ts
      • testProviderUnified() server action
      • getProviderLimitUsageBatch() server action
  4. Input Validation

    • Query parameters parsed safely with parseInt() and isNaN() checks
    • Provider IDs validated before use in queries
    • Custom payload JSON parsing wrapped in try-catch
  5. Resource Limits (DoS Prevention)

    • MAX_REQUESTS_PER_QUERY = 100000 limit in availability-service.ts
    • Request timeout enforcement in provider testing
    • Proper abort controller usage with timeout cleanup in finally block

⚠️ Minor Observations (Not Security Issues)

  1. Information Disclosure (Low/Informational)

    • Error messages returned to admin users contain some internal details
    • This is acceptable since these endpoints are admin-only
  2. Rate Limiting

    • New availability API endpoints rely on session-based auth but don't have explicit rate limiting
    • Mitigated by admin-only access requirement

📝 Files Scanned

  • src/actions/providers.ts - Provider testing, batch queries
  • src/app/api/availability/*.ts - New availability API endpoints
  • src/lib/availability/*.ts - Availability service implementation
  • src/lib/circuit-breaker-probe.ts - Smart probing scheduler
  • src/lib/circuit-breaker.ts - Circuit breaker state management
  • src/lib/provider-testing/*.ts - Provider testing service
  • src/lib/rate-limit/service.ts - Rate limit batch queries
  • src/lib/session-tracker.ts - Session tracking batch queries
  • src/repository/provider.ts - Provider statistics SQL
  • src/middleware.ts - Auth middleware

🏁 Conclusion

Security Posture: Acceptable

This PR follows secure coding practices:

  • Uses parameterized queries via Drizzle ORM
  • Implements proper authentication/authorization checks
  • Validates user input before use
  • Has SSRF protection for outbound requests
  • Includes resource limits to prevent DoS

No blocking security concerns. Safe to merge from a security perspective.


🤖 Automated security scan by Claude AI - OWASP Top 10 & CWE coverage

@ding113 ding113 left a comment

📝 Documentation Review

Documentation looks good

No significant issues found. The documentation changes are:

  • Technically accurate
  • Complete and clear
  • Well-structured
  • Free of broken links

📋 Review Coverage

  • Technical accuracy - 0 issues
  • Completeness - 0 issues
  • Code examples - 0 issues (no code examples in changes)
  • Links and references - 0 issues (all 3 GitHub links verified: zsio/claude-code-hub ✅, router-for-me/CLIProxyAPI ✅, prehisle/relay-pulse ✅)
  • Clarity and organization - 0 issues

✅ Changes Validated

  1. README.md & README.en.md: New acknowledgment for prehisle/relay-pulse is accurate and consistent between English and Chinese versions.

  2. .env.example: Smart probing configuration documentation correctly documents:

    • ENABLE_SMART_PROBING (default: false) ✅
    • PROBE_INTERVAL_MS (default: 30000ms = 30s) ✅
    • PROBE_TIMEOUT_MS (default: 5000ms = 5s) ✅

    All values match the implementation in src/lib/circuit-breaker-probe.ts.

  3. i18n files (settings.json): The streaming idle timeout documentation fix from "1-120 seconds" to "60-600 seconds" is correct - matches the validation constants in src/lib/constants/provider.constants.ts:34 (STREAMING_IDLE_TIMEOUT_MS: { MIN: 60000, MAX: 600000 }).


🤖 Automated docs review by Claude AI

@ding113 ding113 left a comment

Inline comments for PR review

}

return url;
}

🔴 Critical: Potential API Key Exposure in URL

Why this is a problem: The API key is appended as a URL query parameter (?key=${apiKey}). URLs with query parameters are commonly logged in:

  • Server access logs
  • Reverse proxy logs
  • Browser history
  • Network debugging tools
  • Error tracking services (like Sentry)

This could lead to API key exposure.

Suggested fix:

// Use x-goog-api-key header instead of URL parameter
case "gemini":
case "gemini-cli":
  return {
    ...baseHeaders,
    ...GEMINI_TEST_HEADERS,
    "x-goog-api-key": apiKey,
  };

Then remove the URL parameter logic and pass the API key via headers in getTestHeaders() instead.

name: string;
url: string;
key: string;
providerType: ProviderType;

🟠 High: Unbounded Memory Growth in Provider Config Cache

Why this is a problem: The providerConfigCache Map grows indefinitely as providers are added but is never cleaned up when providers are deleted. Over time, this could lead to memory issues in long-running processes.

Suggested fix:

// Option 1: Add maximum cache size with LRU eviction
const MAX_CACHE_SIZE = 1000;

async function loadProviderConfigs(): Promise<void> {
  // ... existing code ...
  
  // If cache exceeds max size, clear old entries
  if (providerConfigCache.size > MAX_CACHE_SIZE) {
    const keysToDelete = Array.from(providerConfigCache.keys())
      .slice(0, providerConfigCache.size - MAX_CACHE_SIZE);
    keysToDelete.forEach(key => providerConfigCache.delete(key));
  }
}

// Option 2: Invalidate cache for deleted providers
// Build cache fresh each time from DB query (which already filters deleted_at IS NULL)
providerConfigCache.clear();
providerConfigCache = new Map(providerList.map(...));

}
} else if (config.preset) {
// Use preset configuration
body = getPresetPayload(config.preset, config.model);

🟡 Medium: Missing Input Validation for Custom JSON Payload

Why this is a problem: The code catches JSON parse errors but doesn't validate the structure of the parsed object. A malformed payload (e.g., missing required fields, wrong types) could cause unexpected behavior or cryptic errors downstream.

Suggested fix:

if (config.customPayload) {
  try {
    body = JSON.parse(config.customPayload);
    // Validate basic structure
    if (typeof body !== 'object' || body === null) {
      throw new Error('Custom payload must be a JSON object');
    }
  } catch (e) {
    if (e instanceof SyntaxError) {
      throw new Error('Invalid custom payload JSON: ' + e.message);
    }
    throw e;
  }
}

@ding113 ding113 left a comment

Additional inline comments

logger.error("[SmartProbe] Scheduled probe cycle failed", {
  error: error instanceof Error ? error.message : String(error),
});
});

🟠 High: Missing Error Handling for Synchronous Errors

Why this is a problem: The runProbeCycle() function is called in the interval callback without proper error handling for synchronous throws. While .catch() handles async rejections, if loadProviderConfigs() or early parts of the function throw synchronously (e.g., due to import failures), it won't be caught.

Suggested fix:

probeIntervalId = setInterval(() => {
  void (async () => {
    try {
      await runProbeCycle();
    } catch (error) {
      logger.error('[SmartProbe] Scheduled probe cycle failed', {
        error: error instanceof Error ? error.message : String(error),
      });
    }
  })();
}, PROBE_INTERVAL_MS);

Alternatively, use a more explicit async wrapper pattern.

window24h.toString()
);
queryMeta.push({ providerId, period: "daily", isRolling: true });
} else {

🟡 Medium: Potential Index Mismatch in Pipeline Results

Why this is a problem: The code assumes pipeline results will always be returned in the exact order they were submitted. While Redis pipelines generally preserve order, if any command fails in an unexpected way, the queryMeta[i] mapping could become misaligned.

Suggested fix:

// Add defensive check to verify array lengths match
if (pipelineResults.length !== queryMeta.length) {
  logger.error('[RateLimit] Pipeline results count mismatch', {
    expected: queryMeta.length,
    received: pipelineResults.length,
  });
  return result;
}

ELSE (provider_chain->-1->>'id')::int
END AS final_provider_id,
created_at AS last_call_time,
model AS last_call_model

🟡 Medium: Window Function Performance Consideration

Why this is a problem: While the 7-day limit is a good optimization, the ROW_NUMBER() OVER (PARTITION BY final_provider_id ORDER BY created_at DESC) still requires sorting all rows within each partition. For high-traffic systems, this could be slow.

Recommendation: Ensure there's a composite index on (final_provider_id, created_at DESC) in the database schema to optimize this query. If not present, consider adding:

CREATE INDEX idx_message_request_provider_time 
ON message_request (final_provider_id, created_at DESC) 
WHERE deleted_at IS NULL;

@ding113 ding113 merged commit 06311b1 into main Nov 27, 2025
10 checks passed
ding113 pushed a commit that referenced this pull request Nov 27, 2025
sususu98 pushed a commit to sususu98/claude-code-hub that referenced this pull request Nov 28, 2025
ding113 pushed a commit that referenced this pull request Nov 28, 2025
ding113 pushed a commit that referenced this pull request Nov 28, 2025
@ding113 ding113 mentioned this pull request Nov 28, 2025

Labels

bug (Something isn't working) · enhancement (New feature or request) · size/XL (Extra Large PR (> 1000 lines))


Development

Successfully merging this pull request may close these issues.

bug: Provider streaming idle timeout range is incorrect · Feat: Provider status page
