
refactor(provider): improve provider page performance#782

Merged
ding113 merged 56 commits into ding113:refactor/provider-performance from tesgth032:issue/779-provider-performance
Feb 15, 2026

Conversation


@tesgth032 tesgth032 commented Feb 13, 2026

Related: ding113/claude-code-hub#779, ding113/claude-code-hub#781

Background / Problem

  • The Providers list/detail pages have obvious N+1 and duplicated requests (per-vendor / per-endpoint fetches); as data grows this keeps slowing the frontend down and feels like a "request storm"
  • The endpoint pool shows flaky behavior such as "endpoint not found / cannot test" (multi-instance cache desync, insufficient batch capability driving peak concurrency too high, etc.)

Main changes (UX kept unchanged where possible)

  • Providers list endpoint counts: use batchGetVendorTypeEndpointStats for batch enabled counts; endpoint details and circuit state are only fetched when the tooltip opens
  • Endpoint table sparkline (probe logs): triggered only in the viewport (useInViewOnce with a shared IntersectionObserver pool, cutting observer overhead in large lists); 10ms micro-batch coalescing (a minimal sketch follows this list); prefers the batch HTTP route batchGetProviderEndpointProbeLogs (partially successful chunks may coexist with partial 404s/failures, degrading on demand); query cancel/unmount removes requests from the batch, avoiding pointless flushes
  • Circuit state: Redis pipeline batch reads + getAllEndpointHealthStatusAsync memory sync; runtime endpoint selection also reads circuit state in batch, reducing Redis round-trips
    • In-flight dedup: concurrent requests reuse the same Redis batch load, avoiding duplicate pipelines when the same batch of endpointIds is triggered from several places at once
    • Memory cap: the health cache gains LRU eviction (optional env: ENDPOINT_CIRCUIT_HEALTH_CACHE_MAX_SIZE), so resident memory does not balloon as the endpointId count grows
    • Write-amplification convergence: when a circuit recovers to the default closed state it is no longer written back to Redis (the key is deleted instead), lowering Redis memory and write load; state saves merge HSET+EXPIRE in one pipeline to cut RTT
  • Endpoint circuit UI: batchGetEndpointCircuitInfo no longer forces forceRefresh, relying on the 1s sync TTL + in-flight dedup to lower instantaneous Redis read peaks (display semantics unchanged)
  • Endpoint mutation stability: best-effort resetEndpointCircuit after successful edit/delete/manual probe; probe records are silently ignored when the endpoint is concurrently deleted (avoiding FK failures aborting the task)
  • Endpoint sync race hardening: syncProviderEndpointOnProviderEdit prefers reading the active row; on 23505 during revive it degrades to re-reading the active row, avoiding transaction rollbacks from concurrency/legacy dirty data
  • Refresh amplifier removed: main operations on the Providers admin page drop router.refresh(); invalidation narrows from global to vendor scope; the Vendor view endpoint pool section gains in-view deferred mounting
  • Similar regressions hardened: the provider statistics SQL drops a potential Cartesian product; my-usage/usage-logs queries reduce joins and round-trips, and the my-usage logs total count gets a short-TTL cache to cut repeated COUNTs from paging/polling
  • Follow-up convergence (review follow-up): ProviderEndpointHover fallback worker concurrency bounds made explicit; my-usage model breakdown exposes the cacheCreation5m/1h fields; distinct models/endpoints cache hits get an LRU-like bump + sliding TTL, reducing repeated DISTINCT scans under high-frequency filtering
  • Probe scheduler: probe candidates are gated per vendor/type (endpoint pools are only probed where a vendor/type has an enabled provider), so fully disabled/orphaned vendor/types stop pinging and writing probe logs; manually added endpoint-pool endpoints are still covered (no 1:1 mapping to providers.url required) (Provider Availability Endpoint #781)
  • Endpoint Health (Dashboard Availability): the endpoint list is gated per vendor/type (endpoint pools only show where a vendor/type has an enabled provider), so orphaned vendor/types no longer linger on the Dashboard, while keeping endpoint pool display semantics as-is (Provider Availability Endpoint #781)
  • Security convergence: src/repository/provider-endpoints-batch.ts drops the top-level "use server" in favor of server-only (avoiding accidental exposure as a Server Action)
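
A minimal sketch of the 10ms micro-batching pattern referenced above, in TypeScript; fetchBatch(ids) is a hypothetical server call, and the real ProbeLogsBatcher additionally groups requests by limit and supports cancellation:

```typescript
// Micro-batcher sketch (assumption: a hypothetical fetchBatch(ids) server call;
// the real ProbeLogsBatcher also groups by `limit` and supports cancellation).
type Resolver<T> = { resolve: (v: T) => void; reject: (e: unknown) => void };

class MicroBatcher<T> {
  private pending = new Map<number, Resolver<T>[]>();
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private fetchBatch: (ids: number[]) => Promise<Record<number, T>>,
    private delayMs = 10
  ) {}

  request(id: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const list = this.pending.get(id) ?? [];
      list.push({ resolve, reject });
      this.pending.set(id, list);
      // One timer coalesces every request arriving inside the window.
      this.timer ??= setTimeout(() => {
        this.timer = null;
        void this.flush();
      }, this.delayMs);
    });
  }

  private async flush(): Promise<void> {
    // Snapshot-then-clear: requests arriving during the fetch start a new batch.
    const snapshot = this.pending;
    this.pending = new Map();
    try {
      const results = await this.fetchBatch([...snapshot.keys()]);
      for (const [id, resolvers] of snapshot) {
        for (const r of resolvers) r.resolve(results[id]);
      }
    } catch (err) {
      for (const resolvers of snapshot.values()) {
        for (const r of resolvers) r.reject(err);
      }
    }
  }
}
```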

Compatibility / Upgrade

  • This PR includes index migrations (no table structure changes; new indexes only):
    • idx_provider_endpoints_pick_enabled: speeds up the runtime endpoint-selection hot path (vendor_id + provider_type + is_enabled lookup + ordered scan on sort_order)
    • idx_providers_vendor_type_url_active: speeds up the reference check when editing a provider URL (exact vendor/type/url match)
    • idx_providers_enabled_vendor_type: speeds up the Dashboard/probe scheduler's enabled vendor/type DISTINCT (partial: deleted_at IS NULL AND is_enabled = true AND provider_vendor_id > 0)
    • idx_message_request_key_created_at_id: speeds up key-scoped pagination and time-range scans for my-usage/usage logs (key + created_at desc + id desc, partial deleted_at IS NULL)
    • idx_message_request_key_model_active: speeds up the my-usage dropdown filter's DISTINCT model (key-scoped, partial, excludes warmup)
    • idx_message_request_key_endpoint_active: speeds up the my-usage dropdown filter's DISTINCT endpoint (key-scoped, partial, excludes warmup)
    • idx_message_request_created_at_id_active: speeds up keyset pagination for admin usage logs (created_at desc + id desc, partial deleted_at IS NULL)
    • idx_message_request_model_active: speeds up the admin usage logs filter's DISTINCT model (partial deleted_at IS NULL AND model IS NOT NULL)
    • idx_message_request_status_code_active: speeds up the admin usage logs filter's DISTINCT status_code (partial deleted_at IS NULL AND status_code IS NOT NULL)
  • Note: if the production message_request table is large/write-heavy, the index migrations (especially 0069/0070/0072) use plain CREATE INDEX and may block writes while building; if write downtime matters, upgrade in a maintenance window or manually pre-create same-named indexes with CREATE INDEX CONCURRENTLY (the migrations use IF NOT EXISTS, so pre-creation does not conflict)
  • Docker users: pull the latest image and restart; the default AUTO_MIGRATE=true applies migrations automatically
  • If production disables AUTO_MIGRATE: run bun run db:migrate (or the Drizzle CLI) after upgrading to apply migrations
  • The frontend tolerates 404 on the batch HTTP endpoint (avoiding brief unavailability during rolling releases); a minimal sketch of this fallback follows
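
A minimal sketch of the 404-tolerant batch fetch mentioned in the last item; the route path and the fetchOne fallback are hypothetical, and the real code additionally chunks ids and micro-batches:

```typescript
// 404-fallback sketch (assumptions: hypothetical route path and fetchOne(id)
// fallback; the real code also chunks ids and runs fallbacks with bounded concurrency).
let batchRouteAvailable: boolean | undefined;

async function fetchProbeLogs(
  ids: number[],
  fetchOne: (id: number) => Promise<unknown>
): Promise<Map<number, unknown>> {
  if (batchRouteAvailable !== false && ids.length > 1) {
    const res = await fetch("/api/actions/batchGetProviderEndpointProbeLogs", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ endpointIds: ids }),
    });
    if (res.status === 404) {
      // Old server during a rolling deploy: remember, and fall back below.
      batchRouteAvailable = false;
    } else if (res.ok) {
      batchRouteAvailable = true;
      const data = (await res.json()) as Record<number, unknown>;
      return new Map(ids.map((id) => [id, data[id]]));
    }
  }
  // Per-id fallback path.
  const out = new Map<number, unknown>();
  for (const id of ids) out.set(id, await fetchOne(id));
  return out;
}
```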

Local verification

  • bun run lint (only pre-existing useExhaustiveDependencies warnings)
  • bun run lint:fix
  • bun run typecheck
  • bun run test
  • bun run build
  • bun run db:generate (generates the index migrations; migration idempotency check passes)

Greptile Summary

This PR implements comprehensive performance and stability optimizations for the Provider management system, addressing N+1 query patterns, request storms, and cache synchronization issues across a large-scale distributed deployment.

Major Improvements:

  • Batch Processing Infrastructure: New provider-endpoints-batch.ts consolidates vendor stats and probe logs fetching into single parameterized queries with LATERAL joins, reducing round-trips from O(n) to O(1) for lists
  • Frontend Request Coalescing: 10ms micro-batching with useInViewOnce viewport detection prevents request storms on scroll/mount; shared IntersectionObserver pool reduces observer overhead in large lists
  • Circuit Breaker Enhancements: In-flight request dedup, batch Redis pipeline loading, LRU cache with configurable limits (default 10k endpoints), and default-state deletion to reduce write amplification (a minimal LRU sketch follows this list)
  • Probe Scheduler Gating: CTE-based vendor/type filtering ensures only endpoints with active enabled providers are probed, eliminating wasted cycles on orphaned endpoints
  • Usage Logs Optimization: Slim queries replace full joins, total count caching (30s TTL) prevents repeated COUNT scans on pagination, new indexes accelerate keyset pagination and DISTINCT filtering
  • Database Indexes: 10 new indexes on message_request, providers, and provider_endpoints to support runtime endpoint selection, usage log pagination, and DISTINCT queries (⚠️ migrations use standard CREATE INDEX - may block writes on large tables)
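
As referenced in the circuit-breaker item above, a minimal Map-based LRU sketch; the size cap mirrors ENDPOINT_CIRCUIT_HEALTH_CACHE_MAX_SIZE, everything else is an illustrative assumption rather than the repository's actual implementation:

```typescript
// Map-based LRU sketch: a Map preserves insertion order, so re-inserting on
// read ("bump") keeps the first key as the least recently used.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private maxSize: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key);
      this.map.set(key, value); // bump to most-recently-used
    }
    return value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // Evict the least recently used entry (first in insertion order).
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}

// e.g. sized from ENDPOINT_CIRCUIT_HEALTH_CACHE_MAX_SIZE (default assumed 10_000)
const healthCache = new LruCache<number, { circuitState: string }>(10_000);
```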

Compatibility Notes:

  • Migrations 0069/0070/0072 add 6 indexes to high-write message_request table without CONCURRENTLY - production deployments should either apply during maintenance window or pre-create indexes manually
  • Frontend includes 404 fallback for batch endpoints to handle rolling deployments gracefully
  • All changes maintain existing UX semantics while improving backend efficiency

Test Coverage:

  • New tests for useInViewOnce shared observer pooling
  • Integration test for concurrent provider URL edit race conditions
  • Existing probe scheduler, circuit breaker, and usage aggregation tests updated

Confidence Score: 4/5

  • This PR is safe to merge with attention to migration timing - production deployments should schedule index creation during maintenance windows
  • Score reflects excellent engineering (batching, caching, gating), comprehensive test coverage, and thoughtful migration warnings. Deducted 1 point due to standard CREATE INDEX on high-write table requiring careful production rollout planning
  • Migrations 0069/0070/0072 require careful deployment planning for large message_request tables - consider CREATE INDEX CONCURRENTLY or maintenance window

Important Files Changed

Filename Overview
drizzle/0069_broad_hellfire_club.sql Three indexes on message_request for usage logs pagination and filtering - locking noted in comments
drizzle/0070_warm_lilandra.sql Index for keyset pagination on message_request and GIN index for users tags - locking noted
drizzle/0072_absurd_gwen_stacy.sql Partial indexes on message_request for DISTINCT model/endpoint filtering in my-usage views
src/repository/provider-endpoints-batch.ts New batch queries for vendor endpoint stats and probe logs using parameterized SQL with LATERAL joins
src/lib/endpoint-circuit-breaker.ts Enhanced with LRU cache, batch Redis loading, in-flight dedup, and default state deletion to reduce write amplification
src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx 10ms micro-batching with in-view triggering, batch HTTP endpoint with 404 fallback, query cancellation support
src/lib/hooks/use-in-view-once.ts Shared IntersectionObserver pool with automatic cleanup, prevents request storms on large lists (pool idea sketched after this table)
src/lib/provider-endpoints/probe-scheduler.ts Vendor/type gating with INNER JOIN to only probe endpoints where enabled providers exist
src/repository/provider-endpoints.ts Probe query gated by enabled_vendor_types CTE, sync race handling improved with active row fallback
src/repository/provider.ts Provider statistics with in-flight dedup and cache, DST-aware date boundaries, reduced invalidation scope
src/repository/usage-logs.ts Slim query replaces full joins, total count cached with 30s TTL to reduce pagination overhead
src/actions/my-usage.ts Model breakdown adds cache creation tokens, slim queries reduce joins, 30s total count caching
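
A minimal sketch of the shared-observer-pool idea behind use-in-view-once.ts (see the table row above); the bookkeeping here is illustrative, not the file's actual code:

```typescript
// Shared IntersectionObserver pool: one observer per distinct options
// signature, shared by every row, instead of one observer per element.
type PoolEntry = {
  observer: IntersectionObserver;
  callbacks: Map<Element, () => void>;
};
const pool = new Map<string, PoolEntry>();

export function observeOnce(
  el: Element,
  onInView: () => void,
  rootMargin = "0px"
): () => void {
  let entry = pool.get(rootMargin);
  if (!entry) {
    const callbacks = new Map<Element, () => void>();
    const observer = new IntersectionObserver(
      (records) => {
        for (const r of records) {
          if (r.isIntersecting) callbacks.get(r.target)?.();
        }
      },
      { rootMargin }
    );
    entry = { observer, callbacks };
    pool.set(rootMargin, entry);
  }
  const active = entry;
  active.callbacks.set(el, onInView);
  active.observer.observe(el);
  // Cleanup handle for unmount: detach this element only.
  return () => {
    active.callbacks.delete(el);
    active.observer.unobserve(el);
  };
}
```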

Flowchart

flowchart TD
    A[Provider List/Detail View] -->|Mount| B{useInViewOnce}
    B -->|In View| C[10ms Micro-Batch Queue]
    C -->|Flush| D{Batch Size Check}
    D -->|>1 Endpoint| E[POST /api/batchGetProviderEndpointProbeLogs]
    D -->|Single| F[Fallback: Individual Server Action]
    E -->|404| G[Mark Batch Unavailable<br/>5min Retry Timer]
    E -->|Success| H[Batch Available]
    E -->|Partial Fail| I[Fallback: Individual Fetch]
    H --> J[Merge Results]
    I --> J
    F --> J
    G -->|After 5min| D
    
    K[Endpoint Selector Runtime] -->|Pick Endpoint| L{Redis Sync Needed?}
    L -->|Yes - In-flight?| M{Wait Existing}
    L -->|Yes - New| N[Batch Pipeline Load<br/>Redis HGETALL]
    M --> O[Sync to Memory]
    N --> O
    L -->|No - Use Cache| O
    O --> P{Circuit State}
    P -->|Open| Q[Skip Endpoint]
    P -->|Closed/Half-Open| R[Use Endpoint]
    
    S[Probe Scheduler Tick] -->|Query DB| T[INNER JOIN<br/>enabled_vendor_types CTE]
    T -->|Filter by Interval| U[Due Endpoints]
    U -->|Concurrent Probe| V[Record Success/Failure]
    V -->|Update| W[Circuit Breaker State]
    W -->|Default Closed| X[DELETE Redis Key]
    W -->|Non-Default| Y[HSET + EXPIRE Pipeline]
    
    Z[Usage Logs Query] -->|First Page| AA{Total Count Cached?}
    AA -->|No| AB[COUNT + Cache 30s]
    AA -->|Yes| AC[Use Cached Total]
    AB --> AD[Slim Query<br/>keyset pagination]
    AC --> AD
    AD -->|Indexed Scan| AE[Results]
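
For the usage-logs branch above, a minimal sketch of the keyset pagination shape that idx_message_request_created_at_id_active serves, written in the drizzle style quoted elsewhere in this thread; the import paths and cursor plumbing are assumptions:

```typescript
import { and, desc, eq, isNull, lt, or } from "drizzle-orm";
// `db` and `messageRequest` stand for the app's existing drizzle client and
// table (src/drizzle/schema.ts); these import paths are illustrative.
import { db } from "@/drizzle/db";
import { messageRequest } from "@/drizzle/schema";

async function nextUsageLogsPage(cursor?: { createdAt: Date; id: number }) {
  return db
    .select({ id: messageRequest.id, createdAt: messageRequest.createdAt })
    .from(messageRequest)
    .where(
      and(
        isNull(messageRequest.deletedAt),
        // "Strictly after the cursor" in (created_at DESC, id DESC) order,
        // which the partial index serves without an OFFSET scan.
        cursor
          ? or(
              lt(messageRequest.createdAt, cursor.createdAt),
              and(
                eq(messageRequest.createdAt, cursor.createdAt),
                lt(messageRequest.id, cursor.id)
              )
            )
          : undefined
      )
    )
    .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
    .limit(50);
}
```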

Last reviewed commit: 6e3c10c


coderabbitai bot commented Feb 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This change introduces endpoint probe and per-vendor/type batch stats/log interfaces, supports batch reads by ID with circuit health synchronization (including Redis batch loading and state transitions), adds a viewport first-entry lazy-loading hook, adds server-side slim usage-log and quota aggregation, and switches the frontend to precise react-query cache invalidation and on-demand loading.

Changes

Cohort / File(s) Summary
Config
biome.json
Update the JSON schema version 2.3.14 → 2.3.15.
Backend actions and routes
src/actions/my-usage.ts, src/actions/provider-endpoints.ts, src/app/api/actions/[...route]/route.ts
Add admin batch actions (batchGetProviderEndpointProbeLogs, batchGetVendorTypeEndpointStats); switch my-usage to the slim usage-log query and the quota aggregation interface; harden foreign-key/cache-invalidation error tolerance.
Batch repository and statistics
src/repository/provider-endpoints-batch.ts, src/repository/usage-logs.ts, src/repository/statistics.ts, src/repository/provider.ts, src/repository/provider-endpoints.ts, src/repository/index.ts
Add batch repository functions (probe logs / vendor-type stats), findUsageLogsForKeySlim, distinct caching, quota-cost aggregation (sumUserQuotaCosts/sumKeyQuotaCostsById), plus the exported/new findEnabledProviderEndpointsByVendorAndType and probe-log write guards.
Circuit breaker and Redis batch loading
src/lib/endpoint-circuit-breaker.ts, src/lib/redis/endpoint-circuit-breaker-state.ts, src/lib/provider-endpoints/endpoint-selector.ts
Add getAllEndpointHealthStatusAsync and loadEndpointCircuitStates for Redis batch loading merged with in-memory state; endpoint selection now judges availability from batch health status and uses the enabled-endpoints query.
Frontend components and in-view lazy loading
src/lib/hooks/use-in-view-once.ts, src/app/[locale]/settings/providers/_components/...
Add the useInViewOnce hook; several components (EndpointLatencySparkline, ProviderEndpointHover, ProviderEndpointsTable, ProviderRichListItem, etc.) switch to batch probe/log and batch circuit-state fetching with deferUntilInView deferred loading, drop router.refresh, and use precise invalidateQueries.
Routes and API exposure
src/app/api/actions/[...route]/route.ts
Register and expose two admin-only batch routes with OpenAPI descriptions: batchGetProviderEndpointProbeLogs, batchGetVendorTypeEndpointStats.
Redis / Drizzle schema and indexes
drizzle/0067_fuzzy_quasar.sql, src/drizzle/schema.ts, drizzle/meta/_journal.json
Add partial indexes on providers and provider_endpoints to optimize non-deleted-row queries; add the migration entry.
Endpoint selector and probe write behavior
src/lib/provider-endpoints/endpoint-selector.ts, src/repository/provider-endpoints.ts
Endpoint availability now uses the batch health query; add the enabled-endpoints lookup; recordProviderEndpointProbeResult only writes a probe log after confirming the update found the row, avoiding FK issues.
Tests
tests/unit/...
Update and add many unit tests and mocks (replace findUsageLogsWithDetails with findUsageLogsForKeySlim, add batch mocks, a getAllEndpointHealthStatusAsync mock, quota aggregation mocks) and add a probe-result unit test covering the new behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (46 files):

⚔️ biome.json (content)
⚔️ drizzle/meta/_journal.json (content)
⚔️ scripts/clear-session-bindings.ts (content)
⚔️ src/actions/my-usage.ts (content)
⚔️ src/actions/provider-endpoints.ts (content)
⚔️ src/actions/users.ts (content)
⚔️ src/app/[locale]/settings/providers/_components/add-provider-dialog.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/forms/provider-form/index.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-list.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-rich-list-item.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-vendor-view.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (content)
⚔️ src/app/v1/_lib/proxy/rate-limit-guard.ts (content)
⚔️ src/app/v1/_lib/proxy/response-handler.ts (content)
⚔️ src/app/v1/_lib/proxy/session-guard.ts (content)
⚔️ src/drizzle/schema.ts (content)
⚔️ src/lib/endpoint-circuit-breaker.ts (content)
⚔️ src/lib/provider-endpoints/endpoint-selector.ts (content)
⚔️ src/lib/proxy-agent/agent-pool.ts (content)
⚔️ src/lib/rate-limit/concurrent-session-limit.ts (content)
⚔️ src/lib/rate-limit/service.ts (content)
⚔️ src/lib/redis/endpoint-circuit-breaker-state.ts (content)
⚔️ src/lib/redis/lua-scripts.ts (content)
⚔️ src/lib/session-manager.ts (content)
⚔️ src/lib/session-tracker.ts (content)
⚔️ src/repository/index.ts (content)
⚔️ src/repository/provider-endpoints.ts (content)
⚔️ src/repository/provider.ts (content)
⚔️ src/repository/statistics.ts (content)
⚔️ src/repository/usage-logs.ts (content)
⚔️ tests/unit/actions/my-usage-concurrent-inherit.test.ts (content)
⚔️ tests/unit/actions/my-usage-date-range-dst.test.ts (content)
⚔️ tests/unit/actions/provider-endpoints.test.ts (content)
⚔️ tests/unit/actions/total-usage-semantics.test.ts (content)
⚔️ tests/unit/lib/endpoint-circuit-breaker.test.ts (content)
⚔️ tests/unit/lib/provider-endpoints/endpoint-selector.test.ts (content)
⚔️ tests/unit/lib/proxy-agent/agent-pool.test.ts (content)
⚔️ tests/unit/lib/rate-limit/concurrent-session-limit.test.ts (content)
⚔️ tests/unit/lib/rate-limit/service-extra.test.ts (content)
⚔️ tests/unit/lib/session-tracker-cleanup.test.ts (content)
⚔️ tests/unit/lib/session-ttl-validation.test.ts (content)
⚔️ tests/unit/proxy/rate-limit-guard.test.ts (content)
⚔️ tests/unit/settings/providers/provider-rich-list-item-endpoints.test.tsx (content)

These conflicts must be resolved before merging into dev.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The PR description is detailed and relevant: it explains the performance background (N+1 queries, request storms) and lists the main changes (batch interfaces, lazy loading, circuit-breaker optimizations, caching strategy, DB index migrations), all matching the code changes.
Title check ✅ Passed The title is clear and specific, accurately reflecting the PR's core change: improving provider page performance.


@gemini-code-assist
Contributor

Summary of Changes

Hello @tesgth032, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request comprehensively optimizes the performance and stability of the Providers feature. By resolving N+1 query problems, reducing duplicate requests, introducing batch data loading, and refining the frontend rendering strategy, it significantly improves UI responsiveness and backend load capacity. Database queries are also deeply optimized to keep the system scalable and robust as data volume grows, delivering a smoother, more reliable Providers management experience.

Highlights

  • Providers list performance: the Providers list endpoint counts now use batchGetVendorTypeEndpointStats for batch stats, and endpoint details plus circuit state load on demand only when the tooltip opens, significantly reducing initial request volume.
  • Endpoint probe-log sparkline: the endpoint table sparkline (probe logs) now loads only inside the viewport and coalesces requests into 10ms micro-batches. It prefers the batch HTTP API batchGetProviderEndpointProbeLogs and automatically degrades to per-item actions (concurrency ≤ 4) on 404 or errors, avoiding request storms.
  • Batch circuit-state reads and sync: circuit state is now read in batch through a Redis pipeline and synced to memory via getAllEndpointHealthStatusAsync({ forceRefresh: true }). The frontend chunks batch requests at 500 ids to stay within limits.
  • Global refresh removal and precise invalidation: main operations on the Providers admin page drop the router.refresh() call, and cache invalidation narrows from global to vendor scope. The Vendor view's endpoint pool section gains in-view deferred mounting, further improving page-load performance.
  • Database query performance: the provider statistics SQL was optimized to remove a potential Cartesian product; my-usage and usage-logs queries also reduce JOINs and round-trips, improving retrieval efficiency.
Changelog
  • biome.json
    • Updated the Biome schema version.
  • src/actions/my-usage.ts
    • Removed the keysTable import.
    • Replaced findUsageLogsWithDetails with the more optimized findUsageLogsForKeySlim.
    • Deleted the deprecated sumUserCost function.
    • Updated getMyQuota to use sumUserTotalCost and sumUserCostInTimeRange directly.
    • Updated getMyUsageLogs to use findUsageLogsForKeySlim.
    • Optimized getUserConcurrentSessions to use SessionTracker.getUserSessionCount directly, avoiding an N+1 query.
  • src/actions/provider-endpoints.ts
    • Added the getAllEndpointHealthStatusAsync import.
    • Added imports for the batch repository functions findProviderEndpointProbeLogsBatch and findVendorTypeEndpointStatsBatch.
    • Defined new Zod schemas for batch requests (BatchGetVendorTypeEndpointStatsSchema, BatchGetProviderEndpointProbeLogsBatchSchema).
    • Added the isForeignKeyViolationError helper.
    • Added publishProviderCacheInvalidation calls after addProviderEndpoint, editProviderEndpoint, removeProviderEndpoint, and editProviderVendor.
    • Implemented the batchGetProviderEndpointProbeLogs and batchGetVendorTypeEndpointStats batch operations.
    • Updated batchGetEndpointCircuitInfo to fetch circuit state in batch via getAllEndpointHealthStatusAsync.
  • src/app/[locale]/settings/providers/_components/add-provider-dialog.tsx
    • Removed the useRouter import and router.refresh() call.
    • Added cache invalidation for the providers-statistics and provider-vendors queries.
  • src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx
    • Added the useInViewOnce hook.
    • Introduced the ProbeLog type.
    • Implemented batch probe-log fetching (normalizeProbeLogsByEndpointId, tryFetchBatchProbeLogsByEndpointIds, fetchProbeLogsByEndpointIds, ProbeLogsBatcher).
    • Queries are enabled only when the element enters the viewport.
    • Updated rendering to use the useInViewOnce ref.
  • src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx
    • Imported batchGetEndpointCircuitInfo and batchGetVendorTypeEndpointStats.
    • Implemented batched vendor-type endpoint stats fetching (requestVendorTypeEndpointStatsBatched, flushVendorTypeEndpointStats).
    • Updated ProviderEndpointHover to use batch stats and circuit info.
    • Removed the standalone per-endpoint circuit-info fetch in EndpointRow.
    • Added an endpoint loading-state display.
  • src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx
    • Added the useInViewOnce hook.
    • Updated batchGetEndpointCircuitInfo usage to chunk internally (500 per batch).
    • Narrowed the query invalidation key from the global provider-endpoints to the vendor-scoped provider-endpoints/[vendorId].
    • Added a deferUntilInView prop to ProviderEndpointsSection to conditionally render ProviderEndpointsTable based on visibility.
  • src/app/[locale]/settings/providers/_components/provider-list.tsx
    • Added a useQuery call fetching getProviderVendors data.
    • Passed the vendor prop to ProviderRichListItem.
  • src/app/[locale]/settings/providers/_components/provider-rich-list-item.tsx
    • Removed the useRouter and getProviderVendors imports.
    • Accepted the vendor prop.
    • Removed router.refresh() calls after various operations (delete, circuit reset, usage reset, toggle, edit, group edit, clone).
  • src/app/[locale]/settings/providers/_components/provider-vendor-view.tsx
    • Applied the deferUntilInView prop to ProviderEndpointsSection.
  • src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx
    • Removed the useRouter import and router.refresh() call.
    • Added cache invalidation for the providers-health and providers-statistics queries.
  • src/lib/endpoint-circuit-breaker.ts
    • Added the loadEndpointCircuitStates import.
    • Implemented getAllEndpointHealthStatusAsync for batch fetching and refreshing circuit state from Redis.
  • src/lib/hooks/use-in-view-once.ts
    • Added the use-in-view-once.ts file providing the lazy-loading hook.
  • src/lib/redis/endpoint-circuit-breaker-state.ts
    • Implemented loadEndpointCircuitStates, batch-fetching multiple circuit-breaker states via a Redis pipeline (sketched after this changelog).
  • src/repository/provider-endpoints-batch.ts
    • Added the provider-endpoints-batch.ts file with the findVendorTypeEndpointStatsBatch and findProviderEndpointProbeLogsBatch batch database queries.
  • src/repository/provider.ts
    • Added an in-memory cache to getProviderStatistics.
    • Optimized the getProviderStatistics SQL to avoid a Cartesian product and use timezone-aware filtering.
  • src/repository/statistics.ts
    • Implemented getKeyStringByIdCached with a short-lived in-memory cache to reduce key-string DB lookups.
    • Updated sumKeyTotalCostById, sumKeyCostInTimeRange, findKeyCostEntriesInTimeRange, and getRateLimitEventStats to use getKeyStringByIdCached.
  • src/repository/usage-logs.ts
    • Introduced findUsageLogsForKeySlim for a more optimized usage-log query.
    • Added in-memory caches to getDistinctModelsForKey and getDistinctEndpointsForKey.
  • tests/unit/actions/my-usage-date-range-dst.test.ts
    • Updated tests to mock findUsageLogsForKeySlim.
  • tests/unit/actions/provider-endpoints.test.ts
    • Updated tests to mock getAllEndpointHealthStatusAsync for batch circuit-info fetching.
  • tests/unit/actions/total-usage-semantics.test.ts
    • Updated the source-verification regular expression.
  • tests/unit/settings/providers/provider-rich-list-item-endpoints.test.tsx
    • Added the vendor prop to ProviderRichListItem in tests.
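
A minimal sketch of the Redis pipeline batch load behind loadEndpointCircuitStates (referenced in the changelog above), assuming an ioredis-style client and an illustrative key scheme:

```typescript
import Redis from "ioredis";

// Batch-load circuit states for many endpoints in one round-trip.
// The key scheme "endpoint:circuit:<id>" is an assumption; the real layout may differ.
async function loadEndpointCircuitStates(
  redis: Redis,
  endpointIds: number[]
): Promise<Map<number, Record<string, string>>> {
  const pipeline = redis.pipeline();
  for (const id of endpointIds) {
    pipeline.hgetall(`endpoint:circuit:${id}`);
  }
  const replies = (await pipeline.exec()) ?? [];
  const states = new Map<number, Record<string, string>>();
  replies.forEach(([err, value], index) => {
    const hash = value as Record<string, string> | null;
    // HGETALL returns an empty object for missing keys; skip those.
    if (!err && hash && Object.keys(hash).length > 0) {
      states.set(endpointIds[index], hash);
    }
  });
  return states;
}
```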

@github-actions github-actions bot added enhancement New feature or request area:UI area:provider size/L Large PR (< 1000 lines) labels Feb 13, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request primarily focuses on optimizing data fetching and rendering performance, particularly for provider endpoint information and usage logs. Key changes include refactoring src/actions/my-usage.ts to use a new findUsageLogsForKeySlim function and a more efficient getUserConcurrentSessions method, and removing a deprecated sumUserCost function. A new src/repository/provider-endpoints-batch.ts file was introduced to provide batch fetching capabilities for provider endpoint probe logs and vendor-type endpoint statistics, which are then utilized by new Server Actions in src/actions/provider-endpoints.ts.

The batchGetEndpointCircuitInfo action was updated to use a new getAllEndpointHealthStatusAsync function for batch retrieval of circuit breaker states from Redis, improving efficiency. Client-side components like EndpointLatencySparkline, ProviderEndpointHover, and ProviderEndpointsTable were updated to leverage these new batching actions and introduce a useInViewOnce hook for deferred loading of data when elements enter the viewport, preventing request storms. Additionally, router.refresh() calls were replaced with more granular queryClient.invalidateQueries() calls across several UI components for better cache management. The getProviderStatistics function in src/repository/provider.ts was optimized with a new SQL query structure and an in-memory cache, and src/repository/statistics.ts introduced a cache for keyId to keyString lookups to reduce database queries.

Review comments highlighted critical security vulnerabilities due to the insecure exposure of repository functions as public Server Actions in src/repository/provider-endpoints-batch.ts, src/actions/my-usage.ts (findUsageLogsForKeySlim), and src/repository/provider.ts (getProviderStatistics), which lack proper authentication and authorization. Other feedback included concerns about hardcoded API paths, inefficient useEffect dependencies, and potential cache-busting issues with the keyStringByIdCache implementation.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/repository/provider.ts`:
- Around line 900-901: The comment containing an emoji should be cleaned: remove
the "⭐" character from the comment that explains using the last item of
providerChain (provider_chain) to determine the final provider and falling back
to provider_id when provider_chain is empty; keep the rest of the Chinese text
intact and ensure references to providerChain, provider_chain, and provider_id
remain unchanged.
- Around line 976-983: The code currently casts the db.execute() result directly
to ProviderStatisticsRow[] (using "const data = result as unknown as
ProviderStatisticsRow[]"), but db.execute() from postgres-js/drizzle-orm returns
an iterable and other call sites use Array.from(result); replace the direct type
assertion by converting the result with Array.from(result) and then type it as
ProviderStatisticsRow[] before storing it in providerStatisticsCache and
returning it so the handling matches the other db.execute() usages.
🧹 Nitpick comments (18)
src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (1)

468-474: Invalidating the health and statistics caches after a successful edit is correct.

Editing can change core fields such as key and type, so refreshing both health and statistics is reasonable.

One small consistency note: onSuccess in the create flow (lines 121-125) currently only invalidates "providers" and "provider-vendors", not "providers-health" / "providers-statistics". A newly created provider has no health/statistics data yet, so the impact is limited, but if health should show immediately after creation, these may need to be added.

src/lib/hooks/use-in-view-once.ts (1)

36-36: Using options as an effect dependency rebuilds the observer repeatedly before visibility.

If the caller passes a fresh object literal each render (e.g. useInViewOnce({ rootMargin: "100px" })), referential inequality re-runs the effect, repeatedly creating/destroying the IntersectionObserver before the element becomes visible.

Consider serializing options, having callers useMemo it, or shallow-comparing inside the hook:

Optional stabilization approach
+import { useEffect, useRef, useState, useMemo } from "react";
+
 export function useInViewOnce<T extends Element>(options?: IntersectionObserverInit) {
   const ref = useRef<T | null>(null);
   const [isInView, setIsInView] = useState(false);
+  const stableOptions = useRef(options);
+  // shallow compare to avoid re-triggering
+  if (
+    options?.root !== stableOptions.current?.root ||
+    options?.rootMargin !== stableOptions.current?.rootMargin ||
+    options?.threshold !== stableOptions.current?.threshold
+  ) {
+    stableOptions.current = options;
+  }

   useEffect(() => {
     // ...
-  }, [isInView, options]);
+  }, [isInView, stableOptions.current]);
src/repository/usage-logs.ts (1)

284-301: UsageLogSlimFiltersUsageLogSlimRow 未导出,可能影响外部类型引用。

目前这两个接口只用 interface 声明,没有 export。如果有外部调用方需要引用过滤器或返回行的类型(例如在 mock 或测试中显式标注类型),将无法直接导入。

当前 my-usage.ts 传入内联对象、测试通过 mock 绕过,所以暂时无问题。后续如果需要在其他地方引用这些类型,建议加上 export

src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx (2)

39-41: The module-level deferred Map may hold stale references after unmount.

If a component unmounts before flushVendorTypeEndpointStats runs, its deferred promise's resolve/reject stay in deferredByProviderTypeVendorId. Resolving a promise for an unmounted component won't crash (React Query ignores cancelled queries), but the deferred entry is never cleaned up and could accumulate under extreme mount/unmount churn.

The current setTimeout(fn, 0) window is tiny, so the practical risk is very low. Flagged only as a defensive note.


260-267: res.data may be undefined when fetching circuit state in batch.

Even when res.ok is true, res.data may still be T | undefined at the type level (depending on the ActionResult type definition). If undefined genuinely cannot occur, consider ?? [] as a guard:

-          const res = await batchGetEndpointCircuitInfo({ endpointIds: chunk });
-          return res.ok ? res.data : [];
+          const res = await batchGetEndpointCircuitInfo({ endpointIds: chunk });
+          return res.ok ? (res.data ?? []) : [];
src/actions/provider-endpoints.ts (1)

795-810: forceRefresh: true hits Redis on every call; watch high-frequency scenarios.

batchGetEndpointCircuitInfo uses forceRefresh: true to bypass the memory cache and read Redis directly, which suits an admin page that needs live circuit state. But if many tooltips open at once, or open/close rapidly, this can generate a lot of Redis requests.

The current staleTime: 1000 * 10 (10 seconds) provides some buffering at the React Query layer. If Redis pressure shows up later, consider enabling forceRefresh only on first load or manual refresh.

src/repository/provider-endpoints-batch.ts (2)

83-111: The raw SQL hardcodes the table name provider_endpoint_probe_logs.

Line 106 uses a string table name rather than a Drizzle schema reference. If the table name changes in the schema, this spot won't update with it. Consider obtaining the table name from the Drizzle schema object, or at least add a comment tying it to the schema.

Per the coding guidelines, the repository layer should use Drizzle ORM for data access. Using raw SQL here for the ROW_NUMBER() window function is reasonable, but referencing the schema's table name would improve maintainability.

Suggested schema-name reference

Import the probe-logs table schema at the top of the file (if it exists), then reference the table name via Drizzle's getTableName() or a similar mechanism:

+import { providerEndpointProbeLogs } from "@/drizzle/schema";
+import { getTableName } from "drizzle-orm";

Then use it in the SQL:

-      FROM provider_endpoint_probe_logs
+      FROM ${sql.identifier(getTableName(providerEndpointProbeLogs))}

If the Drizzle version doesn't support getTableName, fall back to adding a comment:

// Table name must match: providerEndpointProbeLogs in `@/drizzle/schema`
FROM provider_endpoint_probe_logs

As per coding guidelines: "Use Drizzle ORM for data access in the repository layer".


9-13: toDate silently falling back to new Date() may mask data anomalies.

When value is neither Date, string, nor number, it returns the current time. So if the database returns an unexpected type (e.g. null), the log's createdAt is silently replaced with "now" instead of throwing or warning.

Timestamp accuracy matters for the probe-log createdAt field. Consider at least a debug log, or pre-processing null values at the call site.

src/lib/endpoint-circuit-breaker.ts (1)

149-161: Redis state sync only updates memory when circuitState changes.

The condition at line 151, redisState.circuitState !== existingHealth.circuitState, means that if circuitState is the same but failureCount differs (e.g. another instance accumulated more failures), the in-memory failureCount is not synced from Redis. For admin-page display, different instances may show inconsistent failureCount values.

The current implementation is consistent with getOrCreateHealth (line 62), and forceRefresh first clears loadedFromRedis entries so the next needsRefresh check is true, but because of the line 151 condition, even forceRefresh won't update a same-state failureCount.

If accurate failureCount display across instances is a requirement, consider relaxing it to:

-          if (!existingHealth || redisState.circuitState !== existingHealth.circuitState) {
+          if (!existingHealth
+            || redisState.circuitState !== existingHealth.circuitState
+            || redisState.failureCount !== existingHealth.failureCount) {

If the current behavior is intentional (avoiding unnecessary object allocation), this suggestion can be ignored.

tests/unit/actions/provider-endpoints.test.ts (1)

13-14: Batch repository mocks are declared but never asserted in any test.

Registering findProviderEndpointProbeLogsBatchMock and findVendorTypeEndpointStatsBatchMock as module mocks is reasonable (it avoids import resolution failures), but no test case currently verifies their call behavior. If these batch repository functions are used at the action layer, consider adding test coverage.

#!/bin/bash
# Verify whether findProviderEndpointProbeLogsBatch and findVendorTypeEndpointStatsBatch are actually called at the action layer
rg -n --type=ts 'findProviderEndpointProbeLogsBatch|findVendorTypeEndpointStatsBatch' -g '!**/test*' -g '!**/*.test.*'

Also applies to: 57-60

src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx (1)

141-158: Chunked batch circuit-state fetching is correct; consider extracting a chunking utility.

The 500-per-chunk + Promise.all parallel-fetch pattern appears both in this file and in endpoint-latency-sparkline.tsx (same MAX_ENDPOINT_IDS_PER_BATCH and chunking loop). Consider extracting a shared utility to reduce duplication.

♻️ Optional: extract a shared chunking helper
// e.g. in `@/lib/utils/chunk.ts`
export function chunkArray<T>(arr: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size));
  }
  return chunks;
}

Then reference it in both places:

-      const MAX_ENDPOINT_IDS_PER_BATCH = 500;
-      const chunks: number[][] = [];
-      for (let index = 0; index < endpointIds.length; index += MAX_ENDPOINT_IDS_PER_BATCH) {
-        chunks.push(endpointIds.slice(index, index + MAX_ENDPOINT_IDS_PER_BATCH));
-      }
+      const chunks = chunkArray(endpointIds, 500);
src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (4)

48-91: normalizeProbeLogsByEndpointId does not runtime-validate the elements of the logs arrays.

Lines 58, 72, and 85 all assert logs directly to ProbeLog[] without verifying that elements actually carry the ok and latencyMs fields. If the backend returns an unexpected shape, the downstream .map() yields undefined values rather than erroring, so the sparkline renders wrongly without an obvious error (silent failure).

Given this is performance-optimization code and the backend format is fairly controlled, this is a defensive-programming suggestion that can wait.


93-142: Once isBatchProbeLogsEndpointAvailable is false it never recovers for the page's lifetime.

After a 404, this module-level variable is set to false and all subsequent batch requests are skipped until the user refreshes the page. That's a reasonable degradation during frontend/backend version switches (the PR description notes this), but a long-lived page would keep using the slower per-item mode even after the backend has upgraded.

If worth improving, consider a time window (e.g. retry the batch once after 5 minutes):

♻️ Optional: add a retry window
-let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let batchProbeLogsDisabledAt: number | undefined;
+const BATCH_RETRY_INTERVAL_MS = 5 * 60 * 1000;

 async function tryFetchBatchProbeLogsByEndpointIds(
   endpointIds: number[],
   limit: number
 ): Promise<Record<number, ProbeLog[]> | null> {
   if (endpointIds.length <= 1) return null;
-  if (isBatchProbeLogsEndpointAvailable === false) return null;
+  if (isBatchProbeLogsEndpointAvailable === false) {
+    if (batchProbeLogsDisabledAt && Date.now() - batchProbeLogsDisabledAt < BATCH_RETRY_INTERVAL_MS) {
+      return null;
+    }
+    isBatchProbeLogsEndpointAvailable = undefined;
+  }
   if (process.env.NODE_ENV === "test") return null;

Also record the time in the 404 branch:

     if (res.status === 404) {
       isBatchProbeLogsEndpointAvailable = false;
+      batchProbeLogsDisabledAt = Date.now();
       return null;
     }

112-135: Chunk processing is serial here, unlike the parallel strategy in provider-endpoints-table.tsx.

Here for (const chunk of chunks) runs serially, while the circuit-state chunks in provider-endpoints-table.tsx use Promise.all(chunks.map(...)). Serial processing may be deliberate for larger probe-log payloads (less server pressure), but a comment stating the rationale would prevent future maintainer confusion.

Also, at line 130, if a middle chunk's normalized result is null, the results of already-successful chunks are discarded and everything degrades to per-item requests (those endpoints get re-fetched). Minor impact, but worth noting.


183-227: The ProbeLogsBatcher micro-batching design is sound; one small defensive suggestion.

The 10ms setTimeout window, grouping by limit, and the snapshot-then-clear flush strategy are all good. However, void this.flush() at line 199 would produce an unhandled promise rejection if flush() threw outside its Promise.all (however unlikely). Consider a .catch:

♻️ Optional: add a top-level catch guard
-      this.flushTimer = setTimeout(() => {
-        this.flushTimer = null;
-        void this.flush();
-      }, delayMs);
+      this.flushTimer = setTimeout(() => {
+        this.flushTimer = null;
+        this.flush().catch(() => {});
+      }, delayMs);
src/repository/provider.ts (3)

870-876: Consider exporting the ProviderStatisticsRow type

getProviderStatistics is an exported function, but its return type ProviderStatisticsRow is not exported. Callers cannot reference the type by name and must infer it via ReturnType<> etc., which hurts type reuse.

Suggested change
-type ProviderStatisticsRow = {
+export type ProviderStatisticsRow = {
   id: number;
   today_cost: string;
   today_calls: number;
   last_call_time: Date | null;
   last_call_model: string | null;
 };

886-898: The cache does not dedupe concurrent requests (thundering herd)

At the instant the cache expires, multiple concurrent calls to getProviderStatistics all pass the cache check (expiresAt > now is false) and each issues its own DB query. With a 10-second TTL the odds are low, but under high-frequency polling this can still cause brief query spikes.

If such spikes are observed later, consider promise-level dedup (cache the in-flight query's Promise so subsequent callers reuse it) instead of caching only the result data. The current implementation should be acceptable at present scale. A sketch of that pattern follows.
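
A minimal sketch of that promise-level dedup, with illustrative names:

```typescript
// Promise-level in-flight dedup: concurrent callers reuse one pending query
// instead of each hitting the DB when the cache has just expired.
// (loadStatistics and ProviderStats are illustrative stand-ins.)
interface ProviderStats {
  id: number;
  today_cost: string;
}

let inFlight: Promise<ProviderStats[]> | null = null;

async function getStatisticsDeduped(
  loadStatistics: () => Promise<ProviderStats[]>
): Promise<ProviderStats[]> {
  if (inFlight) return inFlight; // reuse the pending query
  inFlight = loadStatistics().finally(() => {
    inFlight = null; // allow a fresh load after this one settles
  });
  return inFlight;
}
```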


903-908: The AT TIME ZONE conversion in the bounds CTE is correct but repeated three times

The pattern DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone} is repeated for today_start, tomorrow_start, and last7_start. Define today_start once and derive the other two arithmetically to reduce repetition:

Simplification suggestion
       WITH bounds AS (
         SELECT
           (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) AS today_start,
-          (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) + INTERVAL '1 day' AS tomorrow_start,
-          (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) - INTERVAL '7 days' AS last7_start
+          (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) + INTERVAL '1 day' AS tomorrow_start
+      ),
+      bounds_ext AS (
+        SELECT *, today_start - INTERVAL '7 days' AS last7_start FROM bounds
       ),

Alternatively keep a single CTE, using PostgreSQL's column-reference capabilities (note that a CTE cannot directly reference sibling columns). The current form is functionally fine; this is a readability suggestion only.

Contributor

@github-actions github-actions bot left a comment


Code Review Summary

This is a well-structured performance optimization PR that addresses N+1 query patterns and request storms in the Providers management UI. The changes introduce batch APIs, micro-batching for probe logs, and lazy loading via intersection observers.

PR Size: L

  • Lines changed: 1,840 (1,556 additions, 284 deletions)
  • Files changed: 23

Note: As a Large PR, consider whether this could be split into smaller, more focused changes for easier review:

  1. Backend batch APIs (repository + actions)
  2. Frontend batching/deferred loading hooks
  3. UI component updates

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 0 0
Simplification 0 0 0 0

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean (graceful fallbacks are intentional)
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Adequate (tests updated for new batch functions)
  • Code clarity - Good

Notable Design Decisions

  1. Micro-batching pattern (ProbeLogsBatcher, requestVendorTypeEndpointStatsBatched): Uses module-level state with setTimeout debouncing to coalesce rapid requests. This is a valid pattern for reducing request storms.

  2. Concurrent worker pattern in fetchProbeLogsByEndpointIds: The idx increment is synchronous before any await, making it safe in JavaScript's single-threaded execution model.

  3. Graceful degradation: Empty catch blocks in fallback paths intentionally suppress errors to allow fallback to individual API calls when batch endpoints are unavailable (e.g., during rolling deployments).

  4. In-memory caches: TTL-based caches in statistics.ts and usage-logs.ts reduce DB load for frequently accessed data. Cache invalidation is handled via TTL expiration.


Automated review by Claude AI


@greptile-apps greptile-apps bot left a comment


23 files reviewed, 3 comments


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx`:
- Around lines 48-91: normalizeProbeLogsByEndpointId asserts arrays to
ProbeLog[] without validating element structure (in the Array.isArray(data)
branch, the logsByEndpointId branch, and the items branch), passing malformed
data downstream. Add minimal structural validation in all three branches:
check that the value is an array and that entries carry the required fields
(e.g. ok, latencyMs/timestamp), filter out entries that fail the check and only
assign validated entries to the map (or replace invalid/missing-field entries
with safe defaults), so the downstream SparkPoint mapping never receives
undefined fields.

In `@src/lib/hooks/use-in-view-once.ts`:
- Around line 77-97: The effect in use-in-view-once.ts can return early when
ref.current is null (delayed mount) and never re-run because dependencies omit
the element; update the logic so the effect depends on the actual element (or
use a callback ref) to ensure the observer is created when the element mounts:
either include ref.current (or a local state like observedEl set by a ref
callback) in the effect dependency array and use that element for
observer.observe, or refactor the hook to expose/accept a callback ref that
assigns the element to state and triggers the IntersectionObserver creation;
make sure to still guard for test/IntersectionObserver absence, call
setIsInView(true) when appropriate, and disconnect the observer in the cleanup.
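
A minimal sketch of the callback-ref fix requested above (which the author later adopted); this is illustrative, not the file's final code:

```typescript
import { useCallback, useEffect, useState } from "react";

export function useInViewOnce<T extends Element>(options?: IntersectionObserverInit) {
  const [element, setElement] = useState<T | null>(null);
  const [isInView, setIsInView] = useState(false);

  // Callback ref: the state update re-runs the effect when the element mounts late.
  const ref = useCallback((el: T | null) => setElement(el), []);

  useEffect(() => {
    if (isInView || !element) return;
    if (typeof IntersectionObserver === "undefined") {
      setIsInView(true); // test / no-IntersectionObserver fast-path: treat as visible
      return;
    }
    const observer = new IntersectionObserver((records) => {
      if (records.some((r) => r.isIntersecting)) setIsInView(true);
    }, options);
    observer.observe(element);
    return () => observer.disconnect();
    // `options` is assumed referentially stable (see the earlier stabilization note).
  }, [element, isInView, options]);

  return { ref, isInView };
}
```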
🧹 Nitpick comments (7)
src/repository/provider.ts (2)

988-992: The cache expiresAt uses a now captured before the query

now is captured at line 897 (before the query runs) but used at line 990 to compute the cache expiry. If the query is slow, the cache's effective TTL is shortened. Use a fresh timestamp when writing the cache:

Suggested change
       providerStatisticsCache = {
         timezone,
-        expiresAt: now + PROVIDER_STATISTICS_CACHE_TTL_MS,
+        expiresAt: Date.now() + PROVIDER_STATISTICS_CACHE_TTL_MS,
         data,
       };

920-962: The final_provider_id CASE expression is duplicated across two CTEs

provider_stats and latest_call contain the identical CASE WHEN provider_chain IS NULL OR provider_chain = '[]'::jsonb THEN provider_id ELSE (provider_chain->-1->>'id')::int END. Consider extracting a shared CTE (e.g. resolved_requests) that computes final_provider_id once for downstream CTEs, reducing the risk of the two copies drifting apart.

src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (1)

121-127: The query invalidation pattern is reasonable; consider extracting it for reuse.

The new providers-health and providers-statistics invalidation logic appears in four places: create / delete / edit / toggle (toggle sensibly omits statistics). Acceptable at the current scale, but maintenance cost rises with each new query key. Consider an invalidateProviderQueries(scope) helper to manage them in one place.

Example: extracted invalidation helper
// at the top of the component file or in a standalone util
function invalidateProviderQueries(
  queryClient: ReturnType<typeof useQueryClient>,
  scope: "full" | "toggle" = "full",
) {
  queryClient.invalidateQueries({ queryKey: ["providers"] });
  queryClient.invalidateQueries({ queryKey: ["providers-health"] });
  queryClient.invalidateQueries({ queryKey: ["provider-vendors"] });
  if (scope === "full") {
    queryClient.invalidateQueries({ queryKey: ["providers-statistics"] });
  }
}
src/lib/hooks/use-in-view-once.ts (1)

53-64: Reading/writing a ref during render - a hazard under React 19 Strict Mode.

useStableIntersectionObserverOptions reads and writes stableOptionsRef.current during the render phase (lines 59-60). React 19 Strict Mode expects render functions to be pure; ref access during render usually works in practice but is discouraged by the official React docs. If Strict Mode's double-render makes the comparison differ between renders, subtle issues could appear.

Optional approach: defer the update via a useRef + useEffect combination, or use useMemo with a serialized key for a stable reference. Risk is minimal in the actual usage (options rarely changes), so this is an optional improvement.

src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (3)

93-93: The module-level mutable flag isBatchProbeLogsEndpointAvailable is unrecoverable.

After a 404, isBatchProbeLogsEndpointAvailable stays false until a page refresh. If the backend returns 404 transiently (deploy gap, blue/green switch, etc.), users must refresh the page to regain the batch path. Consider a TTL-based reset, e.g. retry the batch endpoint after 5 minutes.

Example with a TTL reset
-let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let batchDisabledAt: number | undefined;
+const BATCH_RETRY_INTERVAL_MS = 5 * 60 * 1000;
+
+function isBatchDisabled(): boolean {
+  if (isBatchProbeLogsEndpointAvailable === false) {
+    if (batchDisabledAt && Date.now() - batchDisabledAt > BATCH_RETRY_INTERVAL_MS) {
+      isBatchProbeLogsEndpointAvailable = undefined;
+      batchDisabledAt = undefined;
+      return false;
+    }
+    return true;
+  }
+  return false;
+}

Then record the time in the 404 handler:

       if (res.status === 404) {
         isBatchProbeLogsEndpointAvailable = false;
+        batchDisabledAt = Date.now();

95-206: ~250 lines of batch-fetching logic deserve their own module.

tryFetchBatchProbeLogsByEndpointIds, fetchProbeLogsByEndpointIdsIndividually, fetchProbeLogsByEndpointIds, normalizeProbeLogsByEndpointId, and the ProbeLogsBatcher class form a complete data-fetching layer, a different responsibility from the UI component (sparkline rendering). Extracting them to e.g. @/lib/probe-logs-batcher.ts would help:

  • test the batch/fallback/merge logic in isolation
  • reuse the batcher in other components
  • reduce the current file's cognitive load

Also applies to: 208-236, 238-245, 247-307


216-232: The concurrent worker pattern is correct but relies on JS single-threaded semantics - worth a comment.

The idx variable is shared across multiple async workers; its read and increment occur in the same synchronous section (before any await), so it is safe under the JS event-loop model. The pattern is non-obvious, though - add a one-line comment stating the safety assumption.
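
A minimal sketch of that pattern with the suggested safety comment; fetchOne is a hypothetical per-id call:

```typescript
// Bounded-concurrency workers over a shared index. Safe because reading and
// incrementing `idx` happen synchronously before any `await`, and JS runs
// callbacks on a single thread, so two workers can never claim the same slot.
async function fetchAllWithWorkers<T>(
  ids: number[],
  fetchOne: (id: number) => Promise<T>,
  concurrency = 4
): Promise<T[]> {
  const results = new Array<T>(ids.length);
  let idx = 0;
  async function worker(): Promise<void> {
    while (idx < ids.length) {
      const current = idx++; // claimed synchronously: no interleaving here
      results[current] = await fetchOne(ids[current]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(concurrency, ids.length) }, worker)
  );
  return results;
}
```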

@tesgth032
Contributor Author

Following up on the CodeRabbit review with a round of small fixes (no UX changes), mostly converging corner cases and consistency issues:

  • Provider statistics: removed the emoji comment in src/repository/provider.ts; db.execute() results now go through Array.from() before typing; added in-flight dedup to reduce the concurrency spike at cache expiry.
  • useInViewOnce: options is now stabilized semantically, so callers passing a fresh object each render no longer cause the IntersectionObserver to be rebuilt repeatedly.
  • EndpointLatencySparkline: the timer-triggered flush() gains a top-level catch; chunked batch failures are now best-effort (successful chunks are kept; only failed/missing endpointIds fall back to per-item fetches), reducing re-fetching and request-storm amplification.
  • Provider Key creation: now also invalidates providers-health / providers-statistics (matching the edit flow, avoiding same-page cache inconsistency).

Verified locally: npm run lint, npm run typecheck, npm run test, and npm run build all pass; CI checks are green too.

@tesgth032
Contributor Author

Following up on CodeRabbit's latest two actionables, plus a small cache-TTL fix along the way (no changes to existing UX/interface semantics):

  • EndpointLatencySparkline: normalizeProbeLogsByEndpointId() now minimally validates the logs array structure (filtering malformed entries), and the single-item fallback path reuses the same normalization, so downstream sparkline code never hits entries with missing fields.
  • useInViewOnce: switched to a callback ref + element state, so when the element mounts late the effect correctly creates the observer once it appears (the test / no-IntersectionObserver fast-path behavior is unchanged).
  • getProviderStatistics: the cache write now uses Date.now() for expiresAt, so a slow query no longer shortens the cache's effective TTL.

Local and CI runs pass (build/lint/typecheck/test all green), and CodeRabbit is back to Approved.

@tesgth032
Contributor Author

Another round for #779: hot-path performance + genuine-bug convergence + security hardening (keeping existing UX/interface semantics where possible):

  • Runtime endpoint selection: getPreferredProviderEndpoints / getEndpointFilterStats now batch-read endpoint circuit state in one go (getAllEndpointHealthStatusAsync), avoiding per-endpoint Redis round-trip amplification.
  • Probe write tolerance: recordProviderEndpointProbeResult updates the endpoint snapshot first; if the endpoint is concurrently deleted during probing, it is silently ignored (FK failures no longer leak out of the probe scheduler/manual probes).
  • Endpoint mutation consistency: best-effort resetEndpointCircuit after successful edit/delete/manual probe, reducing the confusion of "probe succeeded but still circuit-open".
  • Index migrations: added idx_provider_endpoints_pick_enabled / idx_providers_vendor_type_url_active (index-only, no table structure changes); Docker's default AUTO_MIGRATE=true upgrades transparently.
  • Security convergence: src/repository/provider-endpoints-batch.ts drops the top-level "use server" for server-only (avoiding accidental Server Action exposure).

Verified locally: bun run lint (pre-existing warnings only), bun run lint:fix, bun run typecheck, bun run test, bun run build, bun run db:generate.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/[locale]/settings/providers/_components/forms/provider-form/index.tsx (1)

370-374: ⚠️ Potential issue | 🟠 Major

Invalidating only the provider-vendors cache on successful creation matches the PR's goal, but the edit path has a gap: edits have no form-level cache invalidation and rely entirely on the optional onSuccess callback; if the callback fails (lines 378-380 only log the error), the cache is never refreshed.

Suggested improvement: either add form-level invalidation to the edit path matching the create path (e.g. queryClient.invalidateQueries({ queryKey: ["providers"] })), or make onSuccess a required parameter, guaranteeing reliable cache refreshes.

tests/unit/actions/total-usage-semantics.test.ts (1)

48-52: ⚠️ Potential issue | 🟡 Minor

Missing mock for SessionTracker.getUserSessionCount

getMyQuota now calls getUserConcurrentSessions(userId) → SessionTracker.getUserSessionCount(userId), but the SessionTracker mock here only provides getKeySessionCount, not getUserSessionCount. getUserConcurrentSessions has a try-catch that falls back to 0, so the test won't fail, but that is a "silently swallowed error" rather than correct mock behavior.

🛡️ Suggested mock addition
 vi.mock("@/lib/session-tracker", () => ({
   SessionTracker: {
     getKeySessionCount: (...args: unknown[]) => getKeySessionCountMock(...args),
+    getUserSessionCount: vi.fn().mockResolvedValue(0),
   },
 }));
🤖 Fix all issues with AI agents
In `@src/actions/provider-endpoints.ts`:
- Around line 567-583: The code calls resetEndpointCircuitState again after a
successful probe, creating redundant Redis round-trips because
probeProviderEndpointAndRecordByEndpoint already calls resetEndpointCircuit when
appropriate; remove the entire conditional block that checks result.ok and tries
to call resetEndpointCircuitState (the try/catch logging calling
resetEndpointCircuitState with endpoint.id) from
src/actions/provider-endpoints.ts so the single reset path inside
probeProviderEndpointAndRecordByEndpoint (which invokes resetEndpointCircuit) is
the only place handling circuit resets.

In `@src/repository/statistics.ts`:
- Around line 984-1019: The outer WHERE uses scanEnd as an upper bound which can
truncate the costTotal aggregate (filtered only by createdAt >= cutoffDate); fix
by computing scanEnd to include the current time so it cannot be earlier than
now — i.e. set scanEnd = new Date(Math.max(..., Date.now())) when building
scanEnd (the same pattern should be applied in the analogous
sumKeyQuotaCostsById logic); update uses of scanEnd (and any other range
end-time max calculations) so costTotal (and related totals) are not
artificially upper-bounded by stale range endTimes.
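
A one-function sketch of the suggested clamp, with illustrative names:

```typescript
// Clamp the scan upper bound so a stale range endTime can never exclude
// rows written after it (names are illustrative).
function clampScanEnd(rangeEndTime: Date): Date {
  return new Date(Math.max(rangeEndTime.getTime(), Date.now()));
}
```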
🧹 Nitpick comments (8)
src/repository/usage-logs.ts (3)

380-417: The COUNT and data queries could run in parallel

In findUsageLogsForKeySlim, the count query (line 380) and the paged data query (line 388) run sequentially, but they are independent and could be parallelized with Promise.all to reduce latency.

♻️ Suggested Promise.all parallelization
- const [countResult] = await db
-   .select({ totalRows: sql<number>`count(*)::double precision` })
-   .from(messageRequest)
-   .where(and(...conditions));
-
- const total = countResult?.totalRows ?? 0;
-
- const offset = (safePage - 1) * safePageSize;
- const results = await db
-   .select({
-     id: messageRequest.id,
-     ...
-   })
-   .from(messageRequest)
-   .where(and(...conditions))
-   .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
-   .limit(safePageSize)
-   .offset(offset);
+ const offset = (safePage - 1) * safePageSize;
+ const [countResult, results] = await Promise.all([
+   db
+     .select({ totalRows: sql<number>`count(*)::double precision` })
+     .from(messageRequest)
+     .where(and(...conditions)),
+   db
+     .select({
+       id: messageRequest.id,
+       // ... remaining fields
+     })
+     .from(messageRequest)
+     .where(and(...conditions))
+     .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
+     .limit(safePageSize)
+     .offset(offset),
+ ]);
+
+ const total = countResult?.[0]?.totalRows ?? 0;

420-444: The cache write uses the caller-provided now rather than the current time

Per the earlier discussion on this PR, cache writes should compute expiresAt from Date.now() rather than a now captured before the query, so slow queries don't shorten the effective TTL. Currently setDistinctKeyOptionsCache uses the caller-provided now (captured before the DB query), which can reduce the effective TTL.

♻️ Suggestion: use Date.now() inside setDistinctKeyOptionsCache
 function setDistinctKeyOptionsCache(
   cache: Map<string, { data: string[]; expiresAt: number }>,
   key: string,
   data: string[],
-  now: number
 ): void {
+  const now = Date.now();
   if (cache.size >= DISTINCT_KEY_OPTIONS_CACHE_MAX_SIZE) {
     for (const [k, v] of cache) {
       if (v.expiresAt <= now) {
         cache.delete(k);
       }
     }
 
     if (cache.size >= DISTINCT_KEY_OPTIONS_CACHE_MAX_SIZE) {
       cache.clear();
     }
   }
 
   cache.set(key, { data, expiresAt: now + DISTINCT_KEY_OPTIONS_CACHE_TTL_MS });
 }

284-301: UsageLogSlimFilters and UsageLogSlimRow are not exported

findUsageLogsForKeySlim is exported, but its parameter type UsageLogSlimFilters and the UsageLogSlimRow in its return value are not. TypeScript structural typing lets callers pass literal objects, but external code that needs to reference these types (e.g. type-safe parameters in test mocks) is blocked. The AI summary also lists them as new public types.

♻️ Suggestion: export both interfaces
-interface UsageLogSlimFilters {
+export interface UsageLogSlimFilters {
   keyString: string;
   ...
 }

-interface UsageLogSlimRow {
+export interface UsageLogSlimRow {
   id: number;
   ...
 }

Also applies to: 303-319

src/actions/my-usage.ts (1)

600-648: The keyBreakdown and userBreakdown query fields are inconsistent

The keyBreakdown query includes cacheCreation5mTokens and cacheCreation1hTokens (lines 611-612), but the userBreakdown query lacks those two fields (lines 627-636). They are only used for summary-level stats (reduced from keyBreakdown) and don't affect the userModelBreakdown mapping, but inconsistent select lists across two parallel queries invite confusion, and exposing the fields in userModelBreakdown later would require another change.

src/lib/provider-endpoints/endpoint-selector.ts (2)

41-45: findEnabledProviderEndpointsByVendorAndType already filters isEnabled and deletedAt at the DB layer; the application-layer filter is redundant

The query for findEnabledProviderEndpointsByVendorAndType already includes is_enabled = true AND deleted_at IS NULL (see src/repository/provider-endpoints.ts lines 823-828), so the e.isEnabled && !e.deletedAt check at line 45 is duplicated. Only the excludeSet.has(e.id) filter is necessary.

♻️ Simplified filtering
- const filtered = endpoints.filter((e) => e.isEnabled && !e.deletedAt && !excludeSet.has(e.id));
+ const filtered = excludeSet.size > 0
+   ? endpoints.filter((e) => !excludeSet.has(e.id))
+   : endpoints;

81-98: getEndpointFilterStats filters the enabled endpoints twice

Lines 83 and 90 apply the same e.isEnabled && !e.deletedAt filter to the same array. The result of one filter pass can be reused.

♻️ Remove the duplicate filter
   const endpoints = await findProviderEndpointsByVendorAndType(input.vendorId, input.providerType);
   const total = endpoints.length;
-  const enabled = endpoints.filter((e) => e.isEnabled && !e.deletedAt).length;
+  const enabledEndpoints = endpoints.filter((e) => e.isEnabled && !e.deletedAt);
+  const enabled = enabledEndpoints.length;
 
   // When endpoint circuit breaker is disabled, no endpoints can be circuit-open
   if (!getEnvConfig().ENABLE_ENDPOINT_CIRCUIT_BREAKER) {
     return { total, enabled, circuitOpen: 0, available: enabled };
   }
 
-  const enabledEndpoints = endpoints.filter((e) => e.isEnabled && !e.deletedAt);
   if (enabledEndpoints.length === 0) {
     return { total, enabled: 0, circuitOpen: 0, available: 0 };
   }
src/repository/provider-endpoints.ts (1)

793-833: Consider reusing the providerEndpointSelectFields constant to reduce duplication.

The .select(...) fields in findEnabledProviderEndpointsByVendorAndType are identical to findProviderEndpointsByVendorAndType (lines 762-779) and to the providerEndpointSelectFields constant defined at line 209. Reusing that constant reduces the maintenance burden.

♻️ Suggested change
   const rows = await db
-    .select({
-      id: providerEndpoints.id,
-      vendorId: providerEndpoints.vendorId,
-      providerType: providerEndpoints.providerType,
-      url: providerEndpoints.url,
-      label: providerEndpoints.label,
-      sortOrder: providerEndpoints.sortOrder,
-      isEnabled: providerEndpoints.isEnabled,
-      lastProbedAt: providerEndpoints.lastProbedAt,
-      lastProbeOk: providerEndpoints.lastProbeOk,
-      lastProbeStatusCode: providerEndpoints.lastProbeStatusCode,
-      lastProbeLatencyMs: providerEndpoints.lastProbeLatencyMs,
-      lastProbeErrorType: providerEndpoints.lastProbeErrorType,
-      lastProbeErrorMessage: providerEndpoints.lastProbeErrorMessage,
-      createdAt: providerEndpoints.createdAt,
-      updatedAt: providerEndpoints.updatedAt,
-      deletedAt: providerEndpoints.deletedAt,
-    })
+    .select(providerEndpointSelectFields)
     .from(providerEndpoints)
tests/unit/lib/provider-endpoints/endpoint-selector.test.ts (1)

108-131: The health-status mock type is duplicated across several test blocks.

The return type of getAllStatusMock (a Record with failureCount, circuitState, and similar fields) is repeated three times, at lines 108-131, 206-229, and 291-314. Consider extracting the type into a file-level type alias, or a helper function that builds the mock, to reduce duplication. Readability comes first in test code, though, so this is only an optional optimization.


@greptile-apps greptile-apps bot left a comment


34 files reviewed, 1 comment


@greptile-apps greptile-apps bot left a comment


34 files reviewed, 3 comments


@greptile-apps greptile-apps bot left a comment


34 files reviewed, 2 comments


@tesgth032
Contributor Author

Following up on the latest AI reviews (CodeRabbit / Greptile / Gemini) with another round of convergence (keeping existing UX/API semantics unchanged where possible):

  • CodeRabbit 2806945634: removed the redundant resetEndpointCircuitState call after a successful probeProviderEndpoint (the internal probe already handles it), cutting a needless Redis round trip.
  • CodeRabbit 2806945635: the quota/summary scan upper bound scanEnd is now guaranteed to be >= now, so costTotal can no longer be truncated by a stale range.endTime (see the sketch below).
  • Greptile: findUsageLogsForKeySlim gains EXCLUDE_WARMUP_CONDITION; getMyUsageLogs adds and passes through a sessionId filter; the probe logs batch defensively filters endpointIds with Number.isSafeInteger.
  • Greptile: editProviderEndpoint now resets the circuit breaker only when the URL actually changes or the endpoint flips from disabled to enabled, avoiding pointless Redis writes.
  • Gemini security note: the repository modules touched by this PR are now server-only (provider.ts / usage-logs.ts / statistics.ts), so they cannot be mistakenly exposed as Server Actions.

Verified locally: bun run build, bun run lint (only pre-existing warnings), bun run lint:fix, bun run typecheck, bun run test.

@coderabbitai the two points from your earlier changes-requested review are covered by these fixes; please re-review / update the status when convenient.
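
A minimal sketch of the two defensive fixes above; the shapes of range and endpointIds are assumptions for illustration, not the PR's actual types:

```ts
// Hypothetical shapes; the real code lives in the usage/quota repositories.
interface ScanRange {
  startTime: number; // epoch ms
  endTime: number;   // epoch ms, may be stale (cached from an earlier request)
}

// Clamp the scan upper bound so a stale range.endTime cannot truncate costTotal.
function resolveScanEnd(range: ScanRange, now: number = Date.now()): number {
  return Math.max(range.endTime, now);
}

// Defensively drop ids that are not safe integers before building the batch query.
function sanitizeEndpointIds(endpointIds: unknown[]): number[] {
  return endpointIds.filter(
    (id): id is number => typeof id === "number" && Number.isSafeInteger(id),
  );
}
```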

@greptile-apps greptile-apps bot left a comment

34 files reviewed, 2 comments

@coderabbitai

coderabbitai bot commented Feb 14, 2026

@tesgth032 Thanks for the thorough update! I can see you have systematically addressed the feedback from CodeRabbit, Greptile, and Gemini, including:

✅ Removing the redundant circuit-breaker reset call
✅ Fixing the quota scan boundary so costTotal stays complete
✅ Tightening the trigger conditions for circuit-breaker resets
✅ Filling in the missing filter conditions and input validation
✅ Hardening the repository modules with server-only isolation

And you have already run the full local verification (build, lint, typecheck, test).

I will now re-review all changes in this PR.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@greptile-apps greptile-apps bot left a comment

36 files reviewed, 3 comments

@tesgth032
Contributor Author

A few more details fixed/filled in on PR #782 (all tied to the #779 performance/stability work, keeping existing UX unchanged where possible):

  • DB migration: renumbered the new index migration to 0068_flawless_khan to avoid clashing with upstream 0067_gorgeous_mulholland_black; also filled in drizzle/meta/_journal.json + 0068_snapshot.json so the indexes are created transparently when Docker runs with the default AUTO_MIGRATE=true.
  • Providers UI: removed endpointIds.toSorted(...) (unsupported in some browsers) in favor of a memoized slice().sort().join() that builds a stable queryKey, avoiding both the compatibility issue and needless re-allocations (see the sketch below).
  • Usage/Quota semantics: sumUserQuotaCosts / sumKeyQuotaCostsById now handle an Infinity window with explicit all-time semantics (no more implicit fallback to a 365-day cutoff), with new repository unit tests.
  • Small maintainability wins: getEndpointFilterStats drops the duplicate filter; findEnabledProviderEndpointsByVendorAndType reuses the providerEndpointSelectFields constant.

Local verification: bun run test / bun run typecheck / bun run lint / bun run build all pass.
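
A minimal sketch of the stable queryKey construction; useMemo and the react-query key shape are assumptions based on the stack this PR describes:

```tsx
import { useMemo } from "react";

// Hypothetical hook name; the real component builds the key inline.
function useEndpointProbeLogsKey(endpointIds: number[]) {
  // toSorted() would be cleaner, but it is missing in some still-supported
  // browsers; slice() keeps the input array unmutated, and the memo avoids
  // re-sorting/joining on every render.
  const stableIds = useMemo(
    () => endpointIds.slice().sort((a, b) => a - b).join(","),
    [endpointIds],
  );
  return ["provider-endpoint-probe-logs", stableIds] as const;
}
```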

@greptile-apps greptile-apps bot left a comment

36 files reviewed, 1 comment

@tesgth032
Contributor Author

This update (dev sync + follow-ups on a few review points):

  • merge: synced origin/dev (v0.5.7 release / VERSION & CHANGELOG) so the PR sits on the latest dev
  • fix: getAllEndpointHealthStatusAsync now also syncs counter fields such as failureCount/halfOpenSuccessCount under forceRefresh, preventing display drift across instances; unit tests added
  • fix(security): findProviderEndpointProbeLogsBatch switched to a parameterized inArray condition instead of hand-concatenating IN (...), removing the injection/maintenance risk (behavior unchanged; see the sketch below)

Verified locally: bun run lint/typecheck/test/build; Actions/Greptile all green.
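
A minimal sketch of the parameterized filter; the db handle and the schema/column names are assumptions, the inArray helper is drizzle-orm's own:

```ts
import { inArray } from "drizzle-orm";
import { db } from "@/db";                                 // assumed db handle
import { providerEndpointProbeLogs } from "@/db/schema";   // assumed schema export

// The ORM binds each id as a parameter: endpoint_id IN ($1, $2, ...),
// instead of interpolating `IN (${ids.join(",")})` by hand.
async function findProbeLogsForEndpoints(endpointIds: number[]) {
  if (endpointIds.length === 0) return []; // inArray rejects empty lists
  return db
    .select()
    .from(providerEndpointProbeLogs)
    .where(inArray(providerEndpointProbeLogs.endpointId, endpointIds));
}
```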

@tesgth032
Contributor Author

This PR additionally folds in fixes for issue #781 (stale endpoints/types lingering in Endpoint Health and being probed indefinitely), keeping existing UX where possible:

  • fix(Provider Availability Endpoint #781): when the old endpoint has no remaining active provider references and a unique conflict occurs / the next row already exists, syncProviderEndpointOnProviderEdit now soft-deletes the old endpoint (so a stale URL no longer lingers and keeps being probed/used by the scheduler/runtime)
  • fix(Provider Availability Endpoint #781): on provider delete/batch delete, if the matching (vendorId, providerType, url) has no remaining active provider references, the corresponding endpoint is soft-deleted as well, preventing orphan endpoints
  • perf/fix(Provider Availability Endpoint #781): findEnabledProviderEndpointsForProbing gains an EXISTS(active providers for vendor/type) filter, so vendor/type combinations without any active provider are no longer probed in the background (see the sketch below)
  • UX(Provider Availability Endpoint #781): the Endpoint Health (dashboard availability) vendor/type dropdowns are now generated only from active providers (dynamic providerTypes), so the historical vendor/type of deleted providers no longer shows up

Local verification: bun run lint (only pre-existing warnings), bun run typecheck, bun run test, bun run build all pass; Actions/Greptile are green as well.
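
A minimal sketch of the EXISTS gating using drizzle's correlated-subquery helpers; the db handle and all table/column names here are assumptions, not the repository's actual schema:

```ts
import { and, eq, exists, isNull } from "drizzle-orm";
import { db } from "@/db";                                   // assumed db handle
import { providerEndpoints, providers } from "@/db/schema";  // assumed schema exports

// Only probe endpoints whose (vendorId, providerType) still has at least one
// enabled, non-deleted provider; manually added pool endpoints still qualify
// because the gate is per vendor/type, not per URL.
async function findEnabledEndpointsForProbing() {
  return db
    .select()
    .from(providerEndpoints)
    .where(
      and(
        eq(providerEndpoints.isEnabled, true),
        isNull(providerEndpoints.deletedAt),
        exists(
          db
            .select({ one: providers.id })
            .from(providers)
            .where(
              and(
                eq(providers.providerVendorId, providerEndpoints.vendorId),
                eq(providers.providerType, providerEndpoints.providerType),
                eq(providers.isEnabled, true),
                isNull(providers.deletedAt),
              ),
            ),
        ),
      ),
    );
}
```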

- AvailabilityDashboard: suppress overlapping/out-of-order refreshes; throttle the forced refresh on foreground/background switches
- Probe scheduler/cleanup: idle DB poll + lock renewal, cutting pointless scans and concurrent cleanup
- Endpoint circuit: Redis sync throttled (1s)
- My Usage: key/user breakdown merged into a single aggregation
- DB: new message_request key+model/endpoint partial index migrations; fix the journal monotonicity check and self-heal the migration table's created_at
- Reuse an observer pool keyed by (root+options), reducing observer instances in long lists/large tables
- Unit tests added (test-env passthrough + sharing/release semantics)
@tesgth032 tesgth032 force-pushed the issue/779-provider-performance branch from 2b31c19 to 9680cd9 February 15, 2026 04:35
@tesgth032
Contributor Author

Supplementary commits (2026-02-15):

  • Fixed the probe scheduler's endpoint filtering: switched from a vendor/type/url join to gating by vendor/type (only probe an endpoint pool when its vendor/type has at least one enabled provider), so manually added pool endpoints are no longer missed (they take part in runtime endpoint-pool selection but do not necessarily map 1:1 to providers.url).
  • Dashboard Endpoint Health: the endpoint list is gated by vendor/type the same way (only show endpoint pools whose vendor/type has an enabled provider), preserving the display semantics while keeping orphan vendor/type entries off the Dashboard.
  • Frontend robustness: ProbeGrid no longer calls new URL(endpoint.url) directly when rendering endpoint names (a non-absolute URL or dirty historical data would throw); it parses the hostname defensively and falls back to the raw URL on failure, and an invalid lastProbedAt now falls back to -, so rendering can no longer crash (see the sketch below).

Re-ran locally: bun run lint, bun run typecheck, bun run test, bun run build.
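
A minimal sketch of the defensive display helpers; function names and the exact fallback formatting are assumptions:

```ts
// Historical rows may hold relative or otherwise malformed URLs, and
// `new URL(...)` throws on those, so parse defensively for display.
function displayHostname(rawUrl: string): string {
  try {
    return new URL(rawUrl).hostname;
  } catch {
    return rawUrl; // fall back to the raw string rather than crashing the grid
  }
}

// Invalid timestamps render as "-" instead of "Invalid Date".
function displayProbedAt(lastProbedAt: string | number | Date | null): string {
  if (lastProbedAt == null) return "-";
  const d = new Date(lastProbedAt);
  return Number.isNaN(d.getTime()) ? "-" : d.toLocaleString();
}
```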

@greptile-apps greptile-apps bot left a comment

72 files reviewed, 2 comments

@greptile-apps greptile-apps bot left a comment

72 files reviewed, 2 comments

@tesgth032 tesgth032 changed the title Providers: performance and stability optimization (#779) [WIP] Providers: performance and stability optimization (#779) Feb 15, 2026
@greptile-apps greptile-apps bot left a comment

72 files reviewed, 1 comment

@tesgth032
Contributor Author

Follow-ups based on AI review:

  • getProviderStatistics: replaced Promise.resolve().then(...) with an async IIFE, making the in-flight dedup more direct and robust (see the sketch below)
  • EndpointLatencySparkline / ProviderEndpointHover: the concurrent worker's index increment is now idx++ (semantically equivalent but harder to misread), with an added comment explaining the one-way semantics of useInViewOnce to head off false reports of refetch flicker / race conditions
  • provider-endpoints-batch: added a comment clarifying that the VALUES list is assembled from drizzle parameterized placeholders (so it is not mistaken for raw-SQL injection)
  • Index migrations: added an upgrade note at the top of 0069/0070/0072 that a plain CREATE INDEX can block writes, and the same note in the PR's Compatibility/Upgrade section (write-sensitive deployments can pick a maintenance window or pre-create identically named indexes with CONCURRENTLY)
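
A minimal sketch of in-flight dedup via an async IIFE; the stats shape, key type, and computeProviderStatistics are assumptions for illustration:

```ts
interface ProviderStats { total: number } // hypothetical shape

declare function computeProviderStatistics(key: string): Promise<ProviderStats>;

// Concurrent callers for the same key await one shared promise; the entry is
// removed once the work settles, even on failure.
const inFlight = new Map<string, Promise<ProviderStats>>();

async function loadProviderStatistics(key: string): Promise<ProviderStats> {
  const existing = inFlight.get(key);
  if (existing) return existing;

  const promise = (async () => {
    try {
      return await computeProviderStatistics(key); // assumed expensive query
    } finally {
      inFlight.delete(key); // always clean up so failures are retryable
    }
  })();

  inFlight.set(key, promise);
  return promise;
}
```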

@tesgth032
Contributor Author

Around this push (6e3c10c3) I read through every block that the AI tooling appended to or modified in the PR description (Greptile Summary / Confidence / Important Files / Sequence Diagram / Last reviewed commit, etc.) and went through each unresolved AI review comment one by one. Conclusions and handling (covering only code introduced/modified by this PR):

  • The concern that "use server in src/repository/*.ts exposes the repository as public Server Actions": every repository file added/modified by this PR uses import "server-only"; and is therefore not exposed as a Next.js Server Action; the externally exposed admin API goes exclusively through src/app/api/actions/[...route]/route.ts + src/actions/* with auth.
  • fetch("/api/actions/providers/batchGetProviderEndpointProbeLogs") in endpoint-latency-sparkline.tsx: this is the project's built-in actions route (not a hard-coded public URL of a Next.js Server Action); the route is registered in src/app/api/actions/[...route]/route.ts.
  • keyStringByIdCache eviction policy: it does not clear(); it now sweeps expired entries first and, if still over the limit, evicts the oldest 10%, while hits do a delete+set for an LRU-like bump, avoiding a full wipe that would trigger a cache stampede (see the sketch below).
  • SQL injection false positive: the VALUES list in provider-endpoints-batch is assembled from drizzle sql parameterized placeholders (sql`(${id})`), so id is never injected as a raw string.
  • my-usage sessionId filtering: getMyUsageLogs -> findUsageLogsForKeySlim passes sessionId through; the repository trim()s it and matches exactly; the slim query already includes EXCLUDE_WARMUP_CONDITION.
  • The double focus/visibility refresh on Dashboard Availability: unified behind a throttled refreshThrottled entry point, so the duplicate refresh chain no longer amplifies.
  • Time boundaries in the provider statistics SQL: adjusted per Greptile's suggestion to add the interval on the local timestamp and only then convert back to timestamptz with AT TIME ZONE, with a DST comment, avoiding day-boundary drift across daylight-saving transitions.

Local verification: bun run lint (only the repo's pre-existing useExhaustiveDependencies warnings), bun run typecheck, bun run test, bun run build all pass.
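
A minimal sketch of the eviction policy described above: TTL sweep first, then evict the oldest ~10% if still over the cap, with delete+set on hits so Map iteration order doubles as an LRU-like recency order. Class name, cap, and TTL are illustrative, not the PR's actual constants:

```ts
interface Entry<V> { value: V; expiresAt: number }

class BumpCache<K, V> {
  private map = new Map<K, Entry<V>>();
  constructor(private maxSize = 1000, private ttlMs = 60_000) {}

  get(key: K): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= Date.now()) {
      this.map.delete(key);
      return undefined;
    }
    // delete+set moves the key to the end of iteration order (the "bump").
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.map.size > this.maxSize) this.evict();
  }

  private evict(): void {
    const now = Date.now();
    // Pass 1: sweep expired entries.
    for (const [k, e] of this.map) if (e.expiresAt <= now) this.map.delete(k);
    if (this.map.size <= this.maxSize) return;
    // Pass 2: still over the cap, drop the oldest 10% (earliest in iteration
    // order) instead of clearing everything and causing a stampede.
    const toDrop = Math.max(1, Math.ceil(this.map.size * 0.1));
    let dropped = 0;
    for (const k of this.map.keys()) {
      if (dropped++ >= toDrop) break;
      this.map.delete(k);
    }
  }
}
```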

@ding113 ding113 changed the base branch from dev to refactor/provider-performance February 15, 2026 09:03
@ding113 ding113 changed the title Providers: performance and stability optimization (#779) refactor(provider): improve provider page performance Feb 15, 2026
@ding113 ding113 merged commit 54afd85 into ding113:refactor/provider-performance Feb 15, 2026
9 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 15, 2026
ding113 pushed a commit that referenced this pull request Feb 15, 2026
* fix: batch endpoint stats and probe logs on the Providers admin page

* perf: optimize provider statistics and my-usage query performance

* perf: remove the refresh amplifier from the Providers admin page and lazy-load the endpoint sections

* fix: follow up on review to harden Providers batching and statistics

* fix: follow up on CodeRabbit to fix in-view handling and probe data validation

* perf: stabilize in-view handling and recover from batch 404s

* perf: cut DB round trips for my-usage quota/summary

* perf(providers): batch circuit-breaker reads on the endpoint-pool hot path, plus index migrations (#779)

- Runtime endpoint selection and strict audit statistics now read endpoint circuit state in batch, cutting Redis round trips
- Probe writes are silently ignored when the endpoint is deleted concurrently, so FK failures no longer abort the task
- New index migrations: idx_provider_endpoints_pick_enabled / idx_providers_vendor_type_url_active
- The repository batch-query module is server-only, so it cannot be exposed as a Server Action

* fix: follow up on review to dedupe circuit resets and scanEnd (#779)

* fix: precise circuit resets + server-only repositories (#779)

* fix: add sessionId/warmup filters to my-usage (#779)

* perf: more robust in-flight dedup for provider statistics (#779)

* fix: ProviderForm invalidates related caches consistently (#779)

* fix: Providers/Usage detail fixes and added test cases (#779)

* style: biome formatting (#779)

* fix(#779): circuit state sync and batched probeLogs query improvements

* fix(#781): clean up orphan endpoints and fix Endpoint Health

* perf: optimize usage logs and endpoint sync (#779/#781)

* refactor: remove redundant endpoint filtering (#779)

* fix: batched circuit-state reads cover only enabled endpoints (#779)

* fix: make provider statistics tolerate dirty data and stabilize probe log ordering (#779)

* perf: disable window-focus auto refresh for heavy Providers queries (#779)

* fix: periodic multi-instance circuit state sync, and backfill of leftover soft-deleted endpoints (#779/#781)

* perf: probe scheduler only probes endpoints of enabled providers (#781)

* perf: ProviderForm avoids duplicate refetches and stabilizes the hover circuit key (#779)

* perf: global QueryClient policy and usage/user index optimizations (#779)

* perf: timezone-statistics index hits and batch-delete optimization (#779)

* perf: reduce wasted recomputation on the logs/users pages

* fix(provider): endpoint pool derived only from enabled providers

- sync/backfill/delete: reference checks and backfill consider only is_enabled=true providers, so a disabled provider can no longer revive an old endpoint
- updateProvider: ensure the endpoint exists when a provider flips from disabled to enabled
- Dashboard Endpoint Health: keep concurrent refreshes from clobbering user switches; derive vendor/type only from enabled providers
- Probe logs batch API: partial 404s during rolling deploys no longer disable batching globally
- Endpoint-selector unit tests updated to match the findEnabled* semantics

* perf: lightweight Dashboard vendor/type query and parallel usage-log queries

* fix(migrate): serialize migrations with an advisory lock and drop emoji logs

* fix: endpoint hover fallback and normalized batch probe logs SQL

* perf(settings/providers): reduce redundant refreshes and reuse endpoint/circuit caches

* perf(probe/statistics): fix the probe lock/counters and tighten statistics and usage scans

* perf(probe/ui): optimize the probe target filter SQL and reduce sparkline flicker

* fix(db): repair the Drizzle snapshot chain

* fix(perf): harden Providers batching and cache consistency

- Provider statistics: eliminate the implicit cross join and tighten in-flight cleanup; deleteProvidersBatch does fewer round trips inside the transaction
- Providers hover: micro-batching is isolated per QueryClient and supports AbortSignal, reducing cross-talk and potential leaks
- Probe/circuit/cache: the probe target query is now a join; counter fields are updated during Redis sync; the statistics cache keeps FIFO semantics
- My Usage: userBreakdown gains the 5m/1h cache aggregate columns (not yet shown in the UI)

* chore: format code (issue-779-provider-performance-23b338e)

* chore: re-trigger CI

* fix(provider): fill in the endpoint pool on batch enable

- batchUpdateProviders goes through updateProvidersBatch; when vendors are batch-enabled from disabled, missing provider_endpoints rows are inserted best-effort
- This avoids history/races leaving a freshly enabled provider with no usable endpoint under the strict endpoint policy

* fix(perf): rein in Providers refresh amplification and optimize probing/pagination

* perf: rein in availability/probe polling and optimize my-usage (#779/#781)

- AvailabilityDashboard: suppress overlapping/out-of-order refreshes; throttle the forced refresh on foreground/background switches
- Probe scheduler/cleanup: idle DB poll + lock renewal, cutting pointless scans and concurrent cleanup
- Endpoint circuit: Redis sync throttled (1s)
- My Usage: key/user breakdown merged into a single aggregation
- DB: new message_request key+model/endpoint partial index migrations; fix the journal monotonicity check and self-heal the migration table's created_at

* fix(ui): restore the global react-query defaults

* fix(availability): clear stale endpoint selection when refreshing vendors

* perf: harden Providers probing and Usage Logs performance

* perf(ui): useInViewOnce shares IntersectionObserver instances to cut resource usage

- Reuse an observer pool keyed by (root+options), reducing observer instances in long lists/large tables
- Unit tests added (test-env passthrough + sharing/release semantics)

* perf: providers batch where-clause optimization and sparkline fallback concurrency fix

* perf: my-usage breakdown gains cache fields and optimized filter caching

* perf: reduce endpoint circuit-breaker Redis load and refine probe candidates

* fix(#781): Endpoint Health shows only endpoints referenced by enabled providers

* Fix endpoint health filtering and harden URL parsing

* docs(provider-endpoints): document keepPreviousWhenReferenced semantics

* perf(availability): EndpointTab throttles refresh on foreground/background switches

* docs(availability): add a note on EndpointTab refresh throttling

* chore(review): add comments and tighten details per AI review

* fix: correct the DST day boundary in the provider statistics SQL

---------

Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ding113 added a commit that referenced this pull request Feb 15, 2026
* refactor(provider): improve provider page performance (#782)


* refactor: consolidate migrations, extract shared utilities, fix bugbot issues

Merge 6 index migrations (0068-0073) into single idempotent migration.
Extract reusable utilities from duplicated code across the codebase:

- TTLMap<K,V>: generic LRU+TTL cache replacing 3 inline implementations
- createAbortError: shared abort error factory from 2 components
- startLeaderLockKeepAlive: shared leader lock renewal from 2 schedulers
- ProbeLogsBatcher: data-fetching infra extracted from sparkline component
- buildUsageLogConditions: shared SQL filter builder from 3 query functions

Additional cleanup:
- Simplify useInViewOnce hook (remove unused options, keep shared observer pool)
- Remove dead code (sumKeyTotalCostById, unexport internal types)
- Hardcode env var defaults (ENDPOINT_CIRCUIT_HEALTH_CACHE_MAX_SIZE,
  ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS)
- Fix in-flight dedup race condition in getProviderStatistics
- Fix yesterday/today interval boundary inconsistency (lte -> lt)
- Add NaN guard for limitPerEndpoint in batch probe logs
- Add updatedAt to deleteProvider for audit consistency
- Log swallowed flush() errors in batchers instead of silently catching

* fix: resolve loading state reset and advisory lock client close errors

Remove silent option guard so vendor loading state always resets when
the request completes, preventing stale loading indicators. Wrap
advisory lock client.end() in try-catch to avoid unhandled errors
during connection teardown.

---------

Co-authored-by: tesgth032 <tesgth032@hotmail.com>
Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Labels

area:provider area:UI enhancement New feature or request size/L Large PR (< 1000 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants