
refactor(provider): improve provider page performance#789

Merged
ding113 merged 3 commits into dev from refactor/provider-performance
Feb 15, 2026

Conversation

@ding113 (Owner) commented Feb 15, 2026

Greptile Summary

A comprehensive performance and correctness refactoring across the provider page, endpoint management, and usage statistics subsystems. The PR reduces database round-trips by consolidating N+1 query patterns into batch operations, adds targeted indexes on hot paths, introduces in-memory caching with TTL/LRU eviction, and improves multi-instance safety with advisory locks.

  • Query consolidation: Replaces multiple individual cost/stat queries with single FILTER-aggregate queries (sumUserQuotaCosts, sumKeyQuotaCostsById), batch circuit breaker health loading (getAllEndpointHealthStatusAsync), and batch probe log retrieval via LATERAL joins
  • Index optimization: Adds 11 new indexes targeting key+created_at pagination, distinct model/endpoint filters, provider vendor+type lookups, and endpoint selection hot paths; SQL queries refactored from ::date casts to range-based comparisons for index utilization
  • Caching layers: New TTLMap generic cache used for key-string lookups, usage log totals, distinct model/endpoint lists, and provider statistics; in-flight dedup prevents a thundering herd on cache expiry (sketched after this list)
  • Endpoint lifecycle correctness: Provider delete/disable now cascades to soft-delete orphaned endpoints; endpoint deletion blocked when still referenced by enabled providers; backfill scoped to enabled providers only
  • Multi-instance safety: Database migrations wrapped in pg_advisory_lock; provider backfill uses withAdvisoryLock with skipIfLocked; drizzle migration created_at repair for journal consistency
  • Client-side optimization: ProbeLogsBatcher coalesces per-endpoint requests into batch API calls; useInViewOnce hook with shared IntersectionObserver defers off-screen data loading; removed redundant QueryClientProvider wrappers and router.refresh() calls
  • Probe scheduler improvements: Idle tick skip via computeNextDueAtMs, vendor+type scoped interval counting, in-memory probe result tracking to avoid stale scheduling decisions
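
To make the caching bullet concrete, here is a hedged sketch of the TTL-cache-plus-in-flight-dedup pattern. The real TTLMap lives in src/lib/cache/ttl-map.ts; every name and parameter below is illustrative rather than the PR's actual API.

```typescript
// Illustrative sketch only; the real TTLMap is in src/lib/cache/ttl-map.ts.
class TTLMap<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private maxSize = 1000) {}

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: K, value: V): void {
    // Simple eviction: drop the oldest insertion when full (Map preserves insertion order).
    if (this.entries.size >= this.maxSize && !this.entries.has(key)) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// In-flight dedup: concurrent callers share one promise, so a cache miss
// triggers at most one underlying query (prevents thundering herd on expiry).
const cache = new TTLMap<string, string[]>(30_000);
const inFlight = new Map<string, Promise<string[]>>();

async function getDistinctModels(keyId: string, load: (id: string) => Promise<string[]>) {
  const hit = cache.get(keyId);
  if (hit) return hit;
  const pending = inFlight.get(keyId);
  if (pending) return pending;
  const promise = load(keyId)
    .then((rows) => {
      cache.set(keyId, rows);
      return rows;
    })
    .finally(() => inFlight.delete(keyId));
  // Registered synchronously after creation, before yielding to the event
  // loop, so concurrent callers always observe the in-flight entry.
  inFlight.set(keyId, promise);
  return promise;
}
```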

Confidence Score: 4/5

  • This PR is well-structured, with comprehensive test coverage and defensive error handling; the primary risk is the blocking index migration on high-write tables (see the sketch below).
  • The score of 4 reflects: (1) thorough query optimization with correct SQL semantics, (2) proper in-flight dedup and caching patterns, (3) good defensive coding (FK violation handling, unique constraint retries, NOT EXISTS guards), and (4) comprehensive test additions. One point was deducted for the blocking index creation on the message_request table, which could cause production downtime, and for the fragile ordering pattern in syncHealthFromRedisBatch, where the in-flight map is set after promise creation.
  • drizzle/0068_flaky_swarm.sql (blocking index creation on high-write table), src/lib/endpoint-circuit-breaker.ts (in-flight dedup ordering), src/repository/provider.ts (complex transaction logic in delete/batch operations)
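
As a reference point for that deduction, the standard Postgres mitigation is to build the index concurrently. The DDL below is a hypothetical sketch (the index name is taken from the changelog later on this page, the column list is assumed), not the PR's actual migration:

```sql
-- Hypothetical sketch, not the PR's migration file: CONCURRENTLY builds the
-- index without holding a write-blocking lock on message_request. The column
-- list here is an assumption based on the index name.
-- Caveat: CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
-- so it would have to execute outside the normal transactional migration path.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_message_request_key_created_at_id
  ON message_request (key, created_at, id);
```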

Important Files Changed

Filename Overview
src/repository/provider.ts Major refactoring: in-memory cache + in-flight dedup for getProviderStatistics, cascade soft-delete of endpoints on provider delete, batch enable/disable with endpoint pool sync. SQL query restructured with bounds CTE for index-friendliness.
src/repository/provider-endpoints.ts Adds findEnabledProviderEndpointsByVendorAndType, findDashboardProviderEndpointsByVendorAndType, findEnabledProviderVendorTypePairs, vendor update enrichment, revive conflict handling in sync, and reorders probe result recording to check endpoint existence first.
src/repository/provider-endpoints-batch.ts New file: batch endpoint stats and probe logs queries using LATERAL joins and GROUP BY FILTER for efficient multi-endpoint data loading.
src/lib/endpoint-circuit-breaker.ts Adds batch health status loading (getAllEndpointHealthStatusAsync), LRU cache eviction, TTL-based Redis sync, in-flight dedup for batch loads, and optimization to delete default-closed states from Redis.
src/lib/provider-endpoints/endpoint-selector.ts Replaces N individual isEndpointCircuitOpen calls with single batch getAllEndpointHealthStatusAsync call. Uses findEnabledProviderEndpointsByVendorAndType to avoid filtering in application layer. Top-level getEnvConfig import replaces dynamic import.
src/lib/provider-endpoints/probe-logs-batcher.ts New client-side probe log batcher: coalesces per-endpoint probe log requests into batch API calls with progressive fallback to individual requests, abort signal support, and auto-disable on 404 (see the coalescing sketch after this table).
src/lib/provider-endpoints/probe-scheduler.ts Adds idle tick optimization via computeNextDueAtMs/updateNextWorkHints, refactors vendor counting to vendor+type key, extracts startLeaderLockKeepAlive to shared module, and updates probe results in-memory to avoid stale scheduling.
src/actions/provider-endpoints.ts Adds batch probe logs/stats/circuit info actions, endpoint existence check before edit, reference check before delete, cache invalidation on mutations, circuit reset on URL change/enable.
src/actions/my-usage.ts Consolidates quota cost queries into single sumKeyQuotaCostsById/sumUserQuotaCosts calls (N+1 elimination). Replaces separate aggregate+breakdown queries with single combined query using FILTER aggregates. Derives today-stats totals from breakdown loop.
src/repository/statistics.ts Adds sumUserQuotaCosts and sumKeyQuotaCostsById for consolidated multi-period cost queries via FILTER aggregates. Introduces TTLMap-based key string cache. Adds range-bounding predicates to all JOIN conditions in statistics queries for index utilization.
src/repository/usage-logs.ts Adds findUsageLogsForKeySlim (slim projection + cached total count). Extracts shared filter building to buildUsageLogConditions. Adds TTL caching for distinct models/endpoints. Conditionally joins keysTable only when keyId filter is present.
src/drizzle/schema.ts Adds targeted indexes: key+created_at+id for pagination, key+model/endpoint for distinct filters, provider vendor+type+url, provider endpoint pick-enabled composite, and users tags GIN index.
drizzle/0068_flaky_swarm.sql Migration: creates 11 new indexes on message_request, provider_endpoints, providers, keys, and users tables. Uses IF NOT EXISTS and includes a warning about write-blocking on high-write tables.
src/lib/migrate.ts Adds pg_advisory_lock around migrations for multi-instance safety. New withAdvisoryLock utility. Adds repairDrizzleMigrationsCreatedAt to fix journal timestamp mismatches. Changes from "use server" to import "server-only".
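
Conceptually, the batcher's coalescing works like the hedged sketch below. The real ProbeLogsBatcher additionally handles abort signals, progressive fallback to individual requests, and auto-disable on 404; the names and the window parameter here are assumptions, not the actual API.

```typescript
// Illustrative sketch of the coalescing idea behind ProbeLogsBatcher.
type ProbeLog = { endpointId: number; latencyMs: number; ok: boolean };
type Waiter = { resolve: (logs: ProbeLog[]) => void; reject: (e: unknown) => void };

class ProbeLogsBatcher {
  private pending = new Map<number, Waiter[]>();
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private fetchBatch: (ids: number[]) => Promise<Map<number, ProbeLog[]>>,
    private windowMs = 10,
  ) {}

  request(endpointId: number): Promise<ProbeLog[]> {
    return new Promise((resolve, reject) => {
      const waiters = this.pending.get(endpointId) ?? [];
      waiters.push({ resolve, reject });
      this.pending.set(endpointId, waiters);
      // Coalesce every request arriving within the window into one API call.
      this.timer ??= setTimeout(() => this.flush(), this.windowMs);
    });
  }

  private async flush() {
    this.timer = null;
    const batch = this.pending;
    this.pending = new Map();
    try {
      const results = await this.fetchBatch([...batch.keys()]);
      for (const [id, waiters] of batch) {
        const logs = results.get(id) ?? [];
        for (const w of waiters) w.resolve(logs);
      }
    } catch (err) {
      for (const waiters of batch.values()) for (const w of waiters) w.reject(err);
    }
  }
}
```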

Flowchart

flowchart TD
    subgraph Client["Client-Side Optimization"]
        UI["Provider Page / Dashboard UI"]
        BATCH["ProbeLogsBatcher"]
        INVIEW["useInViewOnce Hook"]
        UI --> INVIEW
        INVIEW -->|visible| BATCH
        BATCH -->|coalesce requests| BATCHAPI["/api/actions batch endpoints"]
    end

    subgraph Server["Server-Side Actions"]
        BATCHAPI --> BPA["batchGetProviderEndpointProbeLogs"]
        BATCHAPI --> BVS["batchGetVendorTypeEndpointStats"]
        BATCHAPI --> BCI["batchGetEndpointCircuitInfo"]
    end

    subgraph Repository["Repository Layer"]
        BPA --> LATERAL["LATERAL JOIN batch query"]
        BVS --> GROUPBY["GROUP BY + FILTER aggregate"]
        BCI --> ALLHEALTH["getAllEndpointHealthStatusAsync"]
    end

    subgraph Cache["Caching Layer"]
        TTLMAP["TTLMap (key cache, totals, models)"]
        HEALTHCACHE["EndpointHealth LRU Cache"]
        PROVSTATS["Provider Statistics Cache + In-flight Dedup"]
        ALLHEALTH --> HEALTHCACHE
        HEALTHCACHE -->|TTL expired| REDIS["Redis Pipeline Batch Load"]
    end

    subgraph DB["Database"]
        LATERAL --> PG[(PostgreSQL)]
        GROUPBY --> PG
        REDIS -.->|circuit state| REDISDB[(Redis)]
        PG -.->|new indexes| IDX["11 new targeted indexes"]
    end

    subgraph Lifecycle["Provider Lifecycle"]
        DEL["deleteProvider / deleteProvidersBatch"]
        DEL -->|cascade| SOFTDEL["Soft-delete orphan endpoints"]
        SOFTDEL -->|NOT EXISTS check| PG
        UPD["updateProvider (enable)"]
        UPD -->|ensure endpoint| PG
    end
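
The "LATERAL JOIN batch query" node above stands for fetching the most recent probe logs for many endpoints in one round-trip. A hedged sketch of that query shape, with table and column names assumed for illustration (the real query lives in src/repository/provider-endpoints-batch.ts):

```typescript
import { sql } from "drizzle-orm";

// Hedged sketch of the batch shape only: table/column names are assumptions,
// and the array parameter is assumed to bind as a Postgres int[].
function buildBatchProbeLogsQuery(endpointIds: number[], limitPerEndpoint: number) {
  return sql`
    SELECT e.id AS endpoint_id, pl.*
    FROM unnest(${endpointIds}::int[]) AS e(id)
    -- LATERAL lets the subquery reference e.id, so the top-N-per-endpoint
    -- scan runs once per endpoint instead of once per client request.
    CROSS JOIN LATERAL (
      SELECT l.id, l.created_at, l.ok, l.latency_ms
      FROM provider_endpoint_probe_logs l
      WHERE l.endpoint_id = e.id
      ORDER BY l.created_at DESC, l.id DESC
      LIMIT ${limitPerEndpoint}
    ) pl
  `;
}
```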

Last reviewed commit: 1cee0a9

@coderabbitai bot commented Feb 15, 2026

📝 Walkthrough

Introduces an idle DB polling configuration for endpoint probes; adds batch endpoint/probe-log and vendor-type statistics APIs; implements batch circuit-breaker state sync with TTL/LRU management on Redis; refactors probe scheduling to support idle polling and next-work hints; adds quota cost aggregation and slim usage-log queries; updates several frontend dashboard and settings components; adds multiple indexes and migration validation utilities.

Changes

Cohort / File(s) Summary
Environment & metadata
.env.example, biome.json, drizzle/meta/_journal.json, drizzle/0068_flaky_swarm.sql
Adds the ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS example variable, updates the Biome schema, appends a journal entry, and adds the index migration script.
Migration & validation scripts
scripts/validate-migrations.js, src/lib/migrate.ts
Adds Drizzle journal monotonicity validation and a PostgreSQL advisory-lock helper (withAdvisoryLock; sketched after this table), plus migration repair logic.
Repository: batch & endpoint queries
src/repository/provider-endpoints-batch.ts, src/repository/provider-endpoints.ts, src/repository/index.ts
Adds batch statistics and probe-log queries (per-endpoint batch, vendor-type stats); adds findEnabledProviderEndpointsByVendorAndType, findProviderVendorsByIds, and other batch/filter functions; updates the probe target type.
Circuit breaker & batch Redis state loading
src/lib/endpoint-circuit-breaker.ts, src/lib/redis/endpoint-circuit-breaker-state.ts
Implements loadEndpointCircuitStates batch Redis loading, getAllEndpointHealthStatusAsync, TTL/LRU eviction and in-flight dedup; syncs multi-instance state and deletes default-closed keys.
Probe scheduling & probing logic
src/lib/provider-endpoints/probe-scheduler.ts, src/lib/provider-endpoints/probe.ts, src/lib/provider-endpoints/endpoint-selector.ts
Introduces an idle DB polling interval and NEXT_DUE/NEXT_DB_POLL hints, switches counting to vendor:type, integrates batch health checks, conditionally resets the circuit on probe success per configuration, and always writes historical probe records.
Batching & leader lock
src/lib/provider-endpoints/leader-lock.ts, src/lib/provider-endpoints/probe-log-cleanup.ts
Allows acquiring a Redis client even when rate limiting is disabled (allowWhenRateLimitDisabled); adds startLeaderLockKeepAlive and uses it in cleanup to keep the leader alive.
Action/API layer extensions
src/actions/provider-endpoints.ts, src/app/api/actions/[...route]/route.ts, src/app/api/availability/endpoints/route.ts
Adds dashboard-oriented APIs (getDashboardProviderVendors/Endpoints) and batch routes batchGetProviderEndpointProbeLogs and batchGetVendorTypeEndpointStats, with OpenAPI/permission registration.
Frontend dashboard & settings refactors
src/app/[locale]/dashboard/..., src/app/[locale]/settings/providers/_components/...
Switches multiple components to the dashboard batch APIs; introduces request lifecycle control (AbortController, requestId), visibility/focus throttling, lazy loading via useInViewOnce, and batch/cache optimizations; several components switch to numeric ids.
Usage logs & condition building
src/repository/usage-logs.ts, src/repository/_shared/usage-log-filters.ts
Adds the slim findUsageLogsForKeySlim interface and related caching; extracts buildUsageLogConditions to unify condition construction; avoids unnecessary joins in several queries.
Statistics & quota aggregation
src/repository/statistics.ts, src/actions/my-usage.ts
Adds QuotaCostRanges and QuotaCostSummary, plus sumUserQuotaCosts and sumKeyQuotaCostsById (removing the old sumKeyTotalCostById); introduces a keyId-to-keyString TTLMap cache and adjusts callers.
Utilities & batchers
src/lib/provider-endpoints/probe-logs-batcher.ts, src/lib/hooks/use-in-view-once.ts, src/lib/cache/ttl-map.ts, src/lib/abort-utils.ts
Adds ProbeLogsBatcher (batched probe logs), the useInViewOnce hook, a generic TTLMap cache implementation, and the createAbortError utility.
Schema & indexes
src/drizzle/schema.ts, drizzle/0068_flaky_swarm.sql
Adds several GIN/composite/partial indexes for hot-path queries on the keys, message_request, provider_endpoints, providers, and users tables.
Startup (instrumentation)
src/instrumentation.ts
Wraps the provider backfill in an advisory lock during production initialization; improves AUTO_MIGRATE detection and logging.
Tests
tests/...
Extends and adjusts many unit/integration tests to cover the batch APIs, batch circuit-state loading, TTLMap, useInViewOnce, ProbeLogsBatcher, the new quota aggregation interfaces, and the changed soft-delete semantics.
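
For orientation, the advisory-lock helper mentioned in the migration-scripts row above typically looks like the sketch below, assuming a node-postgres client. withAdvisoryLock and skipIfLocked are the PR's names, but this body is illustrative rather than the actual src/lib/migrate.ts implementation.

```typescript
import { Client } from "pg";

// Hedged sketch, not the PR's implementation. Advisory locks are
// session-scoped, so lock and unlock must run on the same connection.
async function withAdvisoryLock<T>(
  client: Client,
  lockKey: number,
  fn: () => Promise<T>,
  opts: { skipIfLocked?: boolean } = {},
): Promise<T | undefined> {
  if (opts.skipIfLocked) {
    // Non-blocking: another instance already holds the lock, so skip the work.
    const { rows } = await client.query("SELECT pg_try_advisory_lock($1) AS ok", [lockKey]);
    if (!rows[0].ok) return undefined;
  } else {
    // Blocking: wait until the lock is free (e.g. for migrations).
    await client.query("SELECT pg_advisory_lock($1)", [lockKey]);
  }
  try {
    return await fn();
  } finally {
    await client.query("SELECT pg_advisory_unlock($1)", [lockKey]);
  }
}
```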

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.17% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (47 files):

⚔️ .env.example (content)
⚔️ CHANGELOG.md (content)
⚔️ README.en.md (content)
⚔️ README.md (content)
⚔️ VERSION (content)
⚔️ messages/en/dashboard.json (content)
⚔️ messages/en/myUsage.json (content)
⚔️ messages/en/provider-chain.json (content)
⚔️ messages/ja/dashboard.json (content)
⚔️ messages/ja/myUsage.json (content)
⚔️ messages/ja/provider-chain.json (content)
⚔️ messages/ru/dashboard.json (content)
⚔️ messages/ru/myUsage.json (content)
⚔️ messages/ru/provider-chain.json (content)
⚔️ messages/zh-CN/dashboard.json (content)
⚔️ messages/zh-CN/myUsage.json (content)
⚔️ messages/zh-CN/provider-chain.json (content)
⚔️ messages/zh-TW/dashboard.json (content)
⚔️ messages/zh-TW/myUsage.json (content)
⚔️ messages/zh-TW/provider-chain.json (content)
⚔️ package.json (content)
⚔️ src/app/[locale]/dashboard/logs/_components/error-details-dialog.test.tsx (content)
⚔️ src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx (content)
⚔️ src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/SummaryTab.tsx (content)
⚔️ src/app/[locale]/dashboard/logs/_components/provider-chain-popover.test.tsx (content)
⚔️ src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx (content)
⚔️ src/app/[locale]/my-usage/_components/collapsible-quota-card.tsx (content)
⚔️ src/app/[locale]/my-usage/_components/provider-group-info.tsx (content)
⚔️ src/app/[locale]/my-usage/_components/quota-cards.tsx (content)
⚔️ src/app/[locale]/my-usage/_components/statistics-summary-card.tsx (content)
⚔️ src/app/v1/_lib/proxy-handler.ts (content)
⚔️ src/app/v1/_lib/proxy/error-handler.ts (content)
⚔️ src/app/v1/_lib/proxy/errors.ts (content)
⚔️ src/app/v1/_lib/proxy/forwarder.ts (content)
⚔️ src/app/v1/_lib/proxy/response-handler.ts (content)
⚔️ src/app/v1/_lib/proxy/session.ts (content)
⚔️ src/instrumentation.ts (content)
⚔️ src/lib/config/env.schema.ts (content)
⚔️ src/lib/utils/cost-calculation.ts (content)
⚔️ src/lib/utils/provider-chain-formatter.test.ts (content)
⚔️ src/lib/utils/provider-chain-formatter.ts (content)
⚔️ src/lib/utils/upstream-error-detection.test.ts (content)
⚔️ src/lib/utils/upstream-error-detection.ts (content)
⚔️ src/types/message.ts (content)
⚔️ tests/unit/proxy/proxy-forwarder-endpoint-audit.test.ts (content)
⚔️ tests/unit/proxy/proxy-handler-session-id-error.test.ts (content)
⚔️ tests/unit/proxy/response-handler-endpoint-circuit-isolation.test.ts (content)

These conflicts must be resolved before merging into dev.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title "refactor(provider): improve provider page performance" clearly and accurately summarizes the main change, namely the performance-optimization focus.
Description check ✅ Passed The PR description details the goals, main changes, infrastructure improvements, and code-quality improvements, and is highly relevant to the PR's actual changes.


@github-actions github-actions bot added the size/XL Extra Large PR (> 1000 lines) label Feb 15, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello @ding113, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a comprehensive set of performance improvements and architectural refinements. It focuses on optimizing data access patterns through new database indexes and refactored queries, enhancing the responsiveness of various user interfaces, and strengthening the reliability of background processes like migrations and endpoint probing. These changes collectively aim to provide a faster, more stable, and scalable application experience.

Highlights

  • Performance Optimization: Introduced numerous database indexes across users, keys, providers, provider_endpoints, and message_request tables to significantly improve query performance for usage logs, statistics, and provider management.
  • Efficient Data Fetching: Refactored backend data retrieval for usage statistics and provider endpoints to use combined SQL queries with FILTER clauses and batching mechanisms, reducing database round trips and improving responsiveness, especially for dashboard and settings pages.
  • Frontend Responsiveness: Implemented client-side caching, request debouncing, abort controllers, and lazy loading (useInViewOnce hook) in React components to enhance the user experience on availability, logs, and provider settings pages; a sketch of the hook follows this list.
  • Robust Endpoint Management: Improved the logic for syncing provider endpoints with providers, including checks for existing references, handling unique constraint conflicts, and ensuring proper cleanup (soft-deletion) of unreferenced endpoints. Also added batch fetching for endpoint health and probe logs.
  • Migration and Scheduler Enhancements: Added advisory locks for database migrations and provider backfill processes to prevent concurrent execution in multi-instance environments. The endpoint probe scheduler now optimizes DB polling intervals and uses provider type in its calculations.
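
A minimal sketch of a fire-once visibility hook built on a shared IntersectionObserver, as referenced in the Frontend Responsiveness highlight. The PR's actual src/lib/hooks/use-in-view-once.ts may differ in API and details.

```typescript
// Illustrative sketch only; not the PR's use-in-view-once.ts.
import { useEffect, useRef, useState } from "react";

type Callback = (visible: boolean) => void;
const callbacks = new Map<Element, Callback>();
let sharedObserver: IntersectionObserver | null = null;

function getObserver(): IntersectionObserver {
  // One observer for all subscribers avoids per-component observer overhead.
  sharedObserver ??= new IntersectionObserver((entries) => {
    for (const entry of entries) {
      callbacks.get(entry.target)?.(entry.isIntersecting);
    }
  });
  return sharedObserver;
}

export function useInViewOnce<T extends Element>() {
  const ref = useRef<T | null>(null);
  const [inView, setInView] = useState(false);

  useEffect(() => {
    const el = ref.current;
    if (!el || inView) return;
    callbacks.set(el, (visible) => {
      if (!visible) return;
      setInView(true); // Latches: once visible, it stays "in view".
      callbacks.delete(el);
      getObserver().unobserve(el);
    });
    getObserver().observe(el);
    return () => {
      callbacks.delete(el);
      getObserver().unobserve(el);
    };
  }, [inView]);

  return { ref, inView };
}
```
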
Changelog
  • .env.example
    • Added ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS environment variable for probe scheduler idle DB polling interval.
  • biome.json
    • Updated Biome.js schema version to 2.3.15.
  • drizzle/0068_fuzzy_quasar.sql
    • Added idx_provider_endpoints_pick_enabled index on provider_endpoints for runtime endpoint selection.
    • Added idx_providers_vendor_type_url_active index on providers for checking old URL references.
  • drizzle/0069_broad_hellfire_club.sql
    • Added idx_message_request_key_created_at_id index on message_request for key-specific usage logs.
    • Added idx_message_request_model_active index on message_request for distinct model filtering.
    • Added idx_message_request_status_code_active index on message_request for distinct status code filtering.
  • drizzle/0070_warm_lilandra.sql
    • Added idx_message_request_created_at_id_active index on message_request for global usage logs keyset pagination.
    • Added idx_users_tags_gin GIN index on users table for tag filtering.
  • drizzle/0071_motionless_wind_dancer.sql
    • Added idx_keys_key index on keys table for key string lookups.
  • drizzle/0072_absurd_gwen_stacy.sql
    • Added idx_message_request_key_model_active index on message_request for key-specific distinct model filtering.
    • Added idx_message_request_key_endpoint_active index on message_request for key-specific distinct endpoint filtering.
  • drizzle/0073_minor_franklin_richards.sql
    • Added idx_providers_enabled_vendor_type index on providers for dashboard and probe scheduler hot paths.
  • drizzle/meta/_journal.json
    • Updated Drizzle migration journal with new migration entries.
  • scripts/validate-migrations.js
    • Added validateJournalMonotonicity function to check Drizzle journal timestamp monotonicity.
    • Integrated journal monotonicity validation into the main migration validation process.
  • src/actions/my-usage.ts
    • Removed deprecated sumUserCost function.
    • Refactored getMyQuota to use new combined sumKeyQuotaCostsById and sumUserQuotaCosts functions for efficiency.
    • Optimized getMyTodayStats to aggregate totals from model breakdown instead of a separate query.
    • Added sessionId to MyUsageLogsFilters interface and updated getMyUsageLogs to use findUsageLogsForKeySlim.
    • Changed getUserConcurrentSessions to directly use SessionTracker.getUserSessionCount.
    • Updated ModelBreakdownItem and MyStatsSummary interfaces to include new cache token fields.
    • Refactored getMyStatsSummary to use a single aggregated SQL query for both key and user model breakdowns.
  • src/actions/provider-endpoints.ts
    • Added new Zod schemas for batch operations (BatchGetVendorTypeEndpointStatsSchema, BatchGetProviderEndpointProbeLogsBatchSchema).
    • Implemented isForeignKeyViolationError helper for error handling.
    • Added getDashboardProviderVendors and getDashboardProviderEndpoints for dashboard-specific data retrieval.
    • Enhanced addProviderEndpoint to handle foreign key violations and publish cache invalidation.
    • Modified editProviderEndpoint to check for existing endpoint, reset circuit breaker on URL/enabled status change, and publish cache invalidation.
    • Added a check in removeProviderEndpoint to prevent deletion if the endpoint is still referenced by an enabled provider, and added circuit breaker reset and cache invalidation.
    • Updated probeProviderEndpoint to use probeProviderEndpointAndRecordByEndpoint.
    • Added batchGetProviderEndpointProbeLogs and batchGetVendorTypeEndpointStats actions for efficient data fetching.
    • Updated batchGetEndpointCircuitInfo to use getAllEndpointHealthStatusAsync for batch fetching.
  • src/actions/providers.ts
    • Replaced findProviderVendorById with findProviderVendorsByIds for batch fetching.
    • Updated reclusterProviderVendors to use findProviderVendorsByIds.
  • src/app/[locale]/dashboard/availability/_components/availability-dashboard.tsx
    • Implemented request debouncing, abort controller, and visibility-based refresh logic for fetchData.
    • Memoized overviewMetrics calculation for performance.
  • src/app/[locale]/dashboard/availability/_components/endpoint-probe-history.tsx
    • Changed state types for selected IDs to number | null for consistency.
    • Updated data fetching to use getDashboardProviderVendors and getDashboardProviderEndpoints.
    • Modified select components to reset endpoint selection when vendor or type changes.
  • src/app/[locale]/dashboard/availability/_components/endpoint/endpoint-tab.tsx
    • Refactored data fetching into refreshVendors, refreshEndpoints, refreshProbeLogs with request debouncing and abort controllers.
    • Implemented focus/visibility change refresh logic to avoid stale data.
    • Modified handleProbe to use new refresh functions and handle concurrent selection changes.
    • Derived provider types from the selected vendor instead of a hardcoded list.
  • src/app/[locale]/dashboard/availability/_components/endpoint/probe-grid.tsx
    • Added safeHostnameFromUrl utility function for cleaner URL display.
    • Modified endpoint display to prioritize label, then hostname, then URL.
    • Added validation for Date objects in formatTime.
  • src/app/[locale]/dashboard/logs/_components/usage-logs-view-virtualized.tsx
    • Removed local QueryClient and QueryClientProvider usage, assuming a global setup.
    • Defined EMPTY_PROVIDERS and EMPTY_KEYS constants for better type safety and default values.
    • Refactored filters memoization to correctly parse URL parameters.
  • src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx
    • Memoized allLogs to prevent re-computation on re-renders.
    • Introduced shouldPoll to control refetchInterval based on auto-refresh and scroll position.
  • src/app/[locale]/dashboard/users/users-page-client.tsx
    • Removed local QueryClient and QueryClientProvider usage.
    • Memoized pendingTagFiltersKey and pendingKeyGroupFiltersKey for stable debounce keys.
  • src/app/[locale]/settings/providers/_components/add-provider-dialog.tsx
    • Removed useQueryClient and useRouter imports, relying on granular cache invalidation within the form.
  • src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx
    • Added useInViewOnce hook for lazy loading sparklines when in view.
    • Implemented batch fetching logic for probe logs using ProbeLogsBatcher to reduce network requests.
    • Refined avgLatency calculation and added skeleton loading state for better UX.
  • src/app/[locale]/settings/providers/_components/forms/provider-form/index.tsx
    • Added staleTime and refetchOnWindowFocus: false to useQuery calls for improved caching behavior.
    • Updated cache invalidation logic on form success to be more specific and include providers-statistics.
  • src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx
    • Implemented VendorTypeEndpointStatsBatcher for batch fetching endpoint statistics.
    • Updated useQuery for stats to use the batcher.
    • Updated useQuery for allEndpoints to be enabled only when the tooltip is open or in test environment.
    • Implemented batch fetching for endpoint-circuit-info using batchGetEndpointCircuitInfo.
    • Modified EndpointRow to receive circuitState directly.
  • src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx
    • Added useInViewOnce hook for lazy loading table sections.
    • Added staleTime and refetchOnWindowFocus: false to useQuery for provider-endpoints.
    • Implemented batch fetching for endpoint-circuit-info with chunking.
    • Updated cache invalidation for mutations to be more specific to the vendor.
    • Added deferUntilInView prop to ProviderEndpointsSection for lazy loading.
  • src/app/[locale]/settings/providers/_components/provider-list.tsx
    • Added useQuery for provider-vendors with caching settings.
    • Memoized vendorById map for efficient lookups.
    • Passed vendor prop to ProviderRichListItem.
  • src/app/[locale]/settings/providers/_components/provider-rich-list-item.tsx
    • Removed useRouter and useQuery for provider-vendors.
    • Added vendor prop to receive vendor data directly.
    • Removed router.refresh() calls and broad queryClient.invalidateQueries on form success, relying on more granular invalidation.
  • src/app/[locale]/settings/providers/_components/provider-vendor-view.tsx
    • Added staleTime and refetchOnWindowFocus: false to useQuery for provider-vendors.
    • Memoized vendorById map.
    • Updated VendorCard to pass deferUntilInView to ProviderEndpointsSection for lazy loading.
  • src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx
    • Removed useRouter import.
    • Updated urlResolver to use queryClient.fetchQuery for provider-endpoints with caching.
    • Removed queryClient.invalidateQueries calls on form success, relying on specific invalidations.
    • Updated cache invalidation for toggleEnabledMutation and deleteMutation to be more specific.
  • src/app/api/availability/endpoints/route.ts
    • Replaced findProviderEndpointsByVendorAndType with findDashboardProviderEndpointsByVendorAndType for dashboard data.
  • src/drizzle/schema.ts
    • Added multiple new database indexes to optimize various queries related to users, keys, providers, provider endpoints, and message requests.
  • src/instrumentation.ts
    • Imported withAdvisoryLock from lib/migrate.
    • Modified AUTO_MIGRATE environment variable parsing for robustness.
    • Wrapped provider backfill logic with withAdvisoryLock to prevent concurrent execution across instances.
  • src/lib/endpoint-circuit-breaker.ts
    • Implemented LRU caching for endpoint health states to manage memory usage.
    • Refactored getOrCreateHealth to debounce Redis reads and use TTL for cache freshness.
    • Modified persistStateToRedis to delete default closed states from Redis to reduce storage overhead.
    • Added syncHealthFromRedisBatch and getAllEndpointHealthStatusAsync for efficient batch retrieval of endpoint health statuses.
  • src/lib/provider-endpoints/endpoint-selector.ts
    • Updated endpoint selection logic to use getAllEndpointHealthStatusAsync for batch circuit state checks.
    • Changed findProviderEndpointsByVendorAndType to findEnabledProviderEndpointsByVendorAndType for getPreferredProviderEndpoints to only consider enabled endpoints.
  • src/lib/provider-endpoints/leader-lock.ts
    • Modified getRedisClient calls to allow client instantiation even when rate limiting is disabled, ensuring leader election functions independently.
  • src/lib/provider-endpoints/probe-log-cleanup.ts
    • Implemented startLeaderLockKeepAlive to periodically renew the leader lock, ensuring continuous cleanup operation.
    • Added leadershipLost flag to gracefully stop cleanup if the lock is lost.
  • src/lib/provider-endpoints/probe-scheduler.ts
    • Added ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS environment variable for configurable idle DB polling.
    • Changed endpoint counting to countEndpointsByVendorType to include provider type for more granular scheduling.
    • Implemented computeNextDueAtMs, clearNextWorkHints, updateNextWorkHints to optimize scheduler wake-up times and avoid unnecessary DB polls.
    • Updated findEnabledProviderEndpointsForProbing to filter by provider type.
    • Modified probeProviderEndpointAndRecordByEndpoint to update endpoint probe status directly on the object.
  • src/lib/provider-endpoints/probe.ts
    • Modified probeProviderEndpointAndRecordByEndpoint to check ENABLE_ENDPOINT_CIRCUIT_BREAKER before resetting the circuit and added logging for reset failures.
  • src/lib/redis/client.ts
    • Added allowWhenRateLimitDisabled option to getRedisClient to allow Redis client instantiation even if rate limiting is disabled.
  • src/lib/redis/endpoint-circuit-breaker-state.ts
    • Added loadEndpointCircuitStates for batch loading of circuit breaker states from Redis using pipelines.
    • Modified saveEndpointCircuitState to use a Redis pipeline for hset and expire commands for atomicity.
  • src/repository/index.ts
    • Exported new provider-endpoint related functions for broader access.
  • src/repository/leaderboard.ts
    • Refactored buildDateCondition to use DATE_TRUNC and INTERVAL for more accurate date range filtering across timezones, improving DST compatibility.
  • src/repository/overview.ts
    • Refactored getOverviewMetrics and getOverviewMetricsWithComparison to use DATE_TRUNC and INTERVAL for more accurate daily/yesterday date range filtering across timezones.
  • src/repository/provider-endpoints-batch.ts
    • Added new file provider-endpoints-batch.ts for batch fetching endpoint statistics and probe logs using optimized SQL queries.
  • src/repository/provider-endpoints.ts
    • Updated ProviderEndpointProbeTarget to include providerType.
    • Refactored findEnabledProviderEndpointsForProbing to use a CTE to only select endpoints associated with enabled providers.
    • Enhanced getOrCreateProviderVendorIdFromUrls to update vendor metadata if null/empty.
    • Added findProviderVendorsByIds, findEnabledProviderVendorTypePairs, hasEnabledProviderReferenceForVendorTypeUrl, and findDashboardProviderEndpointsByVendorAndType for improved data access.
    • Added findEnabledProviderEndpointsByVendorAndType for runtime hot path.
    • Modified syncProviderEndpointOnProviderEdit to prioritize active rows and handle unique constraint conflicts during endpoint revive.
    • Updated recordProviderEndpointProbeResult to check if the endpoint still exists before inserting a probe log.
    • Modified findProviderEndpointProbeLogs to order by createdAt DESC NULLS LAST, id DESC for stable pagination.
  • src/repository/provider.ts
    • Updated updateProvider to trigger endpoint sync if is_enabled changes to true and ensure endpoint pool is updated even if provider is disabled but URL/type changes.
    • Refactored deleteProvider to soft-delete the corresponding endpoint if it's no longer referenced by any enabled provider.
    • Refactored updateProvidersBatch to ensure provider endpoints are created/reactivated when providers are enabled in batch.
    • Refactored deleteProvidersBatch to soft-delete corresponding endpoints if they are no longer referenced by any enabled provider.
    • Optimized getProviderStatistics SQL query using CTEs and added in-memory caching with TTL and in-flight deduplication.
  • src/repository/statistics.ts
    • Implemented getKeyStringByIdCached with LRU caching for keyId to keyString lookups.
    • Updated various sum and find functions to use getKeyStringByIdCached.
    • Added sumUserQuotaCosts and sumKeyQuotaCostsById for combined, efficient fetching of quota costs across multiple time ranges using SQL FILTER clauses.
    • Updated getUserStatisticsFromDB, getKeyStatisticsFromDB, getMixedStatisticsFromDB to add created_at range conditions to LEFT JOIN clauses for better index utilization.
  • src/repository/usage-logs.ts
    • Updated findUsageLogsBatch to use gte and lt for createdAt conditions.
    • Added findUsageLogsForKeySlim for optimized fetching of usage logs for a specific key, including caching for total counts.
    • Implemented caching for getDistinctModelsForKey and getDistinctEndpointsForKey with LRU eviction.
    • Updated getDistinctModelsForKey and getDistinctEndpointsForKey to include EXCLUDE_WARMUP_CONDITION.
    • Refactored findUsageLogsWithDetails to use safePage and safePageSize, and conditionally INNER JOIN keysTable only when keyId is present.
    • Optimized findUsageLogsWithDetails to use gte and lt for createdAt conditions and LEFT JOIN for providers table.
    • Modified findUsageLogsWithDetails to use desc(messageRequest.id) for orderBy to ensure stable keyset pagination.
  • tests/integration/provider-endpoint-sync-race.test.ts
    • Updated assertions to reflect the new behavior of soft-deleting previous endpoints when unreferenced.
  • tests/unit/actions/my-usage-concurrent-inherit.test.ts
    • Updated mocks to include sumUserQuotaCosts and sumKeyQuotaCostsById.
  • tests/unit/actions/my-usage-date-range-dst.test.ts
    • Updated mocks to use findUsageLogsForKeySlim instead of findUsageLogsWithDetails.
  • tests/unit/actions/my-usage-token-aggregation.test.ts
    • Updated tests to reflect changes in token aggregation logic, now using a single query for getMyTodayStats and new token fields in getMyStatsSummary.
  • tests/unit/actions/provider-endpoints.test.ts
    • Updated mocks for provider-endpoints actions to include new batch fetching and reference checking functions.
    • Added tests for getDashboardProviderEndpoints and probeProviderEndpoint.
    • Updated batchGetEndpointCircuitInfo test to use getAllEndpointHealthStatusAsync.
  • tests/unit/actions/providers-recluster.test.ts
    • Updated mocks to use findProviderVendorsByIds.
  • tests/unit/actions/total-usage-semantics.test.ts
    • Updated tests to verify that sumKeyQuotaCostsById and sumUserQuotaCosts are called with ALL_TIME_MAX_AGE_DAYS for total cost calculations.
  • tests/unit/lib/endpoint-circuit-breaker.test.ts
    • Added tests for getAllEndpointHealthStatusAsync to verify Redis synchronization, in-flight request deduplication, and LRU cache eviction.
  • tests/unit/lib/provider-endpoints/endpoint-selector.test.ts
    • Updated tests to use getAllEndpointHealthStatusAsync for batch circuit state checks and findEnabledProviderEndpointsByVendorAndType for fetching enabled endpoints.
  • tests/unit/lib/use-in-view-once.test.tsx
    • Added new test file for useInViewOnce hook, verifying its behavior in test environments and shared observer logic.
  • tests/unit/repository/provider-endpoint-sync-helper.test.ts
    • Updated mocks for select to include orderBy for consistency.
    • Updated tests to reflect the new behavior of soft-deleting previous endpoints when unreferenced.
  • tests/unit/repository/provider-endpoints-probe-result.test.ts
    • Added new test file for recordProviderEndpointProbeResult, verifying silent ignore for non-existent endpoints and correct update/insert behavior.
  • tests/unit/repository/statistics-quota-costs-all-time.test.ts
    • Added new test file for sumUserQuotaCosts and sumKeyQuotaCostsById, verifying that Infinity for maxAgeDays results in no created_at cutoff in the SQL query.
Activity
  • The pull request introduces a new environment variable ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS in .env.example.
  • Several new Drizzle migration files (0068 to 0073) were added, indicating significant database schema changes, primarily new indexes.
  • The drizzle/meta/_journal.json file was updated to reflect these new migrations.
  • The scripts/validate-migrations.js file was modified to include a new validation for Drizzle journal monotonicity.
  • The src/actions/my-usage.ts file saw substantial changes, including removal of deprecated functions, refactoring of quota calculations (see the FILTER-aggregate sketch after this list), and updates to usage log fetching.
  • The src/actions/provider-endpoints.ts file was heavily modified to introduce batch operations, new dashboard-specific data fetching, and refined endpoint management logic.
  • Frontend components in src/app/[locale]/dashboard/availability and src/app/[locale]/settings/providers were updated to leverage new backend APIs and implement performance optimizations like lazy loading and request debouncing.
  • New utility files like src/lib/hooks/use-in-view-once.ts and src/repository/provider-endpoints-batch.ts were added to support these frontend and backend optimizations.
  • Core logic in src/lib/endpoint-circuit-breaker.ts, src/lib/provider-endpoints/endpoint-selector.ts, src/lib/provider-endpoints/probe-scheduler.ts, and src/lib/redis/endpoint-circuit-breaker-state.ts was updated for batching, caching, and improved concurrency handling.
  • Repository functions in src/repository/provider-endpoints.ts, src/repository/provider.ts, src/repository/statistics.ts, and src/repository/usage-logs.ts were extensively refactored for performance, accuracy, and new features.
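
To make the FILTER-aggregate consolidation concrete: instead of issuing one aggregate query per time window, a single pass can compute all windows at once, and plain range comparisons (created_at >= bound) keep a created_at index usable where per-row ::date casts would not. A hedged sketch using drizzle's sql template; the column names and the exact shape of sumUserQuotaCosts are assumptions:

```typescript
import { sql } from "drizzle-orm";

// Hedged sketch: compute several quota windows in one pass with FILTER
// aggregates. Column names (cost_usd, created_at, user_id, deleted_at) are
// assumptions, not the actual sumUserQuotaCosts implementation.
function buildQuotaCostQuery(userId: number, dayStart: Date, weekStart: Date, monthStart: Date) {
  return sql`
    SELECT
      COALESCE(SUM(cost_usd) FILTER (WHERE created_at >= ${dayStart}), 0)   AS daily_cost,
      COALESCE(SUM(cost_usd) FILTER (WHERE created_at >= ${weekStart}), 0)  AS weekly_cost,
      COALESCE(SUM(cost_usd) FILTER (WHERE created_at >= ${monthStart}), 0) AS monthly_cost,
      COALESCE(SUM(cost_usd), 0)                                            AS total_cost
    FROM message_request
    WHERE user_id = ${userId}
      AND deleted_at IS NULL
  `;
}
```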

@github-actions (Contributor)

🧪 Test Results

Test type Status
Code quality ✅
Unit tests ✅
Integration tests ✅
API tests ✅

Overall result: ✅ All tests passed

@gemini-code-assist bot left a comment

Code Review

This is an excellent pull request that introduces significant performance and robustness improvements across the application, particularly on the provider and usage-related pages. The changes are well-thought-out and demonstrate a deep understanding of both backend and frontend optimization techniques.

Key improvements include:

  • Database Performance: New indexes and optimized queries (using FILTER, LATERAL joins, and reducing N+1 problems) will greatly improve database efficiency.
  • API Batching: Many frontend components now batch requests for data like endpoint stats and circuit breaker information, which will reduce network overhead and improve page load times.
  • UI Responsiveness: The use of lazy loading (useInViewOnce), request cancellation (AbortController), and other hooks enhances the user experience by making the UI faster and more robust.
  • System Robustness: The addition of advisory locks for startup tasks and more granular cache invalidation strategies makes the system more reliable, especially in multi-instance deployments.

I have reviewed the code changes in detail and found them to be of very high quality. I did not identify any issues or bugs. This is a fantastic contribution to the project's performance and stability.

@greptile-apps bot left a comment

72 files reviewed, 3 comments

Comment on lines 1135 to 1224 in src/repository/provider.ts

```typescript
    const promise = (async () => {
      // Use the providerId of the last providerChain entry to determine the final provider (handles retry switching).
      // If provider_chain is empty (no retries), fall back to the provider_id column.
      const query = sql`
        WITH bounds AS (
          SELECT
            (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) AS today_start,
            ((DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) + INTERVAL '1 day') AT TIME ZONE ${timezone}) AS tomorrow_start,
            ((DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) - INTERVAL '7 days') AT TIME ZONE ${timezone}) AS last7_start
        ),
        provider_stats AS (
          -- Aggregate by final provider first, then LEFT JOIN providers, avoiding a providers × today's-requests Cartesian product
          SELECT
            mr.final_provider_id,
            COALESCE(SUM(mr.cost_usd), 0) AS today_cost,
            COUNT(*)::integer AS today_calls
          FROM (
            SELECT
              CASE
                WHEN provider_chain IS NULL OR provider_chain = '[]'::jsonb THEN provider_id
                WHEN (provider_chain->-1->>'id') ~ '^[0-9]+$' THEN (provider_chain->-1->>'id')::int
                ELSE provider_id
              END AS final_provider_id,
              cost_usd
            FROM message_request
            WHERE deleted_at IS NULL
              AND (blocked_by IS NULL OR blocked_by <> 'warmup')
              AND created_at >= (SELECT today_start FROM bounds)
              AND created_at < (SELECT tomorrow_start FROM bounds)
          ) mr
          GROUP BY mr.final_provider_id
        ),
        latest_call AS (
          SELECT DISTINCT ON (final_provider_id)
            final_provider_id,
            created_at AS last_call_time,
            model AS last_call_model
          FROM (
            SELECT
              CASE
                WHEN provider_chain IS NULL OR provider_chain = '[]'::jsonb THEN provider_id
                WHEN (provider_chain->-1->>'id') ~ '^[0-9]+$' THEN (provider_chain->-1->>'id')::int
                ELSE provider_id
              END AS final_provider_id,
              id,
              created_at,
              model
            FROM message_request
            WHERE deleted_at IS NULL
              AND (blocked_by IS NULL OR blocked_by <> 'warmup')
              AND created_at >= (SELECT last7_start FROM bounds)
          ) mr
          -- Performance: restrict to a 7-day window (avoids scanning historical data)
          ORDER BY final_provider_id, created_at DESC, id DESC
        )
        SELECT
          p.id,
          COALESCE(ps.today_cost, 0) AS today_cost,
          COALESCE(ps.today_calls, 0) AS today_calls,
          lc.last_call_time,
          lc.last_call_model
        FROM providers p
        LEFT JOIN provider_stats ps ON p.id = ps.final_provider_id
        LEFT JOIN latest_call lc ON p.id = lc.final_provider_id
        WHERE p.deleted_at IS NULL
        ORDER BY p.id ASC
      `;

      logger.trace("getProviderStatistics:executing_query");

      const result = await db.execute(query);
      const data = Array.from(result) as ProviderStatisticsRow[];

      logger.trace("getProviderStatistics:result", {
        count: data.length,
      });

      // Note: today_cost in the result is numeric and comes back as a string;
      // last_call_time is returned by the database as a UTC timestamp.
      // Keep them as-is and let the caller handle display formatting.
      providerStatisticsCache = {
        timezone,
        expiresAt: Date.now() + PROVIDER_STATISTICS_CACHE_TTL_MS,
        data,
      };

      return data;
    })();

    providerStatisticsInFlight = { timezone, promise };
```

In-flight dedup race condition

The providerStatisticsInFlight is set at line 1224, after the promise is already created and started executing at line 1135. Between promise creation and the assignment, any concurrent caller will pass the in-flight check at line 1131 and start a duplicate query, defeating the dedup.

Move the assignment before await:

Suggested change

The suggested block repeats the query above verbatim and only adds a clarifying comment before the final assignment:

```typescript
    // Set in-flight BEFORE awaiting to prevent concurrent callers from starting duplicate queries
    providerStatisticsInFlight = { timezone, promise };
```

  if (endpoints.length <= 1) return endpoints;

-  return enabled.slice().sort((a, b) => {
+  endpoints.sort((a, b) => {
In-place sort mutates parameter

endpoints.sort(...) mutates the input array. This is currently safe because rankProviderEndpoints always creates a fresh array via .filter() first. But rankActiveProviderEndpoints is a footgun if called directly with a shared reference in the future. Consider using endpoints.slice().sort(...) or Array.from(endpoints).sort(...) to avoid mutating the caller's array.
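For reference, a minimal non-mutating sketch (the ProviderEndpoint shape and the priority comparator here are illustrative; the real compare logic lives in endpoint-selector.ts):

```ts
// Illustrative type; the real ProviderEndpoint lives in the provider-endpoints module.
interface ProviderEndpoint {
  id: number;
  priority: number;
}

// slice() copies first, so the caller's array order is never changed.
function rankActiveProviderEndpoints(endpoints: ProviderEndpoint[]): ProviderEndpoint[] {
  if (endpoints.length <= 1) return endpoints;
  return endpoints.slice().sort((a, b) => a.priority - b.priority);
}
```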

@greptile-apps
greptile-apps bot commented Feb 15, 2026

Additional Comments (1)

src/lib/endpoint-circuit-breaker.ts
rankActiveProviderEndpoints mutates input array

rankActiveProviderEndpoints sorts the endpoints array in place (line 20 of endpoint-selector.ts). Its caller rankProviderEndpoints creates a new array via .filter(), so that's fine. However, the hot-path getPreferredProviderEndpoints passes circuitCandidates (which may be the same reference as the endpoints array returned from the DB query when excludeSet is null) through rankProviderEndpoints -> rankActiveProviderEndpoints. Since rankProviderEndpoints always calls .filter() first, this is currently safe, but the in-place sort inside rankActiveProviderEndpoints is fragile and could silently corrupt the caller's data if refactored.

This is a minor observation - the redundant filter in rankProviderEndpoints when called from getPreferredProviderEndpoints (where endpoints are already known to be enabled) actually serves as a protective copy. Worth being aware of for future refactoring.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@src/actions/provider-endpoints.ts`:
- Around line 416-430: The user-facing error strings in provider-endpoints
handlers (e.g., the branch using isDirectEndpointEditConflictError and
isForeignKeyViolationError) are hardcoded Chinese; replace them with i18n keys
or remove the message and return only errorCode so the frontend handles
localization. Update the return objects in functions/methods that use
isDirectEndpointEditConflictError, isForeignKeyViolationError (and other similar
branches at the noted ranges) to set either error:
"i18n.provider.endpoint_conflict" / "i18n.provider.not_found" (or another agreed
key) or omit error and rely on ERROR_CODES.CONFLICT / ERROR_CODES.NOT_FOUND
being returned; ensure the same change is applied for the other occurrences
referenced (around lines 554-567, 749-754, 818-823) so no hardcoded display
strings remain.

In `@src/app/[locale]/dashboard/users/users-page-client.tsx`:
- Around line 80-89: The current debouncing keys pendingTagFiltersKey and
pendingKeyGroupFiltersKey build strings via sort().join("|"), which can collide
if tag values contain "|"—update the key generation to use a collision-safe
encoding (e.g., sort the arrays and then use JSON.stringify on the sorted
arrays, or join with a null character "\0", or escape values with
encodeURIComponent) before passing into useDebounce; change the expressions that
compute pendingTagFiltersKey and pendingKeyGroupFiltersKey (which derive from
pendingTagFilters and pendingKeyGroupFilters) accordingly so useDebounce
receives a stable, unambiguous key.

In `@src/lib/migrate.ts`:
- Line 77: The code currently uses the internal helper readMigrationFiles
(referenced as readMigrationFiles) from drizzle-orm which is undocumented and
unstable; replace its usage with the documented driver-specific migrate API
(e.g., migrate from drizzle-orm/postgres-js/migrator) by wiring your DB client
into migrate(db, config) and using its returned results instead of
readMigrationFiles output (drop reliance on hash/folderMillis from
readMigrationFiles and map to the migrate result shape); update imports to use
the official migrator (migrate) and adjust any call sites in this module (e.g.,
where migrations is used) to the migrate() promise/result.

In `@src/lib/provider-endpoints/probe-scheduler.ts`:
- Around line 32-39: The code only applies the BASE_INTERVAL_MS upper bound when
using the default, but allows the env var
ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS to exceed BASE_INTERVAL_MS; update the
IDLE_DB_POLL_INTERVAL_MS computation to parse the env var via
parseIntWithDefault and then clamp the resulting value between 1 and
BASE_INTERVAL_MS (e.g., use Math.max(1, Math.min(parsedValue,
BASE_INTERVAL_MS))) so the effective poll interval respects the comment; update
references to DEFAULT_IDLE_DB_POLL_INTERVAL_MS, IDLE_DB_POLL_INTERVAL_MS,
BASE_INTERVAL_MS, parseIntWithDefault, and
process.env.ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS accordingly.

In `@src/repository/leaderboard.ts`:
- Around line 163-170: buildDateCondition currently interpolates
dateRange.startDate/endDate directly into SQL which allows invalid strings (e.g.
"not-a-date") to reach PostgreSQL and cause runtime errors; add defensive
validation of DateRangeParams (ensure YYYY-MM-DD format and valid calendar date)
before building SQL (either inside buildDateCondition or at the caller
findCustomRangeLeaderboard) and reject/throw a clear validation error for
invalid inputs. Use a strict YYYY-MM-DD regex and/or Date parsing to confirm the
values are valid dates, reference the symbols dateRange.startDate,
dateRange.endDate, buildDateCondition, findCustomRangeLeaderboard,
messageRequest.createdAt and timezone when implementing the check, and only
construct the SQL when validation passes.

In `@src/repository/overview.ts`:
- Around line 143-144: The yesterday query uses a closed interval (gte + lte)
while the today query uses a half-open interval (gte + lt), causing asymmetric
comparison; update the yesterday time-window condition that currently uses
lte(messageRequest.createdAt, yesterdayEnd) to use lt(messageRequest.createdAt,
yesterdayEnd) so both windows use gte + lt for messageRequest.createdAt (refer
to the existing today window using todayStart/tomorrowStart for consistency).

In `@src/repository/provider-endpoints-batch.ts`:
- Around line 68-173: The function findProviderEndpointProbeLogsBatch allows
input.limitPerEndpoint to be NaN which makes Math.max(1, NaN) return NaN and
produces an invalid SQL LIMIT; fix by validating and coercing
input.limitPerEndpoint to a safe positive integer before use (e.g. check
Number.isFinite(input.limitPerEndpoint), fallback to 1, use
Math.floor/Math.trunc to convert to an integer and enforce >=1), assign that
sanitized value to limitPerEndpoint and use it in the SQL and later checks so
LIMIT never receives NaN.

In `@src/repository/provider.ts`:
- Around line 748-752: In deleteProvider's transactional update
  (tx.update(providers).set({ deletedAt: now })), also set updatedAt to keep
  behavior consistent with the batch-delete branch: add updatedAt: now inside
  .set(...), and return providers.updatedAt wherever audit values are needed
  (e.g., in .returning(...)) so audit timestamps stay consistent and traceable.
🧹 Nitpick comments (27)
tests/unit/settings/providers/endpoint-latency-sparkline-ui.test.tsx (1)

92-95: The placeholder selector is brittle; switch to a stable identifier.

Selection currently keys off the bg-muted/* class names, so styling changes will fail the test spuriously. Give the placeholder element a data-testid (or a semantic role) and query by that instead.
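A sketch of the stable-selector version (the component props, import path, and "sparkline-placeholder" test id are hypothetical):

```tsx
import { render, screen } from "@testing-library/react";
import { EndpointLatencySparkline } from "../endpoint-latency-sparkline"; // path illustrative

it("renders the loading placeholder", () => {
  render(<EndpointLatencySparkline endpointId={1} />);
  // Query the hypothetical data-testid instead of bg-muted/* class names,
  // so styling refactors cannot break the test.
  expect(screen.getByTestId("sparkline-placeholder")).toBeInTheDocument();
});
```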

src/app/api/availability/endpoints/route.ts (1)

22-47: This API route does not use the Hono framework.

Per the coding guidelines, API routes under src/app/api/**/*.{ts,tsx} should use the Hono framework. This file still uses NextRequest / NextResponse directly; it is pre-existing code not touched by this refactor, so consider migrating it in a later iteration. As per coding guidelines: "API routes should use Hono framework and follow Next.js 16 App Router conventions".

src/repository/overview.ts (1)

44-48: The timezone boundary computation is duplicated in two functions; extract a shared helper.

getOverviewMetrics (lines 44-48) and getOverviewMetricsWithComparison (lines 96-104) compute nowLocal, todayStartLocal, todayStart, and tomorrowStart identically. Extracting an internal helper that returns these SQL expressions removes the duplication and keeps both sites in sync under future changes.

Example refactor
function buildDayBoundaries(timezone: string) {
  const nowLocal = sql`CURRENT_TIMESTAMP AT TIME ZONE ${timezone}`;
  const todayStartLocal = sql`DATE_TRUNC('day', ${nowLocal})`;
  const todayStart = sql`(${todayStartLocal} AT TIME ZONE ${timezone})`;
  const tomorrowStart = sql`((${todayStartLocal} + INTERVAL '1 day') AT TIME ZONE ${timezone})`;
  return { nowLocal, todayStartLocal, todayStart, tomorrowStart };
}

Also applies to: 96-104

src/repository/leaderboard.ts (1)

175-197: The calendar-period (daily/weekly/monthly) window construction repeats; extract a common helper.

All three branches follow an identical pattern, differing only in the DATE_TRUNC unit and INTERVAL value. A helper function would remove the duplication.

♻️ Suggested refactor
+function buildTruncatedWindowCondition(
+  unit: "day" | "week" | "month",
+  nowLocal: ReturnType<typeof sql>,
+  timezone: string,
+) {
+  const startLocal = sql`DATE_TRUNC(${unit}, ${nowLocal})`;
+  const endExclusiveLocal = sql`${startLocal} + INTERVAL ${`1 ${unit}`}`;
+  const start = sql`(${startLocal} AT TIME ZONE ${timezone})`;
+  const endExclusive = sql`(${endExclusiveLocal} AT TIME ZONE ${timezone})`;
+  return sql`${messageRequest.createdAt} >= ${start} AND ${messageRequest.createdAt} < ${endExclusive}`;
+}
+
 // Then, in the switch:
-    case "daily": {
-      const startLocal = sql`DATE_TRUNC('day', ${nowLocal})`;
-      const endExclusiveLocal = sql`${startLocal} + INTERVAL '1 day'`;
-      const start = sql`(${startLocal} AT TIME ZONE ${timezone})`;
-      const endExclusive = sql`(${endExclusiveLocal} AT TIME ZONE ${timezone})`;
-      return sql`${messageRequest.createdAt} >= ${start} AND ${messageRequest.createdAt} < ${endExclusive}`;
-    }
-    case "weekly": {
-      ...
-    }
-    case "monthly": {
-      ...
-    }
+    case "daily":
+      return buildTruncatedWindowCondition("day", nowLocal, timezone);
+    case "weekly":
+      return buildTruncatedWindowCondition("week", nowLocal, timezone);
+    case "monthly":
+      return buildTruncatedWindowCondition("month", nowLocal, timezone);

Note: the DATE_TRUNC unit and the INTERVAL value are SQL keywords/literals, so confirm how Drizzle's sql template handles string parameters — PostgreSQL requires DATE_TRUNC's first argument to be a string literal, and the value in INTERVAL '1 day' must also be a literal. If Drizzle parameterizes them as $1, PostgreSQL may reject the query. In the actual refactor you may need sql.raw() to inline these values (they are hardcoded constants here, so there is no injection risk).
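A sketch of that sql.raw() inlining under the stated assumption (unit values are hardcoded constants, never user input):

```ts
import { sql } from "drizzle-orm";

// DATE_TRUNC's unit and the INTERVAL literal are inlined with sql.raw()
// rather than bound as $n parameters; `unit` is a compile-time constant.
function buildTruncatedWindow(unit: "day" | "week" | "month", timezone: string) {
  const nowLocal = sql`CURRENT_TIMESTAMP AT TIME ZONE ${timezone}`;
  const startLocal = sql`DATE_TRUNC(${sql.raw(`'${unit}'`)}, ${nowLocal})`;
  const endExclusiveLocal = sql`${startLocal} + INTERVAL ${sql.raw(`'1 ${unit}'`)}`;
  return {
    start: sql`(${startLocal} AT TIME ZONE ${timezone})`,
    endExclusive: sql`(${endExclusiveLocal} AT TIME ZONE ${timezone})`,
  };
}
```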

src/app/[locale]/dashboard/users/users-page-client.tsx (1)

201-221: The two-way sync effects are correct but cognitively heavy

Lines 201-207 sync tagFilters into pendingTagFilters, and lines 209-221 apply pendingTagFilters back to tagFilters via the debounced key. Together with handleTagCommit / handleApplyFilters, these form three separate write paths into tagFilters, which raises the cost of understanding and maintenance.

Consider later consolidating the pending → applied sync into a single mechanism (keep only the explicit commit, or only the debounced auto-apply) to reduce the implicit state flow.

src/app/[locale]/settings/providers/_components/forms/provider-form/index.tsx (1)

722: Remove the redundant default export.

Line 644 already provides the named export export function ProviderForm, making this export default redundant. Per the coding guidelines, prefer named exports.

Suggested change
-export default ProviderForm;

As per coding guidelines, **/*.{ts,tsx}: Prefer named exports over default exports.

src/app/[locale]/dashboard/availability/_components/endpoint/probe-grid.tsx (1)

107-108: The tooltip's display name is inconsistent with the card title.

Line 129 uses displayName (label → hostname → url), but the tooltip on line 176 still uses endpoint.label || endpoint.url, skipping the hostname fallback. When label is empty, the card title shows the hostname while the tooltip shows the full URL — inconsistent behavior.

If intentional (the tooltip showing fuller information), feel free to ignore; otherwise unify them:

Suggested change
-                  <p className="font-medium">{endpoint.label || endpoint.url}</p>
+                  <p className="font-medium">{displayName}</p>

Also applies to: 129-129, 176-176

src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx (2)

438-441: The translation key keyLoading is semantically off.

What loads here is endpoint data, but t("keyLoading") reads as "key loading", which doesn't match the context (endpoint list loading). Use a more fitting key such as endpointStatus.loading, or add a dedicated one.

248-294: The fallback path's concurrency control is well considered, but add logging.

When batchGetVendorTypeEndpointStats throws, the code degrades to per-vendor queries (concurrency capped at 8). Add a console.warn or logger.warn in the catch block so the cause of the degradation is diagnosable; the current catch {} silently swallows the error.

Suggested change
-          } catch {
-            // Fallback path: on batch action failure, query per vendorId. Concurrency is capped to avoid a request storm when chunks are large.
+          } catch (batchError) {
+            // Fallback path: on batch action failure, query per vendorId. Concurrency is capped to avoid a request storm when chunks are large.
+            console.warn("[VendorStatsBatcher] batch fetch failed, falling back to per-vendor queries", batchError);
src/lib/migrate.ts (1)

68-134: repairDrizzleMigrationsCreatedAt's row-by-row UPDATE is fine in most scenarios, but a batch variant is worth considering.

Repairs usually touch only a few rows, so per-row UPDATEs (lines 123-129) are acceptable. If journal entries ever grow substantially, a single UPDATE ... FROM (VALUES ...) would cut round-trips. Not blocking for this merge.
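Should that batching ever become worthwhile, a sketch (assuming the default drizzle.__drizzle_migrations table; sql.join and db.execute are drizzle-orm APIs):

```ts
import { sql, type SQL } from "drizzle-orm";

// One round-trip instead of one UPDATE per journal entry.
async function repairCreatedAtBatch(
  db: { execute(query: SQL): Promise<unknown> },
  rows: { hash: string; createdAt: number }[]
): Promise<void> {
  if (rows.length === 0) return;
  const values = sql.join(
    rows.map((r) => sql`(${r.hash}, ${r.createdAt}::bigint)`),
    sql`, `
  );
  await db.execute(sql`
    UPDATE drizzle.__drizzle_migrations AS m
    SET created_at = v.created_at
    FROM (VALUES ${values}) AS v(hash, created_at)
    WHERE m.hash = v.hash AND m.created_at <> v.created_at
  `);
}
```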

src/instrumentation.ts (1)

413-442: The dev-environment backfill skips the advisory lock, diverging from production.

Dev usually runs a single instance, so the missing lock causes no real problem. But if multi-instance debugging (e.g., Docker Compose) is ever enabled in dev, duplicate backfills could occur. Either keep the environments consistent or document the reason for the difference in a comment.
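If dev is ever aligned with production, the core pattern is small. A simplified sketch (lock key and import path are illustrative; the real withAdvisoryLock must also pin one pooled connection for lock and unlock, which this omits):

```ts
import { sql } from "drizzle-orm";
import { db } from "@/drizzle/db"; // illustrative import path

// Arbitrary but stable application-chosen lock key.
const BACKFILL_LOCK_KEY = 779_781;

// skipIfLocked semantics: pg_try_advisory_lock returns false instead of blocking.
export async function runBackfillOnce(run: () => Promise<void>): Promise<void> {
  const rows = await db.execute(sql`SELECT pg_try_advisory_lock(${BACKFILL_LOCK_KEY}) AS locked`);
  const locked = Array.from(rows as Iterable<{ locked: boolean }>)[0]?.locked === true;
  if (!locked) return; // another instance holds the lock — skip
  try {
    await run();
  } finally {
    await db.execute(sql`SELECT pg_advisory_unlock(${BACKFILL_LOCK_KEY})`);
  }
}
```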

src/lib/hooks/use-in-view-once.ts (1)

159-179: The threshold's serialize → deserialize path carries a tiny precision risk.

threshold is first stringified into thresholdKey (lines 159-164), then parsed back with parseFloat inside useMemo (lines 170-178). Common thresholds in practice (0, 0.5, 1) are unaffected, but a high-precision float such as 0.1 + 0.2 could in principle drift slightly through parseFloat("0.30000000000000004").

Risk is negligible for current usage; noted for awareness only.

src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (1)

127-144: The module-level mutable state isBatchProbeLogsEndpointAvailable / batchProbeLogsEndpointDisabledAt lacks a concurrency-safety note.

Browser JS is single-threaded, so sharing these variables across all component instances is reasonable. But the defensive logic at lines 134-136 in isBatchProbeLogsDisabled() (resetting available when disabledAt is null) hints that inconsistent states have occurred before. Add a short comment describing the expected lifecycle and reset conditions of these variables.

src/app/[locale]/dashboard/availability/_components/endpoint/endpoint-tab.tsx (1)

258-304: The focus/visibility refresh throttling is sound.

focus and visibilitychange often fire together when switching back to the tab; the 2-second throttle avoids doubled requests, and refreshing vendors with silent: true avoids a loading flicker — friendly UX.

One small suggestion: void refresh() inside refreshThrottled has no catch, so an unexpected throw from an awaited call inside refresh() becomes an unhandled promise rejection. Add a .catch at the end of refresh or at the call site.

Suggested: catch errors at the refresh call site
     const refreshThrottled = () => {
       const now = Date.now();
       if (now - lastFocusRefreshAtRef.current < 2000) return;
       lastFocusRefreshAtRef.current = now;
-      void refresh();
+      void refresh().catch((err) => {
+        console.error("[EndpointTab] Background refresh failed:", err);
+      });
     };
src/repository/statistics.ts (1)

1079-1140: sumUserQuotaCosts and sumKeyQuotaCostsById are heavily duplicated; extract a shared query builder.

The two functions differ only in their WHERE condition (userId vs key); the scanStart/scanEnd computation, FILTER clause construction, and result mapping are identical. An internal helper that accepts the filter condition would remove roughly 60 duplicated lines.

Not blocking for this merge; noted as a follow-up optimization.

Example: extract a shared helper
// Internal helper
async function sumQuotaCostsInternal(
  filterCondition: SQL,
  ranges: QuotaCostRanges,
  maxAgeDays: number,
): Promise<QuotaCostSummary> {
  // ... shared scanStart/scanEnd/costTotal/query logic ...
  // The WHERE clause uses filterCondition instead of a concrete eq(userId) or eq(key)
}

export async function sumUserQuotaCosts(userId: number, ranges: QuotaCostRanges, maxAgeDays = 365) {
  return sumQuotaCostsInternal(eq(messageRequest.userId, userId), ranges, maxAgeDays);
}

export async function sumKeyQuotaCostsById(keyId: number, ranges: QuotaCostRanges, maxAgeDays = 365) {
  const keyString = await getKeyStringByIdCached(keyId);
  if (!keyString) return { cost5h: 0, costDaily: 0, costWeekly: 0, costMonthly: 0, costTotal: 0 };
  return sumQuotaCostsInternal(eq(messageRequest.key, keyString), ranges, maxAgeDays);
}

Also applies to: 1145-1211

src/drizzle/schema.ts (2)

323-329: The provider_vendor_id IS NOT NULL condition is redundant

The providerVendorId column is declared .notNull() (lines 160-161), so provider_vendor_id IS NOT NULL in the index WHERE clause is always true. Harmless for correctness, but it misleads future maintainers into thinking the column may be NULL.

Suggested: drop the redundant condition
   providersEnabledVendorTypeIdx: index('idx_providers_enabled_vendor_type').on(
     table.providerVendorId,
     table.providerType
   ).where(
-    sql`${table.deletedAt} IS NULL AND ${table.isEnabled} = true AND ${table.providerVendorId} IS NOT NULL AND ${table.providerVendorId} > 0`
+    sql`${table.deletedAt} IS NULL AND ${table.isEnabled} = true AND ${table.providerVendorId} > 0`
   ),

477-524: message_request now carries many indexes; watch write-path impact

The table now has roughly 15 indexes (including the 6 added here). As a high-write table (already noted in the comments), every index adds INSERT/UPDATE overhead. Suggestions:

  1. Periodically monitor real index usage via pg_stat_user_indexes (the idx_scan counter) and drop indexes that no query hits (see the query sketch below).
  2. Note the prefix overlap between messageRequestKeyIdx (line 495, single-column key) and messageRequestKeyCreatedAtIdIdx (lines 497-501, key + created_at + id) — the composite index already covers pure key equality lookups, so the former may be redundant.
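The monitoring query from point 1 is a short read-only diagnostic against pg_stat_user_indexes:

```ts
import { sql } from "drizzle-orm";

// Surfaces rarely-scanned indexes on message_request, largest first.
const unusedIndexQuery = sql`
  SELECT indexrelname,
         idx_scan,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
  WHERE relname = 'message_request'
  ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC
`;
```
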
src/lib/endpoint-circuit-breaker.ts (1)

200-268: The batch Redis sync's in-flight dedup is well designed, but deduplication happens twice

syncHealthFromRedisBatch dedupes endpointIds again with new Set() (line 201), yet the caller getAllEndpointHealthStatusAsync already deduped at line 280, and needsRefresh is filtered from that deduped array. Correctness is unaffected; the work is just redundant.

Optional: trust the caller's dedup
-async function syncHealthFromRedisBatch(endpointIds: readonly number[], refreshNow: number) {
-  const uniqueEndpointIds = Array.from(new Set(endpointIds));
-  const toLoad: number[] = [];
+async function syncHealthFromRedisBatch(endpointIds: readonly number[], refreshNow: number) {
+  const toLoad: number[] = [];
   const waitPromises: Promise<void>[] = [];

-  for (const endpointId of uniqueEndpointIds) {
+  for (const endpointId of endpointIds) {
src/lib/provider-endpoints/probe.ts (1)

4-4: getEnvConfig is statically imported here, consistent with the rest of this file

Note that endpoint-circuit-breaker.ts uses a dynamic await import("@/lib/config/env.schema") while this file imports statically. Both work fine in this scenario, but if the project has a unified import-style preference, keep them consistent.

src/app/[locale]/dashboard/availability/_components/availability-dashboard.tsx (1)

124: Prefer English for code comments, for international collaboration.

This comment is in Chinese. Since the project ships 5 locales (zh-CN, zh-TW, en, ja, ru), standardizing code comments on English lets a wider contributor base follow the intent.

Suggested change
-  // 当页面从后台回到前台时,做一次节流刷新,避免看到陈旧数据;同时配合 visibility 判断减少后台请求。
+  // Throttled refresh when tab regains focus/visibility to avoid stale data; skips background requests.
src/lib/redis/client.ts (1)

98-114: getRedisClient duplicates configuration already defined in buildRedisOptionsForUrl.

getRedisClient (lines 98-114) and buildRedisOptionsForUrl (lines 45-74) each independently define the same retryStrategy, enableOfflineQueue, and maxRetriesPerRequest settings. Consider reusing buildRedisOptionsForUrl inside getRedisClient to eliminate the duplication.

This predates the current change; fine to handle in a follow-up refactor.

Suggested change
   try {
-    const useTls = redisUrl.startsWith("rediss://");
-
-    // 1. Define the base configuration
-    const redisOptions: RedisOptions = {
-      enableOfflineQueue: false,
-      maxRetriesPerRequest: 3,
-      retryStrategy(times) {
-        if (times > 5) {
-          logger.error("[Redis] Max retries reached, giving up");
-          return null;
-        }
-        const delay = Math.min(times * 200, 2000);
-        logger.warn(`[Redis] Retry ${times}/5 after ${delay}ms`);
-        return delay;
-      },
-    };
-
-    // 2. When using rediss://, add an explicit TLS configuration
-    if (useTls) {
-      const raw = process.env.REDIS_TLS_REJECT_UNAUTHORIZED?.trim();
-      const rejectUnauthorized = raw !== "false" && raw !== "0";
-      logger.info("[Redis] Using TLS connection (rediss://)", {
-        redisUrl: safeRedisUrl,
-        rejectUnauthorized,
-      });
-      redisOptions.tls = buildTlsConfig(redisUrl);
-    }
+    const { isTLS, options: redisOptions } = buildRedisOptionsForUrl(redisUrl);
+
+    if (isTLS) {
+      logger.info("[Redis] Using TLS connection (rediss://)", {
+        redisUrl: safeRedisUrl,
+      });
+    }
 
-    redisClient = new Redis(redisUrl, redisOptions);
+    redisClient = new Redis(redisUrl, redisOptions as RedisOptions);

Also applies to: 45-74

tests/unit/actions/my-usage-token-aggregation.test.ts (1)

151: The select-count assertion was relaxed to >= 1, consistent with the query consolidation in getMyTodayStats.

Lowering toBeGreaterThanOrEqual(2) to toBeGreaterThanOrEqual(1) reflects the optimization, but it differs in style from the exact toHaveLength(1) assertions used in the getMyStatsSummary tests. If getMyTodayStats' query count is now deterministic, prefer an exact assertion so a regression can't hide behind the loose bound.

src/lib/provider-endpoints/probe-log-cleanup.ts (1)

112-115: The number of deleted rows is not logged when leadership is lost.

When leadershipLost is true, the early return jumps straight to finally, skipping the totalDeleted log at lines 133-138. If the lock is lost mid-cleanup, the rows already deleted go unrecorded, which hurts operational triage.

Suggested: log the deleted count before returning
       if (leadershipLost) {
+        if (totalDeleted > 0) {
+          logger.info("[EndpointProbeLogCleanup] Partial cleanup before leadership lost", {
+            retentionDays: RETENTION_DAYS,
+            totalDeleted,
+          });
+        }
         return;
       }
tests/unit/repository/statistics-quota-costs-all-time.test.ts (1)

100-131: Confirm when capturedSelectFields is captured in the sumKeyQuotaCostsById test.

capturedSelectFields = fields is assigned inside the where mock (line 116), but fields comes from the closure parameter of the outer select mock. On the second select() call, currentCallIndex === 2, so fields is indeed the cost query's select fields. Still, this closure-capture-plus-call-order pattern is brittle — if sumKeyQuotaCostsById ever reorders its internal queries, the test fails silently or yields misleading results.

In a future refactor, consider assigning capturedSelectFields at the select mock level with more explicit assertions to reduce the brittleness.

src/actions/my-usage.ts (3)

249-290: The quota query refactor is sound, with good parallelization.

Running sumKeyQuotaCostsById / sumUserQuotaCosts in parallel with the session count is the right optimization, and the destructuring maps each period's cost fields clearly.

Note that ALL_TIME_MAX_AGE_DAYS = Infinity makes costTotal scan the key/user's entire history (Number.isFinite(Infinity) is false, so cutoffDate is null). For long-lived, high-traffic keys this full scan can become a bottleneck as data grows. Consider capping costTotal at a reasonable number of days, or adding a materialized view / cache layer to speed up the all-time aggregate.
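The mechanism in question, reduced to its core (names are illustrative):

```ts
// Infinity disables the cutoff: Number.isFinite(Infinity) is false, so the
// query gets no lower time bound and scans full history for costTotal.
function scanCutoff(maxAgeDays: number, now: Date = new Date()): Date | null {
  if (!Number.isFinite(maxAgeDays)) return null;
  return new Date(now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000);
}
```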


386-404: Accumulating totals via side effects inside .map() is unclear.

The .map() callback currently does both the transformation and the accumulation, while .map() semantically should be a pure transform. Split it into a reduce, or accumulate with forEach first and .map() afterwards, to make the intent explicit.

Optional refactor: separate accumulation from mapping
-    let totalCalls = 0;
-    let totalInputTokens = 0;
-    let totalOutputTokens = 0;
-    let totalCostUsd = 0;
-
-    const modelBreakdown = breakdown.map((row) => {
-      const billingModel = billingModelSource === "original" ? row.originalModel : row.model;
-      const rawCostUsd = Number(row.costUsd ?? 0);
-      const costUsd = Number.isFinite(rawCostUsd) ? rawCostUsd : 0;
-
-      totalCalls += row.calls ?? 0;
-      totalInputTokens += row.inputTokens ?? 0;
-      totalOutputTokens += row.outputTokens ?? 0;
-      totalCostUsd += costUsd;
-
-      return {
-        model: row.model,
-        billingModel,
-        calls: row.calls,
-        costUsd,
-        inputTokens: row.inputTokens,
-        outputTokens: row.outputTokens,
-      };
-    });
+    const modelBreakdown = breakdown.map((row) => {
+      const billingModel = billingModelSource === "original" ? row.originalModel : row.model;
+      const rawCostUsd = Number(row.costUsd ?? 0);
+      const costUsd = Number.isFinite(rawCostUsd) ? rawCostUsd : 0;
+      return {
+        model: row.model,
+        billingModel,
+        calls: row.calls,
+        costUsd,
+        inputTokens: row.inputTokens,
+        outputTokens: row.outputTokens,
+      };
+    });
+
+    const totalCalls = modelBreakdown.reduce((sum, r) => sum + (r.calls ?? 0), 0);
+    const totalInputTokens = modelBreakdown.reduce((sum, r) => sum + (r.inputTokens ?? 0), 0);
+    const totalOutputTokens = modelBreakdown.reduce((sum, r) => sum + (r.outputTokens ?? 0), 0);
+    const totalCostUsd = modelBreakdown.reduce((sum, r) => sum + r.costUsd, 0);

685-697: The secondary sort of keyModelBreakdown is necessary.

The database query orders by sum(costUsd) DESC (total cost at the user level), but the filtered keyOnlyBreakdown must be re-sorted by key-level cost, so the .sort((a, b) => b.cost - a.cost) here is correct.

One small caveat: if Number(row.keyCost) ever returns NaN in an extreme case, the sort comparison becomes unstable. COALESCE(..., 0) currently guarantees the DB returns a valid numeric string, so the real risk is tiny. For extra defense, add a Number.isFinite guard when assigning cost (mirroring the summaryAcc handling at line 644).

Optional: add a NaN guard for the breakdown cost
       keyModelBreakdown: keyOnlyBreakdown
         .map((row) => ({
           model: row.model,
           requests: row.keyRequests,
-          cost: Number(row.keyCost ?? 0),
+          cost: (() => { const c = Number(row.keyCost ?? 0); return Number.isFinite(c) ? c : 0; })(),
           inputTokens: row.keyInputTokens,

The cost field at line 701 in userModelBreakdown can get the same treatment.

Comment on lines +416 to +430
if (isDirectEndpointEditConflictError(error)) {
return {
ok: false,
error: "端点 URL 与同供应商类型下的其他端点冲突",
errorCode: ERROR_CODES.CONFLICT,
};
}

if (isForeignKeyViolationError(error)) {
return {
ok: false,
error: "供应商不存在",
errorCode: ERROR_CODES.NOT_FOUND,
};
}
⚠️ Potential issue | 🟡 Minor

The newly added user-facing copy is still hardcoded Chinese.
These return values are user-visible strings; switch them to i18n keys, or return only errorCode and let the frontend translate centrally.

As per coding guidelines "All user-facing strings must use i18n (5 languages supported: zh-CN, zh-TW, en, ja, ru). Never hardcode display text".

Also applies to: 554-567, 749-754, 818-823

Comment on lines +80 to +89
const pendingTagFiltersKey = useMemo(
() => pendingTagFilters.slice().sort().join("|"),
[pendingTagFilters]
);
const pendingKeyGroupFiltersKey = useMemo(
() => pendingKeyGroupFilters.slice().sort().join("|"),
[pendingKeyGroupFilters]
);
const debouncedPendingTagsKey = useDebounce(pendingTagFiltersKey, 300);
const debouncedPendingKeyGroupsKey = useDebounce(pendingKeyGroupFiltersKey, 300);
⚠️ Potential issue | 🟡 Minor

The string-key debounce strategy is clear and sound

Converting the arrays to stable strings via sort().join("|") before debouncing avoids spurious re-triggers from unstable array references — good design.

⚠️ Edge case: if a tag value itself contains "|", the sorted-and-joined key can collide (e.g., ["a|b", "c"] and ["a", "b|c"] yield the same key). If tag values come from user input, either forbid that character at creation/validation time or switch to a safer separator such as \0.
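A collision-safe variant of the key computation (hook name is illustrative):

```ts
import { useMemo } from "react";

// JSON.stringify of the sorted copy is unambiguous even when a value
// contains "|", so ["a|b", "c"] and ["a", "b|c"] get distinct keys.
function useStableFilterKey(values: string[]): string {
  return useMemo(() => JSON.stringify([...values].sort()), [values]);
}
```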

// The drizzle-orm migrator compares only `created_at(folderMillis)` to decide whether to run a migration.
// If a historical journal `when` was corrected (or was ever non-monotonic), old instances may permanently skip later migrations because their `created_at` compares too high.
// Align by hash and repair created_at here so upgrades stay invisible to users (pull the new Docker image and restart).
const migrations = readMigrationFiles({ migrationsFolder });
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check the drizzle-orm version
if [ -f "package.json" ]; then
  echo "=== Package.json drizzle-orm version ==="
  cat package.json | jq '.dependencies."drizzle-orm" // .devDependencies."drizzle-orm"' 2>/dev/null || grep -i "drizzle-orm" package.json
fi

# Check if node_modules exists and drizzle-orm is installed
if [ -d "node_modules/drizzle-orm" ]; then
  echo -e "\n=== Drizzle-orm installed, checking migrator ==="
  # Look for migrator files
  find node_modules/drizzle-orm -name "*migrator*" -type f | head -20
fi

Repository: ding113/claude-code-hub

Length of output: 115


🏁 Script executed:

#!/bin/bash
# Search for readMigrationFiles in drizzle-orm
echo "=== Searching for readMigrationFiles ==="
if [ -d "node_modules/drizzle-orm" ]; then
  rg -l 'readMigrationFiles' node_modules/drizzle-orm/ 2>/dev/null | head -10
  
  echo -e "\n=== Checking migrator.d.ts ==="
  if [ -f "node_modules/drizzle-orm/migrator.d.ts" ]; then
    cat node_modules/drizzle-orm/migrator.d.ts | head -100
  fi
fi

Repository: ding113/claude-code-hub

Length of output: 107


🏁 Script executed:

#!/bin/bash
# Check drizzle-orm package.json for version
echo "=== Drizzle-orm package version ==="
if [ -f "node_modules/drizzle-orm/package.json" ]; then
  cat node_modules/drizzle-orm/package.json | jq '.version' 2>/dev/null || grep -i "\"version\"" node_modules/drizzle-orm/package.json
fi

Repository: ding113/claude-code-hub

Length of output: 102


🏁 Script executed:

# Check the migrate.ts file to see how readMigrationFiles is used
cat -n src/lib/migrate.ts | head -100

Repository: ding113/claude-code-hub

Length of output: 3689


🏁 Script executed:

# Check imports in the file
rg -n 'import.*readMigrationFiles|from.*migrator' src/

Repository: ding113/claude-code-hub

Length of output: 227


🌐 Web query:

drizzle-orm 0.44.0 readMigrationFiles API migrator documentation

💡 Result:

readMigrationFiles isn’t documented as a first-class public API in the Drizzle docs; it’s an internal helper used by the driver-specific migrate() implementations.

Where it lives / how it’s used (v0.44.x)

  • Driver migrators (example: drizzle-orm/node-postgres/migrator) call readMigrationFiles(config) and then delegate to db.dialect.migrate(...). This is shown in Drizzle’s GitHub discussion about programmatic migrations. [2]
  • Example implementation detail (from a published drizzle-orm build): a migrator reads migrations via readMigrationFiles(config), determines the migrations table name (default __drizzle_migrations, overridable via config.migrationsTable), and applies any migrations newer than the last recorded created_at/timestamp. [3]

The “official” docs to reference

For migrations, the official documentation is the Drizzle Kit command docs (CLI), which describes the same underlying flow: read .sql files from the migrations folder, check the migrations log table (__drizzle_migrations), run pending migrations, and log them. [1]

Practical takeaway

  • If you want a supported API surface, use the driver-specific migrate(db, { migrationsFolder, ... }) entrypoint (as in the GitHub discussion). [2]
  • If you still want readMigrationFiles, import it from drizzle-orm/migrator (it’s what the migrators use), but treat it as internal/unstable (no dedicated docs page). [2]

Sources: Drizzle Kit migrate docs [1], Drizzle ORM discussion showing readMigrationFiles used under the hood [2], published build source map showing readMigrationFiles + migrations table behavior [3].


readMigrationFiles exists in drizzle-orm 0.44.x, but as an internal, undocumented API.

Confirmed: readMigrationFiles is exported from drizzle-orm/migrator and returns objects carrying hash and folderMillis (verified at line 81).

Note, however, that per the official Drizzle documentation readMigrationFiles is not a public API — it is an internal helper used by the driver-specific migrators, lacks docs, and may change in future versions. Prefer the officially recommended driver-specific migrate() function (e.g., migrate(db, config) from drizzle-orm/postgres-js/migrator).
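For reference, the documented entrypoint looks like this (folder path and connection setup are illustrative; note that migrate() does not expose the hash/folderMillis data the created_at repair relies on, which is a trade-off to weigh):

```ts
import { drizzle } from "drizzle-orm/postgres-js";
import { migrate } from "drizzle-orm/postgres-js/migrator";
import postgres from "postgres";

// Single-connection client for migrations, then the official migrator:
// it reads the folder, applies pending migrations, and records them
// in the __drizzle_migrations table.
const client = postgres(process.env.DATABASE_URL!, { max: 1 });
const db = drizzle(client);
await migrate(db, { migrationsFolder: "./drizzle" });
await client.end();
```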

Comment on lines 32 to 39
// Max idle DB polling interval (default 30s, bounded by base interval)
const DEFAULT_IDLE_DB_POLL_INTERVAL_MS = Math.min(BASE_INTERVAL_MS, 30_000);
const IDLE_DB_POLL_INTERVAL_MS = Math.max(
1,
parseIntWithDefault(
process.env.ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS,
DEFAULT_IDLE_DB_POLL_INTERVAL_MS
)
⚠️ Potential issue | 🟡 Minor

The idle poll interval is not clamped to the documented upper bound.
Only the default value is capped; the env var can be set above BASE_INTERVAL_MS, contradicting the "bounded by base interval" comment. Clamp explicitly.

Suggested fix
-const IDLE_DB_POLL_INTERVAL_MS = Math.max(
-  1,
-  parseIntWithDefault(
-    process.env.ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS,
-    DEFAULT_IDLE_DB_POLL_INTERVAL_MS
-  )
-);
+const IDLE_DB_POLL_INTERVAL_MS = Math.min(
+  BASE_INTERVAL_MS,
+  Math.max(
+    1,
+    parseIntWithDefault(
+      process.env.ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS,
+      DEFAULT_IDLE_DB_POLL_INTERVAL_MS
+    )
+  )
+);
Comment on lines 163 to 170
if (period === "custom" && dateRange) {
// Custom date range: startDate <= date <= endDate
return sql`(${messageRequest.createdAt} AT TIME ZONE ${timezone})::date >= ${dateRange.startDate}::date
AND (${messageRequest.createdAt} AT TIME ZONE ${timezone})::date <= ${dateRange.endDate}::date`;
// Custom date range: startDate <= local_date <= endDate
const startLocal = sql`(${dateRange.startDate}::date)::timestamp`;
const endExclusiveLocal = sql`(${dateRange.endDate}::date + INTERVAL '1 day')`;
const start = sql`(${startLocal} AT TIME ZONE ${timezone})`;
const endExclusive = sql`(${endExclusiveLocal} AT TIME ZONE ${timezone})`;
return sql`${messageRequest.createdAt} >= ${start} AND ${messageRequest.createdAt} < ${endExclusive}`;
}
⚠️ Potential issue | 🟡 Minor

Custom date-range inputs are not format-validated; an invalid startDate/endDate causes a PostgreSQL runtime error.

The DateRangeParams interface promises YYYY-MM-DD, but buildDateCondition never validates the input before executing SQL. An illegal string (e.g., "not-a-date") makes the ::date cast throw a PostgreSQL error whose message may leak to the caller.

Do basic format validation before building the SQL, or validate centrally upstream (e.g., in findCustomRangeLeaderboard).

🛡️ Suggested defensive validation
+const DATE_REGEX = /^\d{4}-\d{2}-\d{2}$/;
+
 if (period === "custom" && dateRange) {
+  if (!DATE_REGEX.test(dateRange.startDate) || !DATE_REGEX.test(dateRange.endDate)) {
+    throw new Error("Invalid date format. Expected YYYY-MM-DD.");
+  }
   const startLocal = sql`(${dateRange.startDate}::date)::timestamp`;
🤖 Prompt for AI Agents
In `@src/repository/leaderboard.ts` around lines 163 - 170, buildDateCondition
currently interpolates dateRange.startDate/endDate directly into SQL which
allows invalid strings (e.g. "not-a-date") to reach PostgreSQL and cause runtime
errors; add defensive validation of DateRangeParams (ensure YYYY-MM-DD format
and valid calendar date) before building SQL (either inside buildDateCondition
or at the caller findCustomRangeLeaderboard) and reject/throw a clear validation
error for invalid inputs. Use a strict YYYY-MM-DD regex and/or Date parsing to
confirm the values are valid dates, reference the symbols dateRange.startDate,
dateRange.endDate, buildDateCondition, findCustomRangeLeaderboard,
messageRequest.createdAt and timezone when implementing the check, and only
construct the SQL when validation passes.

Contributor
@github-actions github-actions bot left a comment

Code Review Summary

This is a well-structured performance-focused refactoring that addresses N+1 queries, adds strategic caching, and improves database index utilization. The PR demonstrates careful engineering with proper error handling, LRU eviction, and defensive coding patterns.

PR Size: XL

  • Lines changed: ~26,000 (24,827 additions, 1,176 deletions)
  • Files changed: 72

Split Suggestion: For future PRs of this size, consider splitting into:

  1. Database migrations and schema changes
  2. Repository layer optimizations (caching, batch queries)
  3. Frontend optimizations (useInViewOnce, virtualized components)
  4. Circuit breaker and endpoint lifecycle changes

Issues Found

| Category       | Critical | High | Medium | Low |
|----------------|----------|------|--------|-----|
| Logic/Bugs     | 0        | 1    | 0      | 0   |
| Security       | 0        | 0    | 0      | 0   |
| Error Handling | 0        | 1    | 0      | 0   |
| Types          | 0        | 0    | 0      | 0   |
| Comments/Docs  | 0        | 0    | 0      | 0   |
| Tests          | 0        | 0    | 0      | 0   |
| Simplification | 0        | 0    | 1      | 0   |

Critical Issues (Must Fix)

None identified.

High Priority Issues (Should Fix)

1. Race condition in getProviderStatistics in-flight dedup

File: src/repository/provider.ts:1135-1224

The in-flight deduplication has a race condition. The providerStatisticsInFlight is assigned AFTER the promise starts executing (line 1224), leaving a window where concurrent callers can bypass the dedup check.

Current problematic flow:

// Line 1131-1133: Check if in-flight
if (providerStatisticsInFlight && providerStatisticsInFlight.timezone === timezone) {
  return await providerStatisticsInFlight.promise;
}

// Line 1135-1222: Create promise (executes immediately)
const promise = (async () => { ... })();

// Line 1224: Register as in-flight (TOO LATE)
providerStatisticsInFlight = { timezone, promise };

Fix: Register the in-flight promise BEFORE starting async work:

if (providerStatisticsInFlight && providerStatisticsInFlight.timezone === timezone) {
  return await providerStatisticsInFlight.promise;
}

// Create placeholder first
let resolve: (value: ProviderStatisticsRow[]) => void;
let reject: (reason: unknown) => void;
const promise = new Promise<ProviderStatisticsRow[]>((res, rej) => {
  resolve = res;
  reject = rej;
});

// Register immediately
providerStatisticsInFlight = { timezone, promise };

try {
  const result = await (async () => { /* query logic */ })();
  resolve(result);
  return result;
} catch (e) {
  reject(e);
  throw e;
}

Medium Priority Issues

2. In-place array mutation in endpoint-selector.ts

File: src/lib/provider-endpoints/endpoint-selector.ts:17

The rankActiveProviderEndpoints function mutates the input array with .sort(). Callers may not expect their array to be modified.

Fix: Clone before sorting:

function rankActiveProviderEndpoints(endpoints: ProviderEndpoint[]): ProviderEndpoint[] {
  if (endpoints.length <= 1) return endpoints;
  return [...endpoints].sort((a, b) => { /* ... */ });
}

Review Coverage

  • Logic and correctness - Race condition identified
  • Security (OWASP Top 10) - Clean
  • Error handling - Comprehensive with try/catch and logging
  • Type safety - Clean with proper TypeScript usage
  • Documentation accuracy - Comments match implementation
  • Test coverage - Adequate unit tests added
  • Code clarity - Generally good, minor mutation issue

Notable Positive Aspects

  1. Excellent database optimization: FILTER clauses, range-based date comparisons, new indexes (see the sketch below)
  2. Comprehensive caching: TTL + LRU eviction on all caches prevents memory leaks
  3. Multi-instance safety: Advisory locks for migrations and backfills
  4. DST-aware date handling: Proper timezone handling for statistics
  5. Batch API design: Efficient LATERAL joins for probe logs
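For reference, the FILTER-aggregate shape from point 1 looks like this (column windows are illustrative; the real queries live in src/repository/statistics.ts):

```ts
import { sql } from "drizzle-orm";

// One scan of message_request yields every quota window at once.
const fiveHoursAgo = new Date(Date.now() - 5 * 60 * 60 * 1000);
const oneDayAgo = new Date(Date.now() - 24 * 60 * 60 * 1000);

const quotaCostQuery = sql`
  SELECT
    COALESCE(SUM(cost_usd) FILTER (WHERE created_at >= ${fiveHoursAgo}), 0) AS cost_5h,
    COALESCE(SUM(cost_usd) FILTER (WHERE created_at >= ${oneDayAgo}), 0) AS cost_daily,
    COALESCE(SUM(cost_usd), 0) AS cost_total
  FROM message_request
  WHERE deleted_at IS NULL
`;
```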

Automated review by Claude AI

const delayMs = process.env.NODE_ENV === "test" ? 0 : 10;
this.flushTimer = setTimeout(() => {
this.flushTimer = null;
void this.flush().catch(() => {});
Contributor

[HIGH] [ERROR-SWALLOWED] Swallowed flush() rejection hides failures

src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx:375

Why this is a problem: void this.flush().catch(() => {}); discards any rejection from flush(). If flush() ever rejects (unexpected runtime bug, refactor regression), this fails silently and makes diagnosis much harder.

Suggested fix:

this.flushTimer = setTimeout(() => {
  this.flushTimer = null;
  void this.flush().catch((error) => {
    console.error("[ProbeLogsBatcher] flush failed", error);
  });
}, delayMs);


this.flushTimer = setTimeout(() => {
this.flushTimer = null;
void this.flush().catch(() => {});
Contributor

[HIGH] [ERROR-SWALLOWED] Swallowed flush() rejection hides failures

src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx:127

Why this is a problem: void this.flush().catch(() => {}); ignores any rejection from flush(). If flush() ever rejects (unexpected runtime bug, refactor regression), this fails silently and makes diagnosis much harder.

Suggested fix:

this.flushTimer = setTimeout(() => {
  this.flushTimer = null;
  void this.flush().catch((error) => {
    console.error("[VendorTypeEndpointStatsBatcher] flush failed", error);
  });
}, 0);

}
}

export async function batchGetProviderEndpointProbeLogs(
Contributor

[HIGH] [TEST-MISSING-CRITICAL] New batch action lacks unit tests

src/actions/provider-endpoints.ts:708

Why this is a problem: CLAUDE.md requires: "Test Coverage - All new features must have unit test coverage of at least 80%". batchGetProviderEndpointProbeLogs introduces new behavior (admin gating, zod validation, de-dupe + ordering, empty-input fast path, per-endpoint limit) but tests/unit/actions/provider-endpoints.test.ts does not cover it.

Suggested fix (add to tests/unit/actions/provider-endpoints.test.ts):

it("batchGetProviderEndpointProbeLogs: dedupes ids and returns per-endpoint logs", async () => {
  getSessionMock.mockResolvedValue({ user: { role: "admin" } });
  findProviderEndpointProbeLogsBatchMock.mockResolvedValue(new Map([[2, []]]) as any);

  const { batchGetProviderEndpointProbeLogs } = await import("@/actions/provider-endpoints");
  const res = await batchGetProviderEndpointProbeLogs({ endpointIds: [2, 2], limit: 12 });

  expect(res).toEqual({ ok: true, data: [{ endpointId: 2, logs: [] }] });
  expect(findProviderEndpointProbeLogsBatchMock).toHaveBeenCalledWith({
    endpointIds: [2],
    limitPerEndpoint: 12,
  });
});

}
}

export async function batchGetVendorTypeEndpointStats(input: unknown): Promise<
Contributor

[HIGH] [TEST-MISSING-CRITICAL] New batch action lacks unit tests

src/actions/provider-endpoints.ts:758

Why this is a problem: CLAUDE.md requires: "Test Coverage - All new features must have unit test coverage of at least 80%". batchGetVendorTypeEndpointStats has non-trivial semantics (admin gating, zod validation, de-dupe, input-order preservation, defaulting missing vendor rows to zeros) but tests/unit/actions/provider-endpoints.test.ts does not cover it.

Suggested fix (add to tests/unit/actions/provider-endpoints.test.ts):

it("batchGetVendorTypeEndpointStats: dedupes ids and defaults missing vendor rows", async () => {
  getSessionMock.mockResolvedValue({ user: { role: "admin" } });
  findVendorTypeEndpointStatsBatchMock.mockResolvedValue([
    { vendorId: 2, total: 3, enabled: 1, healthy: 1, unhealthy: 0, unknown: 0 },
  ] as any);

  const { batchGetVendorTypeEndpointStats } = await import("@/actions/provider-endpoints");
  const res = await batchGetVendorTypeEndpointStats({
    vendorIds: [2, 1, 2],
    providerType: "claude",
  });

  expect(findVendorTypeEndpointStatsBatchMock).toHaveBeenCalledWith({
    vendorIds: [2, 1],
    providerType: "claude",
  });
  expect(res.ok).toBe(true);
  expect(res.data).toEqual([
    { vendorId: 2, providerType: "claude", total: 3, enabled: 1, healthy: 1, unhealthy: 0, unknown: 0 },
    { vendorId: 1, providerType: "claude", total: 0, enabled: 0, healthy: 0, unhealthy: 0, unknown: 0 },
  ]);
});

Contributor
@github-actions github-actions bot left a comment

Code Review Summary

This is an XL PR with broad changes across provider management UI, new admin batch actions, repository query paths, Redis/circuit state syncing, and DB migrations. The scope makes regression risk and root-cause attribution high; splitting into smaller PRs would make review and rollback safer.

PR Size: XL

  • Lines changed: 26003
  • Files changed: 72
  • Split suggestions:
    • PR 1: DB migrations + src/lib/migrate.ts locking/repair logic
    • PR 2: Endpoint circuit breaker Redis batch-sync + selector/statistics repository changes
    • PR 3: Providers UI batching (provider-endpoint-hover, endpoint-latency-sparkline) + related actions/routes
    • PR 4: My Usage/statistics query optimizations + tests

Issues Found

| Category       | Critical | High | Medium | Low |
|----------------|----------|------|--------|-----|
| Logic/Bugs     | 0        | 0    | 0      | 0   |
| Security       | 0        | 0    | 0      | 0   |
| Error Handling | 0        | 2    | 0      | 0   |
| Types          | 0        | 0    | 0      | 0   |
| Comments/Docs  | 0        | 0    | 0      | 0   |
| Tests          | 0        | 2    | 0      | 0   |
| Simplification | 0        | 0    | 0      | 0   |

Critical Issues (Must Fix)

None.

High Priority Issues (Should Fix)

  • [HIGH] [ERROR-SWALLOWED] src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx:375 swallows flush() rejection via .catch(() => {}).
  • [HIGH] [ERROR-SWALLOWED] src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx:127 swallows flush() rejection via .catch(() => {}).
  • [HIGH] [TEST-MISSING-CRITICAL] src/actions/provider-endpoints.ts:708 adds batchGetProviderEndpointProbeLogs without unit coverage.
  • [HIGH] [TEST-MISSING-CRITICAL] src/actions/provider-endpoints.ts:758 adds batchGetVendorTypeEndpointStats without unit coverage.

Review Coverage

  • Logic and correctness
  • Security (OWASP Top 10)
  • Error handling
  • Type safety
  • Documentation accuracy
  • Test coverage
  • Code clarity

Automated review by Codex AI

Contributor
@github-actions github-actions bot left a comment

  • Identified open PR: #789 (refactor(provider): improve provider page performance)
  • Applied size label: size/XL (additions 24827, deletions 1176, files 72)
  • Posted 4 inline review comments on diff lines:
    • src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx:375 (.catch(() => {}) swallowing flush() rejection)
    • src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx:127 (.catch(() => {}) swallowing flush() rejection)
    • src/actions/provider-endpoints.ts:708 (missing unit tests for batchGetProviderEndpointProbeLogs)
    • src/actions/provider-endpoints.ts:758 (missing unit tests for batchGetVendorTypeEndpointStats)
  • Submitted the required PR review summary via gh pr review --comment

tesgth032 and others added 2 commits February 15, 2026 21:59
* fix: batch endpoint statistics and probe logs on the Providers admin page

* perf: optimize provider statistics and my-usage query performance

* perf: remove the refresh amplifier on the Providers admin page and lazy-load endpoint sections

* fix: follow up review — harden Providers batching and statistics

* fix: follow up CodeRabbit — fix in-view handling and probe data validation

* perf: stabilize in-view handling and recover batching after 404s

* perf: fewer DB round-trips for my-usage quota/summary

* perf(providers): batch circuit-breaker reads on the endpoint-pool hot path plus index migrations (#779)

- Runtime endpoint selection and strict audit statistics now batch-read endpoint circuit state, cutting Redis round-trips
- Probe writes are silently ignored when an endpoint is concurrently deleted, so FK failures no longer abort the task
- New index migrations: idx_provider_endpoints_pick_enabled / idx_providers_vendor_type_url_active
- The repository batch-query module is now server-only to avoid accidental exposure as a Server Action

* fix: follow up review — dedupe circuit reset and scanEnd (#779)

* fix: precise circuit reset + server-only repository (#779)

* fix: add sessionId/warmup filtering to my-usage (#779)

* perf: more robust in-flight dedup for provider statistics (#779)

* fix: ProviderForm invalidates related caches consistently (#779)

* fix: Providers/Usage detail fixes and added test cases (#779)

* style: apply missing biome formatting (#779)

* fix(#779): improve circuit-state sync and probeLogs batch queries

* fix(#781): clean up orphaned endpoints and correct Endpoint Health

* perf: optimize usage logs and endpoint sync (#779/#781)

* refactor: remove redundant endpoint filtering (#779)

* fix: batch circuit-state queries cover enabled endpoints only (#779)

* fix: provider statistics tolerate dirty data; stabilize probe logs ordering (#779)

* perf: disable window-focus auto-refresh for heavy Providers queries (#779)

* fix: periodic multi-instance circuit-state sync; fix backfill leaving soft-deleted endpoints (#779/#781)

* perf: probe scheduler probes only endpoints of enabled providers (#781)

* perf: ProviderForm avoids duplicate refetches and stabilizes the hover circuit key (#779)

* perf: global QueryClient policy and usage/user index tuning (#779)

* perf: index-friendly timezone statistics and batch-delete optimization (#779)

* perf: reduce wasted recomputation on the logs/users pages

* fix(provider): endpoint pool derives from enabled providers only

- sync/backfill/delete: reference checks and backfill consider only is_enabled=true providers, so a disabled provider can't resurrect old endpoints
- updateProvider: ensure endpoints exist when a provider flips from disabled to enabled
- Dashboard Endpoint Health: concurrent refreshes no longer clobber user switching; vendor/type derives only from enabled providers
- probe logs batch API: partial 404s during rolling deploys no longer disable batching globally
- add endpoint-selector unit tests matching the findEnabled* semantics

* perf: lightweight Dashboard vendor/type query; parallel usage-logs queries

* fix(migrate): serialize migrations under an advisory lock; remove emoji logs

* fix: add an endpoint-hover fallback and normalize the batch probe logs SQL

* perf(settings/providers): cut redundant refreshes and reuse endpoint/circuit caches

* perf(probe/statistics): fix probe locking/counting; tighten statistics and usage scans

* perf(probe/ui): optimize probe target-selection SQL; reduce sparkline flicker

* fix(db): repair the Drizzle snapshot chain

* fix(perf): strengthen Providers batching and cache consistency

- Provider statistics: eliminate the implicit cross join; tighten in-flight cleanup; deleteProvidersBatch makes fewer in-transaction round-trips
- Providers hover: micro-batch per QueryClient with AbortSignal support, reducing cross-talk and potential leaks
- Probe/circuit/cache: probe target query switched to a join; Redis sync updates counter fields; statistics cache keeps FIFO semantics
- My Usage: userBreakdown gains 5m/1h cache aggregation columns (not yet shown in the UI)

* chore: format code (issue-779-provider-performance-23b338e)

* chore: trigger a CI re-run

* fix(provider): backfill the endpoint pool on batch enable

- batchUpdateProviders goes through updateProvidersBatch; when providers are batch-enabled from disabled, missing provider_endpoints records are inserted best-effort
- Avoids history/races leaving an enabled provider with no usable endpoint under the strict endpoint policy

* fix(perf): rein in Providers refresh amplification; optimize probing/pagination

* perf: consolidate availability/probe polling and optimize my-usage (#779/#781)

- AvailabilityDashboard: suppress overlapping/out-of-order refreshes; throttled hard refresh on foreground/background switches
- Probe scheduler/cleanup: idle DB poll + lock keep-alive, reducing pointless scans and concurrent cleanup
- Endpoint circuit: Redis sync throttled to 1s
- My Usage: key/user breakdown merged into a single aggregation
- DB: new message_request key+model/endpoint partial-index migrations; fix journal monotonicity checks and self-heal the migrations table's created_at

* fix(ui): restore the global react-query default configuration

* fix(availability): clear stale endpoint selection when refreshing vendors

* perf: strengthen Providers probing and Usage Logs performance

* perf(ui): useInViewOnce shares an IntersectionObserver to cut resource usage

- Reuse an observer pool keyed by (root+options), reducing observer instances in long lists/large tables
- Add unit coverage (test-env passthrough plus shared/release semantics)

* perf: providers batch WHERE optimization; fix sparkline fallback concurrency

* perf: add cache fields to the my-usage breakdown; optimize filter caching

* perf: reduce endpoint circuit-breaker Redis load; refine probe candidates

* fix(#781): Endpoint Health shows only endpoints referenced by enabled providers

* fix endpoint health filtering and harden URL parsing

* docs(provider-endpoints): document keepPreviousWhenReferenced semantics

* perf(availability): throttle EndpointTab refresh on foreground/background switches

* docs(availability): add EndpointTab refresh-throttling comments

* chore(review): add comments per AI review and tidy details

* fix: correct DST day boundaries in the provider statistics SQL

---------

Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…t issues

Merge 6 index migrations (0068-0073) into a single idempotent migration.
Extract reusable utilities from duplicated code across the codebase:

- TTLMap<K,V>: generic LRU+TTL cache replacing 3 inline implementations (see the sketch after this list)
- createAbortError: shared abort error factory from 2 components
- startLeaderLockKeepAlive: shared leader lock renewal from 2 schedulers
- ProbeLogsBatcher: data-fetching infra extracted from sparkline component
- buildUsageLogConditions: shared SQL filter builder from 3 query functions

Additional cleanup:
- Simplify useInViewOnce hook (remove unused options, keep shared observer pool)
- Remove dead code (sumKeyTotalCostById, unexport internal types)
- Hardcode env var defaults (ENDPOINT_CIRCUIT_HEALTH_CACHE_MAX_SIZE,
  ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS)
- Fix in-flight dedup race condition in getProviderStatistics
- Fix yesterday/today interval boundary inconsistency (lte -> lt)
- Add NaN guard for limitPerEndpoint in batch probe logs
- Add updatedAt to deleteProvider for audit consistency
- Log swallowed flush() errors in batchers instead of silently catching
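
For reference, a minimal sketch of the shape such a cache takes — lazy TTL expiry plus insertion-order LRU eviction. This is an illustration of the pattern, not the repo's exact implementation in src/lib/cache/ttl-map.ts:

```ts
// Sketch of a TTL + LRU cache (assumed shape; the real TTLMap may differ).
class TTLMap<K, V> {
  private store = new Map<K, { value: V; expiresAt: number }>();

  constructor(
    private maxSize: number,
    private ttlMs: number,
  ) {}

  get(key: K): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      // Lazy expiry: stale entries are removed only when touched,
      // so `size` can temporarily over-count.
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: K, value: V): void {
    // Delete first so updating an existing key refreshes its position in
    // Map insertion order and does not trigger eviction at capacity.
    this.store.delete(key);
    if (this.store.size >= this.maxSize) {
      // Evict the oldest entry (first key in insertion order).
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

This also illustrates the delete-before-size-check semantics the review comments below discuss: updating a key at capacity first shrinks the map, so the `>=` check intentionally does not evict.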
@ding113 ding113 force-pushed the refactor/provider-performance branch from 54afd85 to c2c6757 Compare February 15, 2026 15:54
@github-actions
Contributor

🧪 Test Results

Test type | Status
Code quality | ✅
Unit tests | ✅
Integration tests | ✅
API tests | ✅

Overall result: ✅ All tests passed

@greptile-apps greptile-apps bot left a comment

69 files reviewed, 2 comments


Comment on lines +97 to 100
loadedFromRedisAt.set(endpointId, now);

try {
const redisState = await loadEndpointCircuitState(endpointId);

Timestamp updated before Redis fetch completes. If Redis call fails or is slow, concurrent calls arriving within the TTL window will skip Redis sync and potentially use stale memory state. Consider setting the timestamp after successful Redis load (similar to the in-flight dedup pattern in provider.ts:1224-1225).


Comment on lines +30 to +32
if (this.store.size >= this.maxSize) {
this.evict();
}

Size check uses >= but happens after delete on line 28. For updating existing keys, the delete reduces size before the check, so eviction won't trigger when exactly at capacity (e.g., size=100, delete makes 99, check passes, insert makes 100 again). This is correct behavior but could add a clarifying comment.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/repository/leaderboard.ts (1)

495-495: ⚠️ Potential issue | 🟡 Minor

cacheCreationCost semantics do not match the actual computation.

The current expression sums the entire request's costUsd for requests with cacheCreationInputTokens > 0, not just the cache-creation portion of the cost. A field named cacheCreationCost can easily mislead callers into reading it as the cost attributable solely to cache creation.

If the exact cache-creation cost cannot be split out, consider renaming the field to something more accurate (e.g. costOfCacheCreatingRequests), or document its actual meaning in the interface comment.

src/app/[locale]/dashboard/users/users-page-client.tsx (1)

182-190: ⚠️ Potential issue | 🟡 Minor

Hardcoded English error message violates the i18n guidelines.

"Failed to fetch settings" is a hardcoded English string. The component does not currently display this error directly, but if an Error Boundary above it (or a later refactor) surfaces it to users, non-English users will see untranslated text.

Suggestion: use an i18n key or a unified error message:

♻️ Suggested change
-      if (!response.ok) throw new Error("Failed to fetch settings");
+      if (!response.ok) throw new Error(tCommon("error"));

As per coding guidelines, **/*.{ts,tsx,js,jsx}: "All user-facing strings must use i18n (5 languages supported: zh-CN, zh-TW, en, ja, ru). Never hardcode display text".

src/lib/endpoint-circuit-breaker.ts (1)

84-141: ⚠️ Potential issue | 🟠 Major

Avoid Redis sync races and make the cache size cap effective

getOrCreateHealth writes loadedFromRedisAt before the Redis read but never registers in redisSyncInFlight, so concurrent calls can skip the sync while the Redis result is still pending and may misjudge the circuit as closed within the TTL window; meanwhile the early return in the Redis branch bypasses the LRU/capacity trimming. Suggestion: wrap the Redis read in an in-flight entry, set loadedFromRedisAt only after the read completes, and run the trimming uniformly at the end of the function.

Suggested fix
 async function getOrCreateHealth(endpointId: number): Promise<EndpointHealth> {
   const inFlight = redisSyncInFlight.get(endpointId);
   if (inFlight) {
     await inFlight;
   }

   let health = healthMap.get(endpointId);
   const loadedAt = loadedFromRedisAt.get(endpointId);
   const now = Date.now();
   const needsRedisCheck =
     loadedAt === undefined || (loadedAt !== undefined && now - loadedAt > REDIS_SYNC_TTL_MS);

+  let resolved: EndpointHealth | undefined;
   if (needsRedisCheck) {
-    loadedFromRedisAt.set(endpointId, now);
-
-    try {
-      const redisState = await loadEndpointCircuitState(endpointId);
-      if (redisState) {
-        // When syncing from Redis into memory, updating only when circuitState changes is not enough:
-        // fields like failureCount / halfOpenSuccessCount may also change on other instances.
-        if (health) {
-          health.failureCount = redisState.failureCount;
-          health.lastFailureTime = redisState.lastFailureTime;
-          health.circuitState = redisState.circuitState;
-          health.circuitOpenUntil = redisState.circuitOpenUntil;
-          health.halfOpenSuccessCount = redisState.halfOpenSuccessCount;
-          return health;
-        }
-
-        health = {
-          failureCount: redisState.failureCount,
-          lastFailureTime: redisState.lastFailureTime,
-          circuitState: redisState.circuitState,
-          circuitOpenUntil: redisState.circuitOpenUntil,
-          halfOpenSuccessCount: redisState.halfOpenSuccessCount,
-        };
-        healthMap.set(endpointId, health);
-        return health;
-      }
-
-      if (health && health.circuitState !== "closed") {
-        health.circuitState = "closed";
-        health.failureCount = 0;
-        health.lastFailureTime = null;
-        health.circuitOpenUntil = null;
-        health.halfOpenSuccessCount = 0;
-      }
-    } catch (error) {
-      logger.warn("[EndpointCircuitBreaker] Failed to sync state from Redis", {
-        endpointId,
-        error: error instanceof Error ? error.message : String(error),
-      });
-    }
+    const syncPromise = (async () => {
+      try {
+        const redisState = await loadEndpointCircuitState(endpointId);
+        loadedFromRedisAt.set(endpointId, Date.now());
+
+        if (redisState) {
+          const current = healthMap.get(endpointId);
+          if (current) {
+            current.failureCount = redisState.failureCount;
+            current.lastFailureTime = redisState.lastFailureTime;
+            current.circuitState = redisState.circuitState;
+            current.circuitOpenUntil = redisState.circuitOpenUntil;
+            current.halfOpenSuccessCount = redisState.halfOpenSuccessCount;
+            return current;
+          }
+
+          const nextHealth: EndpointHealth = {
+            failureCount: redisState.failureCount,
+            lastFailureTime: redisState.lastFailureTime,
+            circuitState: redisState.circuitState,
+            circuitOpenUntil: redisState.circuitOpenUntil,
+            halfOpenSuccessCount: redisState.halfOpenSuccessCount,
+          };
+          healthMap.set(endpointId, nextHealth);
+          return nextHealth;
+        }
+
+        const current = healthMap.get(endpointId);
+        if (current && current.circuitState !== "closed") {
+          current.circuitState = "closed";
+          current.failureCount = 0;
+          current.lastFailureTime = null;
+          current.circuitOpenUntil = null;
+          current.halfOpenSuccessCount = 0;
+        }
+        return current;
+      } catch (error) {
+        logger.warn("[EndpointCircuitBreaker] Failed to sync state from Redis", {
+          endpointId,
+          error: error instanceof Error ? error.message : String(error),
+        });
+        return healthMap.get(endpointId);
+      }
+    })();
+
+    redisSyncInFlight.set(endpointId, syncPromise);
+    try {
+      resolved = await syncPromise;
+    } finally {
+      if (redisSyncInFlight.get(endpointId) === syncPromise) {
+        redisSyncInFlight.delete(endpointId);
+      }
+    }
   }

-  const result = getOrCreateHealthSync(endpointId);
+  const result = resolved ?? getOrCreateHealthSync(endpointId);
   enforceEndpointHealthCacheMaxSize();
   return result;
 }
src/app/[locale]/dashboard/availability/_components/endpoint-probe-history.tsx (1)

89-111: ⚠️ Potential issue | 🟡 Minor

fetchLogs does not check the HTTP status of the fetch response.

Lines 101-102 call res.json() on the response without checking res.ok. When the API returns a 4xx/5xx, res.json() may parse an unexpected shape (or throw), leaving data.logs undefined and silently clearing the logs.

Suggestion: check the status code before parsing the JSON
      const res = await fetch(`/api/availability/endpoints/probe-logs?${params.toString()}`);
+     if (!res.ok) {
+       console.error("Failed to fetch logs", res.status, res.statusText);
+       return;
+     }
      const data = await res.json();
🤖 Fix all issues with AI agents
In src/app/[locale]/dashboard/availability/_components/endpoint/endpoint-tab.tsx:
- Around line 61-137: The finally block can leave loadingVendors stuck true when
a non-silent request A is in-flight, a silent request B increments
vendorsRequestIdRef and cancels A, and neither finally clears loading; to fix,
remember whether this invocation actually set loading (e.g., const didSetLoading
= !options?.silent; call setLoadingVendors(true) only when didSetLoading), and
in finally clear loading when either didSetLoading is true or this request is
still the current one (if (didSetLoading || requestId ===
vendorsRequestIdRef.current) setLoadingVendors(false)); update refreshVendors to
use didSetLoading and reference vendorsRequestIdRef, setLoadingVendors, and
options.silent accordingly.

In src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx:
- Around line 432-435: The loading message for the endpoints list reuses
t("keyLoading") which is semantically "key loading"; update the endpointsLoading
branch in provider-endpoint-hover.tsx to use a dedicated translation key (e.g.
t("endpointStatus.loading") or a generic t("loading")) instead of "keyLoading",
and add the corresponding entries to your i18n resource files so the new key is
localized; ensure the JSX still renders the same container (the div with
className "px-3 py-4 text-center text-xs text-muted-foreground") but with the
new translation key.

In src/lib/migrate.ts:
- Around line 39-52: The finally block can let client.end() mask an earlier
error from fn(); wrap the await client.end() call in its own try-catch so that
any error from client.end() is logged (including error details) but does not
override the original exception thrown by fn(); specifically modify the finally
in migrate.ts to catch errors from await client.end() (reference client.end(),
fn(), acquired, lockName) and log them via logger.error while allowing the
original error to propagate.
🧹 Nitpick comments (25)
scripts/validate-migrations.js (1)

163-173: The typeof check on entry?.when could be complemented by an idx continuity check.

Only the monotonicity of when is currently validated, but a skipped or duplicated idx (e.g. a typo from manual editing) can cause migration problems just as easily. Consider additionally asserting idx === expectedIdx to complement the monotonicity check.

Optional: add an idx continuity check
   const issues = [];
   let previousWhen = Number.NEGATIVE_INFINITY;
   let previousTag = "";
+  let expectedIdx = 0;

   for (const entry of journal.entries) {
     const tag = typeof entry?.tag === "string" ? entry.tag : "(unknown)";
     const when = entry?.when;
+
+    if (entry?.idx !== expectedIdx) {
+      issues.push({
+        type: "JOURNAL",
+        line: 0,
+        statement: `Non-sequential idx: expected ${expectedIdx}, got ${entry?.idx} for tag=${tag}`,
+        suggestion: "Ensure journal entries have sequential idx values starting from 0.",
+      });
+    }
+    expectedIdx++;
+
     if (typeof when !== "number" || !Number.isFinite(when)) {
src/app/[locale]/settings/providers/_components/provider-rich-list-item.tsx (3)

713-722: The desktop vendor conditional rendering is correct, but mobile has no corresponding handling.

The desktop view adds a vendor branch (displayName / websiteDomain + ProviderEndpointHover), and falling back to provider.url is reasonable. The mobile view (lines 442-565) has no vendor display at all. If this is intentional (a slimmed-down mobile view), add a short comment saying so; otherwise add the vendor information on mobile for consistency.


656-665: When a vendor exists, consider preferring vendor.faviconUrl.

The favicon is always read from provider.faviconUrl, but since the vendor prop is now passed in and the ProviderVendor type also carries faviconUrl, prefer vendor.faviconUrl when present to keep the redundant provider field from drifting out of sync with vendor data.

Suggested change
-            {provider.faviconUrl && (
+            {(vendor?.faviconUrl ?? provider.faviconUrl) && (
               <img
-                src={provider.faviconUrl}
+                src={vendor?.faviconUrl ?? provider.faviconUrl}
                 alt=""
                 className="h-4 w-4 flex-shrink-0"
                 onError={(e) => {
                   (e.target as HTMLImageElement).style.display = "none";
                 }}
               />
             )}

209-215: The set of invalidated query keys could be managed in one place.

The component calls queryClient.invalidateQueries in several places with different key combinations (deletion additionally invalidates ["provider-vendors"], while edit/toggle only invalidates ["providers"] and ["providers-health"]). Extract the common invalidation key sets into a constant or helper to reduce the risk of omissions and improve maintainability.

Example
// can be defined inside or outside the component
const invalidateProviderQueries = (queryClient: QueryClient, includeVendors = false) => {
  queryClient.invalidateQueries({ queryKey: ["providers"] });
  queryClient.invalidateQueries({ queryKey: ["providers-health"] });
  if (includeVendors) {
    queryClient.invalidateQueries({ queryKey: ["provider-vendors"] });
  }
};
src/app/[locale]/dashboard/users/users-page-client.tsx (1)

44-46: UsersPageClient is now a pure pass-through wrapper and could be simplified.

With QueryClientProvider removed, UsersPageClient only forwards props to UsersPageContent verbatim. UsersPageContent could be exported directly and renamed to UsersPageClient, removing an unnecessary component layer and call-stack depth.

♻️ Suggested simplification
-export function UsersPageClient(props: UsersPageClientProps) {
-  return <UsersPageContent {...props} />;
-}
-
-function UsersPageContent({ currentUser }: UsersPageClientProps) {
+export function UsersPageClient({ currentUser }: UsersPageClientProps) {
src/repository/overview.ts (1)

97-104: The timezone boundary logic duplicates getOverviewMetrics; extract a shared helper.

The SQL fragments for nowLocal / todayStartLocal / todayStart / tomorrowStart are constructed identically in both functions (lines 45-48 vs 97-100). Extract an internal helper (e.g. buildDayBoundaries(timezone: string)) returning these fragments to remove the duplication and keep both call sites in sync.

Example refactor
function buildDayBoundaries(timezone: string) {
  const nowLocal = sql`CURRENT_TIMESTAMP AT TIME ZONE ${timezone}`;
  const todayStartLocal = sql`DATE_TRUNC('day', ${nowLocal})`;
  const todayStart = sql`(${todayStartLocal} AT TIME ZONE ${timezone})`;
  const tomorrowStart = sql`((${todayStartLocal} + INTERVAL '1 day') AT TIME ZONE ${timezone})`;
  return { nowLocal, todayStartLocal, todayStart, tomorrowStart };
}

getOverviewMetricsWithComparison can then derive yesterdayStart / yesterdayEnd from the returned values.

src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx (3)

357-364: endpointIds is sorted twice, once for the queryKey and once in queryFn.

endpointIdsKey (lines 360-363) already sorts the IDs to build the cache key, and queryFn (line 372) sorts them again. Extract the sorted array into a shared variable, or reuse the parsed endpointIdsKey inside queryFn. Minor impact; just redundant work.


386-388: circuitState uses an unvalidated type assertion.

item.circuitState as EndpointCircuitState asserts the server response type directly. If the backend ever returns an unexpected value (such as null or a newly added state), the UI has no defense. Add a simple runtime check or a fallback:

Suggested change
  for (const item of results.flat()) {
-   map[item.endpointId] = item.circuitState as EndpointCircuitState;
+   const state = item.circuitState;
+   if (state === "closed" || state === "open" || state === "half-open") {
+     map[item.endpointId] = state;
+   }
  }

242-289: The Chinese comment on the fallback path should be switched to English.

The comment at line 243 is written in Chinese. The coding guidelines only require i18n for user-facing strings, but code comments should stay consistent with the language used across the rest of the project to ease collaboration.

src/lib/abort-utils.ts (1)

1-9: The falsy check on signal.reason can miss legitimate falsy reason values.

signal.reason can be any value passed to AbortController.abort(reason). The current if (signal.reason) skips reasons of 0, "", or false; !== undefined is stricter. The practical impact is tiny; this is just a heads-up.

Suggested change
 export function createAbortError(signal?: AbortSignal): unknown {
   if (!signal) return new Error("Aborted");
-  if (signal.reason) return signal.reason;
+  if (signal.reason !== undefined) return signal.reason;
 
   try {
     return new DOMException("Aborted", "AbortError");
   } catch {
     return new Error("Aborted");
   }
 }
src/lib/provider-endpoints/leader-lock.ts (1)

143-196: The startLeaderLockKeepAlive implementation is sound overall; one small point about duplicate onLost calls deserves attention.

When getLock() returns undefined (line 164), stop() + opts.onLost() are called, but opts.clearLock() is not — unlike the renewal-failure path (line 174), which calls clearLock() before onLost(). Since the lock no longer exists when getLock() returns undefined, skipping clearLock() is defensible, but confirm that callers' onLost callbacks are idempotent, in case an extreme interleaving (e.g. a tick right after clearLock) fires them twice in a row.

src/lib/cache/ttl-map.ts (1)

53-55: The size property includes expired-but-uncollected entries, which can mislead external callers.

Because expiry is lazy, size reports the underlying Map's actual size, which may include entries that have expired but have not yet been removed by get/has/evict. Callers relying on size for precise capacity decisions will see an inflated value. For a pure cache this is an acceptable trade-off, but add a one-line comment above the class or on the size accessor documenting the behavior.

src/drizzle/schema.ts (1)

323-329: The IS NOT NULL check in the WHERE clause of providersEnabledVendorTypeIdx is redundant.

The providerVendorId column is declared .notNull() in the schema (lines 160-161), so ${table.providerVendorId} IS NOT NULL is always true, and > 0 already excludes NULL implicitly. Correctness and performance are unaffected, but the WHERE expression can be trimmed.
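
A hedged sketch of the trimmed predicate in Drizzle terms — the table shape and index name here are reduced stand-ins, not the real schema:

```ts
import { sql } from "drizzle-orm";
import { boolean, index, integer, pgTable, text } from "drizzle-orm/pg-core";

// Stand-in table: provider_vendor_id is NOT NULL, so the partial-index
// predicate only needs the > 0 check.
const providers = pgTable(
  "providers",
  {
    id: integer("id").primaryKey(),
    providerVendorId: integer("provider_vendor_id").notNull(),
    providerType: text("provider_type").notNull(),
    isEnabled: boolean("is_enabled").notNull().default(true),
  },
  (table) => ({
    enabledVendorTypeIdx: index("idx_providers_enabled_vendor_type")
      .on(table.providerVendorId, table.providerType)
      .where(sql`${table.isEnabled} = true AND ${table.providerVendorId} > 0`),
  }),
);
```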

src/lib/provider-endpoints/probe-logs-batcher.ts (2)

222-250: The concurrency-control pattern is correct but not obvious; consider a comment explaining its safety.

idx++ is shared across multiple async workers. Under the JS single-threaded model, the synchronous stretch between idx++ and the next await is safe, but the pattern invites readers to suspect a race. Add a short comment stating why it is safe.
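
For readers unfamiliar with the pattern, here is a self-contained sketch of the same work-stealing loop (mapWithConcurrency is an illustrative name, not the batcher's API):

```ts
// Bounded-concurrency mapping over a shared index. Safe in single-threaded
// JS: `idx++` runs synchronously; workers only interleave at await points.
async function mapWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let idx = 0;

  const workers = Array.from(
    { length: Math.min(concurrency, items.length) },
    async () => {
      for (;;) {
        const current = idx++; // atomic with respect to the event loop
        if (current >= items.length) return;
        results[current] = await task(items[current]);
      }
    },
  );

  await Promise.all(workers);
  return results;
}
```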


354-384: The outer catch in flush may call reject on requests that already resolved.

Each async function inside Promise.all has its own try/catch, so the outer catch normally never fires. If it does, it iterates every group in snapshot (including already-resolved requests) and calls req.reject. The settled closure in load guarantees a promise is never settled twice, so behavior is safe, but the reasoning is non-obvious — add a comment at the outer catch documenting this guarantee.

src/lib/hooks/use-in-view-once.ts (1)

12-22: getObserverOptionsKey does not include root in the cache key.

getSharedObserver is currently module-private, and useInViewOnce always uses DEFAULT_OPTIONS (root is undefined, i.e. the viewport), so this is safe for now. But if the hook is ever extended to support custom root elements, different roots would wrongly share one IntersectionObserver and callbacks would not fire.

Consider a one-line comment at the top of the function noting this limitation.
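
If custom roots are ever supported, one way to avoid stringifying DOM nodes is a per-root pool keyed by a WeakMap — a hedged sketch with illustrative names, not the hook's actual internals:

```ts
// Sketch: per-root observer pools, so distinct roots never share an observer.
type ObserverPool = Map<string, IntersectionObserver>;

const poolsByRoot = new WeakMap<Element | Document, ObserverPool>();
const viewportPool: ObserverPool = new Map(); // root is null/undefined

function getPool(root: Element | Document | null | undefined): ObserverPool {
  if (!root) return viewportPool;
  let pool = poolsByRoot.get(root);
  if (!pool) {
    pool = new Map();
    poolsByRoot.set(root, pool); // GC-friendly: entry dies with the root node
  }
  return pool;
}
```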

src/repository/statistics.ts (1)

1003-1016: QuotaCostSummary is not exported, while QuotaCostRanges is.

QuotaCostSummary is the return type of sumUserQuotaCosts and sumKeyQuotaCostsById; external callers cannot reference it when declaring variables. Export it as well.

Suggested change
-interface QuotaCostSummary {
+export interface QuotaCostSummary {
src/instrumentation.ts (1)

402-442: The dev-mode backfill code duplicates about 30 lines of the production path.

The backfill logic in production mode (lines 291-321) and dev mode (lines 414-442) is nearly identical; the only difference is the advisory-lock wrapper. Extract a shared runBackfills() function, call it via withAdvisoryLock(lockName, runBackfills, { skipIfLocked: true }) in production, and call runBackfills() directly in dev.
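
A minimal sketch of that extraction, assuming withAdvisoryLock is importable from src/lib/migrate and that the lock name below is illustrative:

```ts
import { withAdvisoryLock } from "@/lib/migrate"; // assumed export path

// Sketch: one shared body, wrapped in the advisory lock only in production.
async function runBackfills(): Promise<void> {
  // ...the backfill steps shared by prod and dev go here...
}

export async function startBackfills(): Promise<void> {
  if (process.env.NODE_ENV === "production") {
    // Serialize across instances; skip if another instance holds the lock.
    await withAdvisoryLock("provider-backfill", runBackfills, {
      skipIfLocked: true,
    });
  } else {
    await runBackfills();
  }
}
```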

src/actions/my-usage.ts (2)

381-394: Manually accumulating breakdown rows in getMyTodayStats works, but inputTokens / outputTokens lack a Number.isFinite guard.

Line 389 guards costUsd with Number.isFinite, but inputTokens and outputTokens also come from sql<number> annotations — Drizzle's mapping of double precision can, in extreme cases (NaN/Infinity), yield non-finite values. For consistency, apply the same defensive check to the token fields, or at least confirm that the upstream COALESCE guarantees finite values.

Optional: apply the isFinite guard to the token fields too
-      totalCalls += row.calls ?? 0;
-      totalInputTokens += row.inputTokens ?? 0;
-      totalOutputTokens += row.outputTokens ?? 0;
-      totalCostUsd += costUsd;
+      totalCalls += row.calls ?? 0;
+      const safeInput = Number.isFinite(row.inputTokens) ? row.inputTokens : 0;
+      const safeOutput = Number.isFinite(row.outputTokens) ? row.outputTokens : 0;
+      totalInputTokens += safeInput;
+      totalOutputTokens += safeOutput;
+      totalCostUsd += costUsd;

218-222: Dynamic import() of the time-utils and statistics modules — confirm this is intentional.

getMyQuota uses dynamic import() for @/lib/rate-limit/time-utils and @/repository/statistics, which adds module-resolution overhead per call (Node does cache it). If this avoids a circular dependency or trims cold-start loading, keep it; otherwise a top-level static import is clearer and friendlier to tree-shaking.

src/app/[locale]/dashboard/availability/_components/endpoint-probe-history.tsx (1)

54-64: When the initial vendors request fails, only console.error runs and the user gets no feedback.

A toast or an error state surfaced in the UI would help, but since this is an admin page and an initial load, the current handling is acceptable.

src/repository/usage-logs.ts (4)

330-364: findUsageLogsForKeySlim: the cache-hit and cache-miss paths duplicate the map logic.

Line 339 and lines 358-360 both map pageRows with the same costUsd?.toString() ?? null conversion. If the mapping ever changes (e.g. an extra field conversion), both sites must be updated in lockstep, which is easy to miss.

Suggestion: extract a mapping function
+  const mapSlimRow = (row: typeof pageRows[number]): UsageLogSlimRow => ({
+    ...row,
+    costUsd: row.costUsd?.toString() ?? null,
+  });
+
   const cachedTotal = usageLogSlimTotalCache.get(totalCacheKey);
   if (cachedTotal !== undefined) {
     total = Math.max(cachedTotal, total);
     return {
-      logs: pageRows.map((row) => ({ ...row, costUsd: row.costUsd?.toString() ?? null })),
+      logs: pageRows.map(mapSlimRow),
       total,
     };
   }

   // ... COUNT logic ...

-  const logs: UsageLogSlimRow[] = pageRows.map((row) => ({
-    ...row,
-    costUsd: row.costUsd?.toString() ?? null,
-  }));
+  const logs: UsageLogSlimRow[] = pageRows.map(mapSlimRow);

   usageLogSlimTotalCache.set(totalCacheKey, total);
   return { logs, total };

464-498: findUsageLogsWithDetails: the two summary-query branches have identical selects and differ only in from/join; extract a shared select definition.

Lines 467-479 and 483-494 select exactly the same columns, duplicating roughly 30 lines of SQL template. Any new statistics field (like the recently added 5m/1h tokens) has to be added in both places.

Suggestion: extract a shared select object
+  const summarySelectFields = {
+    totalRows: sql<number>`count(*)::double precision`,
+    totalRequests: sql<number>`count(*) FILTER (WHERE ${EXCLUDE_WARMUP_CONDITION})::double precision`,
+    totalCost: sql<string>`COALESCE(sum(${messageRequest.costUsd}) FILTER (WHERE ${EXCLUDE_WARMUP_CONDITION}), 0)`,
+    totalInputTokens: sql<number>`COALESCE(sum(${messageRequest.inputTokens}) FILTER (WHERE ${EXCLUDE_WARMUP_CONDITION})::double precision, 0::double precision)`,
+    totalOutputTokens: sql<number>`COALESCE(sum(${messageRequest.outputTokens}) FILTER (WHERE ${EXCLUDE_WARMUP_CONDITION})::double precision, 0::double precision)`,
+    totalCacheCreationTokens: sql<number>`COALESCE(sum(${messageRequest.cacheCreationInputTokens}) FILTER (WHERE ${EXCLUDE_WARMUP_CONDITION})::double precision, 0::double precision)`,
+    totalCacheReadTokens: sql<number>`COALESCE(sum(${messageRequest.cacheReadInputTokens}) FILTER (WHERE ${EXCLUDE_WARMUP_CONDITION})::double precision, 0::double precision)`,
+    totalCacheCreation5mTokens: sql<number>`COALESCE(sum(${messageRequest.cacheCreation5mInputTokens}) FILTER (WHERE ${EXCLUDE_WARMUP_CONDITION})::double precision, 0::double precision)`,
+    totalCacheCreation1hTokens: sql<number>`COALESCE(sum(${messageRequest.cacheCreation1hInputTokens}) FILTER (WHERE ${EXCLUDE_WARMUP_CONDITION})::double precision, 0::double precision)`,
+  };
+
   const summaryQuery =
     keyId === undefined
-      ? db
-          .select({
-            totalRows: sql<number>`count(*)::double precision`,
-            // ... (all fields) ...
-          })
-          .from(messageRequest)
-          .where(and(...conditions))
-      : db
-          .select({
-            totalRows: sql<number>`count(*)::double precision`,
-            // ... (same fields) ...
-          })
-          .from(messageRequest)
-          .innerJoin(keysTable, eq(messageRequest.key, keysTable.key))
-          .where(and(...conditions));
+      ? db.select(summarySelectFields).from(messageRequest).where(and(...conditions))
+      : db
+          .select(summarySelectFields)
+          .from(messageRequest)
+          .innerJoin(keysTable, eq(messageRequest.key, keysTable.key))
+          .where(and(...conditions));

274-276: usageLogSlimTotalCache builds its key with the \u0001 separator; ensure field values never contain that character.

Lines 291-301 join the cache key with \u0001 (the SOH control character). Fields that originate from user input — sessionId, model, endpoint — can in theory contain arbitrary characters. In practice SOH is vanishingly unlikely in these fields, and the cache only holds a total count with a short 10-second TTL, so a collision's impact is limited.
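
If collision-proofing ever matters, JSON-encoding the parts is a cheap alternative — a sketch with a hypothetical helper name:

```ts
// Sketch: JSON encoding makes the key unambiguous even if a field ever
// contains the \u0001 separator.
function buildTotalCacheKey(
  parts: ReadonlyArray<string | number | null>,
): string {
  return JSON.stringify(parts);
}

// e.g. buildTotalCacheKey([keyId, sessionId ?? null, model ?? null, endpoint ?? null])
```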


367-374: The distinct models/endpoints cache has a 5-minute TTL — acceptable for my-usage, but newly seen models/endpoints may be slow to appear.

A user who just issued a request with a new model may wait up to 5 minutes before the filter dropdown shows the new option. That matters little for a "my usage" page, but it is worth noting in docs or a comment.

Comment on lines 61 to 137
const refreshVendors = useCallback(async (options?: { silent?: boolean }) => {
const requestId = ++vendorsRequestIdRef.current;
if (!options?.silent) {
setLoadingVendors(true);
}

try {
const currentVendorId = latestSelectionRef.current.vendorId;
const currentType = latestSelectionRef.current.providerType;
const nextVendors = await getDashboardProviderVendors();

if (requestId !== vendorsRequestIdRef.current) {
return null;
}

setVendors(nextVendors);

if (nextVendors.length === 0) {
setSelectedVendorId(null);
setSelectedType(null);
setSelectedEndpoint(null);
return {
selectionChanged: currentVendorId != null || currentType != null,
vendorId: null,
providerType: null,
};
}

const vendor =
(currentVendorId ? nextVendors.find((v) => v.id === currentVendorId) : null) ??
nextVendors[0] ??
null;

if (!vendor) {
setSelectedVendorId(null);
setSelectedType(null);
setSelectedEndpoint(null);
return {
selectionChanged: currentVendorId != null || currentType != null,
vendorId: null,
providerType: null,
};
}

const nextVendorId = vendor.id;
const nextProviderType =
currentType && vendor.providerTypes.includes(currentType)
? currentType
: (vendor.providerTypes[0] ?? null);

const selectionChanged = nextVendorId !== currentVendorId || nextProviderType !== currentType;

if (selectionChanged) {
// Prevent probe requests against the old endpoint while selection auto-switches (#781).
setSelectedEndpoint(null);
}

setSelectedVendorId(nextVendorId);
setSelectedType(nextProviderType);

return {
selectionChanged,
vendorId: nextVendorId,
providerType: nextProviderType,
};
} catch (error) {
if (requestId !== vendorsRequestIdRef.current) {
return null;
}
console.error("Failed to fetch vendors:", error);
return null;
} finally {
if (!options?.silent && requestId === vendorsRequestIdRef.current) {
setLoadingVendors(false);
}
}
}, []);

⚠️ Potential issue | 🟡 Minor

The loading-state handling of refreshVendors in silent mode has an edge case.

Line 133: if (!options?.silent && requestId === vendorsRequestIdRef.current) — when called with silent: true, setLoadingVendors(false) never runs in the finally. That is consistent because a silent call also skips setLoadingVendors(true) (lines 63-65). The logic is correct as-is, but if a non-silent request is invalidated by a silent request's requestId increment, the finally sees requestId === vendorsRequestIdRef.current as false and the loading state can get stuck at true.

Concrete scenario: non-silent request A is in flight → silent request B starts → A's finally sees a mismatched requestId and skips setLoadingVendors(false) → B's finally skips it too because it is silent → loadingVendors is stuck at true.

Suggested fix
    } finally {
-     if (!options?.silent && requestId === vendorsRequestIdRef.current) {
+     if (requestId === vendorsRequestIdRef.current) {
        setLoadingVendors(false);
      }
    }

Alternatively, ensure the loading state is also reset correctly in silent mode.


Comment on lines +432 to +435
{endpointsLoading ? (
<div className="px-3 py-4 text-center text-xs text-muted-foreground">
{t("keyLoading")}
</div>

⚠️ Potential issue | 🟡 Minor

The loading state reuses the keyLoading translation key.

t("keyLoading") semantically means "key loading", which does not fit the endpoint-list loading scenario. Use a more generic or dedicated translation key (e.g. endpointStatus.loading) so different contexts do not share one string and confuse later maintenance.


Remove silent option guard so vendor loading state always resets when
the request completes, preventing stale loading indicators. Wrap
advisory lock client.end() in try-catch to avoid unhandled errors
during connection teardown.
@ding113 ding113 merged commit de634d9 into dev Feb 15, 2026
8 of 9 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 15, 2026
@github-actions
Contributor

🧪 Test Results

Test type | Status
Code quality | ✅
Unit tests | ✅
Integration tests | ✅
API tests | ✅

Overall result: ✅ All tests passed


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/[locale]/dashboard/availability/_components/endpoint/endpoint-tab.tsx (1)

409-413: ⚠️ Potential issue | 🟡 Minor

Provider type names are rendered without i18n, showing raw strings.

{type} renders raw values like "claude" or "gemini-cli" as the dropdown option text. Per the coding guidelines, all user-facing strings must go through i18n. Map the display name through a translation key:

- {type}
+ {t(`providerTypes.${type}`)}

As per coding guidelines, "All user-facing strings must use i18n (5 languages supported: zh-CN, zh-TW, en, ja, ru). Never hardcode display text".

🤖 Fix all issues with AI agents
In src/lib/migrate.ts:
- Around line 176-185: The finally block in runMigrations calls
migrationClient.end() without try-catch, which can mask or replace earlier
errors; wrap the await migrationClient.end() call in a try-catch (similar to
withAdvisoryLock) so any errors from closing the client are logged via
logger.error but do not override the original exception from fn(); update the
finally in runMigrations to catch and log errors from migrationClient.end()
while preserving original error flow.
🧹 Nitpick comments (4)
src/lib/migrate.ts (2)

130-136: Consider batching the per-row UPDATEs or wrapping them in a transaction.

The UPDATEs currently run one by one, so a mid-process crash leaves a partial repair. The operation is idempotent (a restart finishes the remaining rows), but a transaction guarantees atomicity and cuts round-trips.

Suggested change
-  for (const fix of pendingFixes) {
-    await client`
-      UPDATE "drizzle"."__drizzle_migrations"
-      SET created_at = ${fix.to}
-      WHERE id = ${fix.id}
-    `;
-  }
+  await client.begin(async (tx) => {
+    for (const fix of pendingFixes) {
+      await tx`
+        UPDATE "drizzle"."__drizzle_migrations"
+        SET created_at = ${fix.to}
+        WHERE id = ${fix.id}
+      `;
+    }
+  });

147-186: runMigrations duplicates the lock acquire/release logic of withAdvisoryLock, violating DRY.

withAdvisoryLock already encapsulates the full advisory-lock lifecycle (acquire, release, client close, error handling), yet runMigrations reimplements the same pattern at lines 159-161 and 177-181. The main obstacle is that runMigrations must run both the lock operations and the migration on the same client, while withAdvisoryLock creates its own.

Consider refactoring withAdvisoryLock to accept an external client, or have runMigrations call withAdvisoryLock and create the migration connection inside the callback.
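
A hedged sketch of the first option — letting withAdvisoryLock reuse a caller-provided client. The signature is illustrative; the real helper lives in src/lib/migrate.ts:

```ts
import postgres from "postgres";

type Sql = ReturnType<typeof postgres>;

// Sketch: reuse a caller-provided client so lock and work can share a
// connection; otherwise create (and own) a dedicated single-connection client.
async function withAdvisoryLock<T>(
  lockName: string,
  fn: (client: Sql) => Promise<T>,
  opts?: { client?: Sql; skipIfLocked?: boolean },
): Promise<{ ran: boolean; result?: T }> {
  const client = opts?.client ?? postgres(process.env.DSN as string, { max: 1 });
  const ownsClient = !opts?.client;
  let acquired = false;
  try {
    if (opts?.skipIfLocked) {
      const [row] = await client`SELECT pg_try_advisory_lock(hashtext(${lockName})) AS ok`;
      if (!row.ok) return { ran: false };
      acquired = true;
    } else {
      await client`SELECT pg_advisory_lock(hashtext(${lockName}))`;
      acquired = true;
    }
    return { ran: true, result: await fn(client) };
  } finally {
    if (acquired) {
      try {
        await client`SELECT pg_advisory_unlock(hashtext(${lockName}))`;
      } catch {
        // Log and continue; closing the session also releases session locks.
      }
    }
    if (ownsClient) {
      try {
        await client.end();
      } catch {
        // Never let end() mask the original error from fn().
      }
    }
  }
}
```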

src/app/[locale]/dashboard/availability/_components/endpoint/endpoint-tab.tsx (2)

57-59: Writing to a ref during render is hazardous under concurrent mode.

Assigning latestSelectionRef.current in the render body is a render-phase side effect. Under React 19 concurrent rendering, render may run multiple times with only one commit, and discarded renders also write the ref, briefly desynchronizing it from the committed state. The final committed render overwrites the ref, so the practical impact is limited, but moving the assignment into a useEffect is safer.

Suggested change
- latestSelectionRef.current.vendorId = selectedVendorId;
- latestSelectionRef.current.providerType = selectedType;
- latestSelectionRef.current.endpointId = selectedEndpoint?.id ?? null;
+ useEffect(() => {
+   latestSelectionRef.current.vendorId = selectedVendorId;
+   latestSelectionRef.current.providerType = selectedType;
+   latestSelectionRef.current.endpointId = selectedEndpoint?.id ?? null;
+ }, [selectedVendorId, selectedType, selectedEndpoint?.id]);

260-277: When vendorResult is null (request failed), endpoints and logs still refresh.

When refreshVendors returns null (network error or a superseded request), vendorResult?.selectionChanged is undefined and !undefined === true, so refreshEndpoints and refreshProbeLogs still run. Continuing to refresh endpoints/logs with the old selection after a failed vendor request is acceptable fault tolerance, but handle the null case explicitly to make the intent clear:

Suggested change
     const refresh = async () => {
       const vendorResult = await refreshVendors({ silent: true });
 
-     const vendorId = vendorResult?.vendorId ?? latestSelectionRef.current.vendorId;
-     const providerType = vendorResult?.providerType ?? latestSelectionRef.current.providerType;
-     const endpointId = latestSelectionRef.current.endpointId;
+     if (!vendorResult) return;
 
-     if (!vendorResult?.selectionChanged && vendorId && providerType) {
+     const vendorId = vendorResult.vendorId ?? latestSelectionRef.current.vendorId;
+     const providerType = vendorResult.providerType ?? latestSelectionRef.current.providerType;
+     const endpointId = latestSelectionRef.current.endpointId;
+
+     if (!vendorResult.selectionChanged && vendorId && providerType) {
         void refreshEndpoints({
           vendorId,
           providerType,
           keepSelectedEndpointId: endpointId,
         });
       }
 
-     if (!vendorResult?.selectionChanged && endpointId) {
+     if (!vendorResult.selectionChanged && endpointId) {
         void refreshProbeLogs(endpointId);
       }
     };

Comment on lines 176 to 185
} finally {
try {
await migrationClient`SELECT pg_advisory_unlock(hashtext(${MIGRATION_ADVISORY_LOCK_NAME}))`;
} catch (unlockError) {
logger.error("Failed to release database migration lock", unlockError);
}

// Close the connection
await migrationClient.end();
}

⚠️ Potential issue | 🟡 Minor

migrationClient.end() is not wrapped in try-catch, inconsistent with the handling in withAdvisoryLock.

withAdvisoryLock already wraps client.end() correctly in a try-catch (lines 51-58), but line 184 in the finally block of runMigrations leaves migrationClient.end() unprotected. If end() throws, it masks the original error from fn() or becomes an unhandled exception.

Suggested change
     // Close the connection
-    await migrationClient.end();
+    try {
+      await migrationClient.end();
+    } catch (endError) {
+      logger.error("Failed to close migration client", endError);
+    }
   }


@greptile-apps greptile-apps bot left a comment


69 files reviewed, 5 comments


Comment on lines +200 to +244
if (toLoad.length > 0) {
const promise = (async () => {
try {
const redisStates = await loadEndpointCircuitStates(toLoad);

for (const endpointId of toLoad) {
const redisState = redisStates.get(endpointId) ?? null;
loadedFromRedisAt.set(endpointId, refreshNow);

const health = getOrCreateHealthSync(endpointId);
if (redisState) {
// When syncing from Redis into memory, updating only when circuitState changes is not enough:
// fields like failureCount / halfOpenSuccessCount must also stay consistent under forceRefresh.
health.failureCount = redisState.failureCount;
health.lastFailureTime = redisState.lastFailureTime;
health.circuitState = redisState.circuitState;
health.circuitOpenUntil = redisState.circuitOpenUntil;
health.halfOpenSuccessCount = redisState.halfOpenSuccessCount;
continue;
}

if (health.circuitState !== "closed") {
health.circuitState = "closed";
health.failureCount = 0;
health.lastFailureTime = null;
health.circuitOpenUntil = null;
health.halfOpenSuccessCount = 0;
}
}
} catch (error) {
logger.warn("[EndpointCircuitBreaker] Failed to batch sync state from Redis", {
count: toLoad.length,
error: error instanceof Error ? error.message : String(error),
});
}
})().finally(() => {
for (const endpointId of toLoad) {
if (redisSyncInFlight.get(endpointId) === promise) {
redisSyncInFlight.delete(endpointId);
}
}
});

for (const endpointId of toLoad) {
redisSyncInFlight.set(endpointId, promise);

Batch in-flight dedup race window

The async IIFE at line 201 starts executing immediately (up to the first await at line 203), but redisSyncInFlight.set(endpointId, promise) doesn't happen until lines 243-244. In the current single-threaded JS model, this is safe because the IIFE suspends at await loadEndpointCircuitStates(toLoad) before lines 243-244 run synchronously. However, this ordering is fragile — if any synchronous logic is added before the first await inside the IIFE (e.g., a cache check that returns early), the redisSyncInFlight entries would never be set.

Consider moving lines 243-244 before the IIFE creation (mirroring the fix applied to providerStatisticsInFlight at provider.ts:1224-1225) to make the intent explicit and avoid future regressions:

for (const endpointId of toLoad) {
  redisSyncInFlight.set(endpointId, promise);
}



Comment on lines +33 to +60
await client`SELECT pg_advisory_lock(hashtext(${lockName}))`;
acquired = true;
}

const result = await fn();
return { ran: true, result };
} finally {
if (acquired) {
try {
await client`SELECT pg_advisory_unlock(hashtext(${lockName}))`;
} catch (unlockError) {
logger.error("Failed to release advisory lock", {
lockName,
error: unlockError instanceof Error ? unlockError.message : String(unlockError),
});
}
}

try {
await client.end();
} catch (endError) {
logger.error("Failed to close advisory lock client", {
lockName,
error: endError instanceof Error ? endError.message : String(endError),
});
}
}
}

Advisory lock client leak on fn() exception

If fn() throws, the finally block calls client.end(). However, pg_advisory_unlock is session-level — if client.end() closes the connection before the unlock executes (or the unlock itself fails), the lock is released implicitly by PostgreSQL when the connection terminates. This is fine for correctness.

However, the client is created with postgres(process.env.DSN, { max: 1 }) which creates a connection pool. If fn() throws synchronously before any query, the pool may not have established a connection yet, and client.end() can throw "Called end with pending queries" depending on the postgres.js version. Consider wrapping the client.end() in a try/catch (which is already done — good), but also note that pg_advisory_unlock on a failed/closed connection will throw and be caught by the inner try/catch, so this is handled correctly.

One real concern: if fn() itself uses the application's shared db connection pool (which it does — all repository functions use the shared drizzle db), the advisory lock is held on the separate client connection. This means the lock protects against concurrent instances but the actual work runs on a different connection. This is correct for the advisory lock pattern — just flagging that the lock does NOT provide transactional isolation with the work done inside fn().


Comment on lines +330 to +342
const hasMore = results.length > safePageSize;
const pageRows = hasMore ? results.slice(0, safePageSize) : results;

let total = offset + pageRows.length;

const cachedTotal = usageLogSlimTotalCache.get(totalCacheKey);
if (cachedTotal !== undefined) {
total = Math.max(cachedTotal, total);
return {
logs: pageRows.map((row) => ({ ...row, costUsd: row.costUsd?.toString() ?? null })),
total,
};
}

Stale total count returned from cache

When the cache hit occurs (line 335-341), the function returns early with the cached total without running a fresh COUNT query. The Math.max(cachedTotal, total) on line 337 biases toward the larger value, which means:

  1. If rows are deleted between requests, the total count will remain inflated until the 10s TTL expires.
  2. On page 1 with few results (offset=0, hasMore=false), total = pageRows.length could be less than cachedTotal, causing the UI to show phantom pages.

This is likely acceptable for a 10-second TTL cache in a usage-logs context, but worth noting that the monotonically-increasing assumption (Math.max) doesn't hold when rows can be soft-deleted between requests.


Comment on lines +228 to +233
let idx = 0;

const workers = Array.from({ length: Math.min(concurrency, endpointIds.length) }, async () => {
for (;;) {
const currentIndex = idx++;
if (currentIndex >= endpointIds.length) return;

Non-atomic idx++ in concurrent workers

The idx++ on line 232 is used as a work-stealing index across concurrent async workers. While this is safe in single-threaded JavaScript (the increment + read is not interrupted between the for(;;) iterations since each worker awaits between iterations), this pattern can be confusing to readers who might expect race conditions.

This is functionally correct because idx++ is atomic in the JS event loop — each worker reads and increments idx synchronously before yielding at await. Just noting this is an intentional pattern.



Comment on lines +1 to +4
-- Note: message_request is a high-write table. Standard CREATE INDEX may block writes during index creation.
-- Drizzle migrator does not support CREATE INDEX CONCURRENTLY. If write blocking is a concern,
-- manually pre-create indexes with CONCURRENTLY before running this migration (IF NOT EXISTS prevents conflicts).
CREATE INDEX IF NOT EXISTS "idx_keys_key" ON "keys" USING btree ("key");--> statement-breakpoint

Blocking index creation on high-write table

The migration comment correctly warns that CREATE INDEX on message_request may block writes. Since Drizzle migrator doesn't support CREATE INDEX CONCURRENTLY, this migration will acquire an ACCESS EXCLUSIVE lock on message_request for the duration of index creation. For a table described as "high-write", this could cause significant downtime depending on table size.

The IF NOT EXISTS guard allows operators to pre-create indexes concurrently before deploying, which is the recommended approach. Consider documenting this in a deployment/upgrade guide or release notes to ensure operators are aware they should run the CONCURRENTLY versions manually before deploying this migration.



Labels

area:provider · area:UI · enhancement (New feature or request) · size/XL (Extra Large PR, > 1000 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants