refactor(provider): improve provider page performance #782
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the review commands or the checkboxes for quick actions.
📝 Walkthrough
This change introduces endpoint probes and batched per-vendor/per-type statistics/log APIs, supports batched reads by ID with synchronized circuit health state (including batched Redis loading and state transitions), adds a first-viewport-entry lazy-loading hook, adds slim usage-log and quota aggregation on the server side, and switches the frontend to precise react-query cache invalidation and on-demand loading.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Summary of Changes
Hello @tesgth032, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request comprehensively optimizes the performance and stability of the Providers feature. By fixing N+1 query problems, reducing duplicate requests, introducing batched data loading, and refining the frontend rendering strategy, it significantly improves UI responsiveness and backend load capacity. It also deeply optimizes database queries to keep the system scalable and robust as data volume grows, delivering a smoother and more reliable Providers management experience.
Code Review
This pull request primarily focuses on optimizing data fetching and rendering performance, particularly for provider endpoint information and usage logs. Key changes include refactoring src/actions/my-usage.ts to use a new findUsageLogsForKeySlim function and a more efficient getUserConcurrentSessions method, and removing a deprecated sumUserCost function. A new src/repository/provider-endpoints-batch.ts file was introduced to provide batch fetching capabilities for provider endpoint probe logs and vendor-type endpoint statistics, which are then utilized by new Server Actions in src/actions/provider-endpoints.ts. The batchGetEndpointCircuitInfo action was updated to use a new getAllEndpointHealthStatusAsync function for batch retrieval of circuit breaker states from Redis, improving efficiency.

Client-side components like EndpointLatencySparkline, ProviderEndpointHover, and ProviderEndpointsTable were updated to leverage these new batching actions and introduce a useInViewOnce hook for deferred loading of data when elements enter the viewport, preventing request storms. Additionally, router.refresh() calls were replaced with more granular queryClient.invalidateQueries() calls across several UI components for better cache management. The getProviderStatistics function in src/repository/provider.ts was optimized with a new SQL query structure and an in-memory cache, and src/repository/statistics.ts introduced a cache for keyId to keyString lookups to reduce database queries.

Review comments highlighted critical security vulnerabilities due to the insecure exposure of repository functions as public Server Actions in src/repository/provider-endpoints-batch.ts, src/actions/my-usage.ts (findUsageLogsForKeySlim), and src/repository/provider.ts (getProviderStatistics), which lack proper authentication and authorization. Other feedback included concerns about hardcoded API paths, inefficient useEffect dependencies, and potential cache-busting issues with the keyStringByIdCache implementation.
Resolved review thread: src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/repository/provider.ts`:
- Around line 900-901: The comment containing an emoji should be cleaned: remove
the "⭐" character from the comment that explains using the last item of
providerChain (provider_chain) to determine the final provider and falling back
to provider_id when provider_chain is empty; keep the rest of the Chinese text
intact and ensure references to providerChain, provider_chain, and provider_id
remain unchanged.
- Around line 976-983: The code currently casts the db.execute() result directly
to ProviderStatisticsRow[] (using "const data = result as unknown as
ProviderStatisticsRow[]"), but db.execute() from postgres-js/drizzle-orm returns
an iterable and other call sites use Array.from(result); replace the direct type
assertion by converting the result with Array.from(result) and then type it as
ProviderStatisticsRow[] before storing it in providerStatisticsCache and
returning it so the handling matches the other db.execute() usages.
🧹 Nitpick comments (18)
src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (1)
468-474: Invalidating the health and statistics caches after a successful edit is correct. An edit may change core fields such as key and type, so refreshing both health and statistics is reasonable.
One minor consistency note:
`onSuccess` (create flow, lines 121-125) currently only invalidates "providers" and "provider-vendors", not "providers-health"/"providers-statistics". A newly created provider has no health-check or statistics data yet, so the impact is limited, but if health state should be shown immediately after creation later on, these may need to be added.
src/lib/hooks/use-in-view-once.ts (1)
36: Using `options` as an effect dependency rebuilds the observer repeatedly before the element is visible. If the caller passes a new object literal on every render (e.g. `useInViewOnce({ rootMargin: "100px" })`), the references are never equal, so the effect re-runs and keeps creating/destroying the IntersectionObserver before the element becomes visible. Consider serializing `options`, having callers `useMemo` it themselves, or caching with a shallow comparison inside the hook. Optional stabilization sketch:

```diff
+import { useEffect, useRef, useState, useMemo } from "react";
+
 export function useInViewOnce<T extends Element>(options?: IntersectionObserverInit) {
   const ref = useRef<T | null>(null);
   const [isInView, setIsInView] = useState(false);
+  const stableOptions = useRef(options);
+  // shallow compare to avoid re-triggering
+  if (
+    options?.root !== stableOptions.current?.root ||
+    options?.rootMargin !== stableOptions.current?.rootMargin ||
+    options?.threshold !== stableOptions.current?.threshold
+  ) {
+    stableOptions.current = options;
+  }
   useEffect(() => {
     // ...
-  }, [isInView, options]);
+  }, [isInView, stableOptions.current]);
```
284-301: `UsageLogSlimFilters` and `UsageLogSlimRow` are not exported, which may limit external type references. Both interfaces are currently declared with `interface` only, without `export`. External callers that need the filter or row types (for example, explicit annotations in mocks or tests) cannot import them directly.
For now `my-usage.ts` passes inline objects and tests bypass this via mocks, so there is no immediate problem. If these types need to be referenced elsewhere later, consider adding `export`.
src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx (2)
39-41: The module-level deferred Map may hold stale references after a component unmounts. If a component unmounts before `flushVendorTypeEndpointStats` runs, its deferred promise's `resolve`/`reject` remain in `deferredByProviderTypeVendorId`. Resolving a promise for an unmounted component does not crash (React Query ignores cancelled queries), but the deferred entries are never cleaned up and could accumulate under extremely rapid mount/unmount churn.
The current `setTimeout(fn, 0)` window is very short, so the practical risk is low. This is only a defensive reminder.
260-267: `res.data` may be `undefined` when batch-fetching circuit state. When `res.ok` is `true`, `res.data` may still be `T | undefined` at the type level (depending on how `ActionResult` is defined). If `undefined` cannot actually occur, consider adding `?? []` defensively:

```diff
-      const res = await batchGetEndpointCircuitInfo({ endpointIds: chunk });
-      return res.ok ? res.data : [];
+      const res = await batchGetEndpointCircuitInfo({ endpointIds: chunk });
+      return res.ok ? (res.data ?? []) : [];
```

src/actions/provider-endpoints.ts (1)
795-810: `forceRefresh: true` triggers a Redis read every time; watch high-frequency scenarios. `batchGetEndpointCircuitInfo` uses `forceRefresh: true` to bypass the in-memory cache and read Redis directly, which suits an admin page that needs real-time circuit state. But if several tooltips open simultaneously, or open/close repeatedly within a short window, this can generate a fair number of Redis requests.
The current `staleTime: 1000 * 10` (10 seconds) provides some buffering at the React Query layer. If Redis pressure shows up later, consider enabling `forceRefresh` only on first load or on a manual user refresh.
src/repository/provider-endpoints-batch.ts (2)
83-111: The raw SQL hardcodes the table name `provider_endpoint_probe_logs`. Line 106 uses a string table name rather than a Drizzle schema reference, so if the table name changes in the schema definition, this spot will not be updated with it. Consider getting the table name from the Drizzle schema object, or at least adding a comment documenting the correspondence.
Per the coding guidelines, the repository layer should use Drizzle ORM for data access. Using raw SQL here for the `ROW_NUMBER()` window function is reasonable, but referencing the schema's table name would improve maintainability.
Suggested schema-based table name:
Import the probe-logs table schema at the top of the file (if it exists), then reference the table name via Drizzle's `getTableName()` or a similar mechanism:

```diff
+import { providerEndpointProbeLogs } from "@/drizzle/schema";
+import { getTableName } from "drizzle-orm";
```

Then in the SQL:

```diff
-      FROM provider_endpoint_probe_logs
+      FROM ${sql.identifier(getTableName(providerEndpointProbeLogs))}
```

If the Drizzle version does not support `getTableName`, fall back to a comment:

```
// Table name must match: providerEndpointProbeLogs in @/drizzle/schema
FROM provider_endpoint_probe_logs
```

As per coding guidelines: "Use Drizzle ORM for data access in the repository layer".
9-13: `toDate` silently falling back to `new Date()` can mask data anomalies. When `value` is neither a `Date`, `string`, nor `number`, it returns the current time. This means that if the database returns an unexpected type (e.g. `null`), the log's `createdAt` is silently replaced with the current time instead of throwing or logging a warning. Time accuracy matters for the probe logs' `createdAt` field; consider at least logging a debug message, or pre-processing null values at the call site.
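A minimal sketch of such a defensive variant, assuming the `toDate` shape described above (`toDateStrict` and the log prefix are hypothetical, not the file's actual code):

```typescript
// Hypothetical stricter variant of toDate: keeps the fallback but surfaces
// unexpected inputs instead of hiding them behind "now".
function toDateStrict(value: unknown): Date {
  if (value instanceof Date) return value;
  if (typeof value === "string" || typeof value === "number") {
    const parsed = new Date(value);
    if (!Number.isNaN(parsed.getTime())) return parsed;
  }
  // Unexpected type (e.g. null from the driver): log before falling back.
  console.warn("[provider-endpoints-batch] toDate received unexpected value:", value);
  return new Date();
}
```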
src/lib/endpoint-circuit-breaker.ts (1)
149-161: The Redis state sync only updates memory when `circuitState` changes. The condition at line 151, `redisState.circuitState !== existingHealth.circuitState`, means that if `circuitState` is the same but `failureCount` differs (e.g. another instance accumulated more failures), the in-memory `failureCount` is not synced from Redis. For the admin page this can make the displayed `failureCount` inconsistent across instances.
The implementation is consistent with `getOrCreateHealth` (line 62), and `forceRefresh` first clears the `loadedFromRedis` entry so the next `needsRefresh` check is true; but because of the line 151 condition, even forceRefresh will not update a same-state `failureCount`.
If accurate multi-instance `failureCount` display is expected, consider relaxing the condition:

```diff
-    if (!existingHealth || redisState.circuitState !== existingHealth.circuitState) {
+    if (!existingHealth
+      || redisState.circuitState !== existingHealth.circuitState
+      || redisState.failureCount !== existingHealth.failureCount) {
```

If the current behavior is intentional (to avoid unnecessary object allocation), feel free to ignore this.
tests/unit/actions/provider-endpoints.test.ts (1)
tests/unit/actions/provider-endpoints.test.ts (1)
13-14: 批量仓库 mock 已声明但未在任何测试中断言使用。
findProviderEndpointProbeLogsBatchMock和findVendorTypeEndpointStatsBatchMock虽然作为模块 mock 注册是合理的(避免导入解析失败),但当前无任何测试用例验证其调用行为。如果这些批量仓库函数已在 action 层被使用,建议补充相应的测试覆盖。#!/bin/bash # 验证 findProviderEndpointProbeLogsBatch 和 findVendorTypeEndpointStatsBatch 在 action 层是否有实际调用 rg -n --type=ts 'findProviderEndpointProbeLogsBatch|findVendorTypeEndpointStatsBatch' -g '!**/test*' -g '!**/*.test.*'Also applies to: 57-60
src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx (1)
141-158: The chunked batch fetch of circuit state is correct; consider extracting a chunking utility. The "500 IDs per chunk + `Promise.all` parallel fetch" pattern appears both in this file and in `endpoint-latency-sparkline.tsx` (same `MAX_ENDPOINT_IDS_PER_BATCH` and chunking loop). It could be extracted into a shared utility to reduce duplication.
♻️ Optional: extract a generic chunking helper

```typescript
// e.g. in @/lib/utils/chunk.ts
export function chunkArray<T>(arr: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size));
  }
  return chunks;
}
```

Then use it in both places:

```diff
-  const MAX_ENDPOINT_IDS_PER_BATCH = 500;
-  const chunks: number[][] = [];
-  for (let index = 0; index < endpointIds.length; index += MAX_ENDPOINT_IDS_PER_BATCH) {
-    chunks.push(endpointIds.slice(index, index + MAX_ENDPOINT_IDS_PER_BATCH));
-  }
+  const chunks = chunkArray(endpointIds, 500);
```

src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (4)
48-91: `normalizeProbeLogsByEndpointId` does no runtime validation of the elements in the `logs` array. Lines 58, 72, and 85 all assert `logs` as `ProbeLog[]` without verifying that the elements actually contain the `ok` and `latencyMs` fields. If the backend returns an unexpected shape, the downstream `.map()` produces `undefined` values rather than erroring, so the sparkline renders incorrectly without any obvious failure (a silent failure).
Given this is performance-optimization code and the backend format is reasonably controlled, this is a defensive-programming suggestion that can wait.
93-142: Once `isBatchProbeLogsEndpointAvailable` is `false`, it never recovers for the lifetime of the page. After a 404, the module-level variable is set to `false` and all subsequent batch requests are skipped until the user refreshes. This is a reasonable degradation strategy for mixed frontend/backend versions (the PR description also explains it), but if the user keeps the page open long after the backend has been upgraded, the inefficient per-endpoint request mode persists.
If an improvement is desired, consider adding a time window (e.g. retry batch once after 5 minutes):
♻️ Optional: add a retry time window

```diff
-let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let batchProbeLogsDisabledAt: number | undefined;
+const BATCH_RETRY_INTERVAL_MS = 5 * 60 * 1000;

 async function tryFetchBatchProbeLogsByEndpointIds(
   endpointIds: number[],
   limit: number
 ): Promise<Record<number, ProbeLog[]> | null> {
   if (endpointIds.length <= 1) return null;
-  if (isBatchProbeLogsEndpointAvailable === false) return null;
+  if (isBatchProbeLogsEndpointAvailable === false) {
+    if (batchProbeLogsDisabledAt && Date.now() - batchProbeLogsDisabledAt < BATCH_RETRY_INTERVAL_MS) {
+      return null;
+    }
+    isBatchProbeLogsEndpointAvailable = undefined;
+  }
   if (process.env.NODE_ENV === "test") return null;
```

And record the time in the 404 branch as well:

```diff
   if (res.status === 404) {
     isBatchProbeLogsEndpointAvailable = false;
+    batchProbeLogsDisabledAt = Date.now();
     return null;
   }
```
112-135: Chunks are processed serially here, unlike the parallel strategy in `provider-endpoints-table.tsx`. The `for (const chunk of chunks)` loop is serial, while the circuit-state chunks in `provider-endpoints-table.tsx` use `Promise.all(chunks.map(...))`. For large probe-log volumes, serial processing may well be intentional (to reduce server load), but a comment explaining the rationale would keep future maintainers from being confused.
Also, at line 130, if a middle chunk's `normalized` returns `null`, the results of previously successful chunks are discarded and the whole fetch degrades to per-endpoint requests (those endpoints get fetched again). Minor impact, but worth noting.
183-227: The `ProbeLogsBatcher` micro-batch merging design is sound; one small defensive suggestion. The 10ms `setTimeout` window + grouping by limit + snapshot-then-clear flushing are all good. However, at line 199, `void this.flush()` would cause an unhandled promise rejection if an unexpected exception escaped the outer `Promise.all` in `flush()` (however unlikely). Consider adding a `.catch`:
♻️ Optional: add a top-level catch guard

```diff
-    this.flushTimer = setTimeout(() => {
-      this.flushTimer = null;
-      void this.flush();
-    }, delayMs);
+    this.flushTimer = setTimeout(() => {
+      this.flushTimer = null;
+      this.flush().catch(() => {});
+    }, delayMs);
```

src/repository/provider.ts (3)
870-876: Consider exporting the `ProviderStatisticsRow` type. `getProviderStatistics` is an exported function, but its return type `ProviderStatisticsRow` is not exported. Callers cannot reference the type by name and can only infer it via `ReturnType<>`, which hinders type reuse.
Suggested change:

```diff
-type ProviderStatisticsRow = {
+export type ProviderStatisticsRow = {
   id: number;
   today_cost: string;
   today_calls: number;
   last_call_time: Date | null;
   last_call_model: string | null;
 };
```
886-898: The cache does not dedupe concurrent requests (thundering herd). At the instant the cache expires, multiple concurrent calls to `getProviderStatistics` all pass the cache check (`expiresAt > now` is false) and each issues its own DB query. With a 10-second TTL the probability is low, but under high-frequency polling this can still cause brief query spikes.
If such spikes are observed later, consider `Promise`-level dedup (cache the in-flight query's Promise and let subsequent requests reuse it) rather than caching only the result data; a sketch follows. The current implementation should be acceptable at today's scale.
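A minimal sketch of the promise-level dedup idea, not the repository's actual implementation (`createDedupedFetcher` and the placeholder query are hypothetical):

```typescript
// Concurrent callers share one pending promise instead of each re-querying
// the DB when the cache has just expired.
function createDedupedFetcher<T>(fetcher: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (!inFlight) {
      inFlight = fetcher().finally(() => {
        inFlight = null; // clear so the next cache expiry triggers a fresh query
      });
    }
    return inFlight;
  };
}

// Hypothetical wiring: the cache-miss path in getProviderStatistics would call
// the deduped fetcher instead of running the SQL directly.
const fetchProviderStats = createDedupedFetcher(async () => {
  return [{ id: 1, today_cost: "0", today_calls: 0 }]; // stands in for the real query
});
```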
903-908: The `AT TIME ZONE` conversion in the `bounds` CTE is correct but repeated three times. The `DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}` pattern appears for each of today_start, tomorrow_start, and last7_start. You could define `today_start` first and derive the other two arithmetically to reduce the repetition:
Simplification sketch:

```diff
 WITH bounds AS (
   SELECT
     (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) AS today_start,
-    (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) + INTERVAL '1 day' AS tomorrow_start,
-    (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) - INTERVAL '7 days' AS last7_start
+    (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) + INTERVAL '1 day' AS tomorrow_start
+  ),
+  bounds_ext AS (
+    SELECT *, today_start - INTERVAL '7 days' AS last7_start FROM bounds
   ),
```

Or keep a single CTE using PostgreSQL's column-reference abilities (note that a CTE cannot directly reference sibling columns). The current form is functionally fine; this is readability only.
Code Review Summary
This is a well-structured performance optimization PR that addresses N+1 query patterns and request storms in the Providers management UI. The changes introduce batch APIs, micro-batching for probe logs, and lazy loading via intersection observers.
PR Size: L
- Lines changed: 1,840 (1,556 additions, 284 deletions)
- Files changed: 23
Note: As a Large PR, consider whether this could be split into smaller, more focused changes for easier review:
- Backend batch APIs (repository + actions)
- Frontend batching/deferred loading hooks
- UI component updates
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
Review Coverage
- Logic and correctness - Clean
- Security (OWASP Top 10) - Clean
- Error handling - Clean (graceful fallbacks are intentional)
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - Adequate (tests updated for new batch functions)
- Code clarity - Good
Notable Design Decisions
- Micro-batching pattern (`ProbeLogsBatcher`, `requestVendorTypeEndpointStatsBatched`): uses module-level state with setTimeout debouncing to coalesce rapid requests. This is a valid pattern for reducing request storms (a generic sketch follows this list).
- Concurrent worker pattern in `fetchProbeLogsByEndpointIds`: the `idx` increment is synchronous before any await, making it safe in JavaScript's single-threaded execution model.
- Graceful degradation: empty catch blocks in fallback paths intentionally suppress errors to allow fallback to individual API calls when batch endpoints are unavailable (e.g., during rolling deployments).
- In-memory caches: TTL-based caches in `statistics.ts` and `usage-logs.ts` reduce DB load for frequently accessed data. Cache invalidation is handled via TTL expiration.
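For reference, a generic sketch of the micro-batching pattern named in the first item, under the same setTimeout-debounce assumptions (`createMicroBatcher` is an illustrative name, not the PR's actual API):

```typescript
// Calls arriving within a short window are coalesced into one batched request;
// each caller receives only its own slice of the result.
function createMicroBatcher<K, V>(
  batchFetch: (keys: K[]) => Promise<Map<K, V>>,
  windowMs = 10
) {
  let pending: { key: K; resolve: (v: V | undefined) => void }[] = [];
  let timer: ReturnType<typeof setTimeout> | null = null;

  async function flush(): Promise<void> {
    const batch = pending; // snapshot-then-clear, as in ProbeLogsBatcher
    pending = [];
    timer = null;
    try {
      const results = await batchFetch(batch.map((p) => p.key));
      for (const p of batch) p.resolve(results.get(p.key));
    } catch {
      for (const p of batch) p.resolve(undefined); // degrade instead of rejecting
    }
  }

  return (key: K): Promise<V | undefined> =>
    new Promise((resolve) => {
      pending.push({ key, resolve });
      if (!timer) timer = setTimeout(() => void flush(), windowMs);
    });
}
```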
Automated review by Claude AI
Resolved review thread: src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx`:
- Around lines 48-91: normalizeProbeLogsByEndpointId asserts arrays to
ProbeLog[] without validating element structure (in the Array.isArray(data)
branch, the logsByEndpointId branch, and the items branch), passing malformed
data downstream. Add minimal structural validation in those three branches:
ensure the value is an array and check that the first (or every) element
contains the required fields (such as ok, latencyMs/timestamp), filter out
entries that fail the check, and only assign to the map after validation
passes (or replace invalid/missing-field entries with safe defaults), so the
downstream SparkPoint mapping never receives undefined fields.
In `@src/lib/hooks/use-in-view-once.ts`:
- Around line 77-97: The effect in use-in-view-once.ts can return early when
ref.current is null (delayed mount) and never re-run because dependencies omit
the element; update the logic so the effect depends on the actual element (or
use a callback ref) to ensure the observer is created when the element mounts:
either include ref.current (or a local state like observedEl set by a ref
callback) in the effect dependency array and use that element for
observer.observe, or refactor the hook to expose/accept a callback ref that
assigns the element to state and triggers the IntersectionObserver creation;
make sure to still guard for test/IntersectionObserver absence, call
setIsInView(true) when appropriate, and disconnect the observer in the cleanup.
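A minimal sketch of the callback-ref approach this instruction describes, assuming React and the hook's existing guards (the hook name and details are illustrative, not the final implementation):

```typescript
import { useCallback, useRef, useState } from "react";

export function useInViewOnceCallbackRef(options?: IntersectionObserverInit) {
  const [isInView, setIsInView] = useState(false);
  const observerRef = useRef<IntersectionObserver | null>(null);

  // Callback refs run when the element actually mounts/unmounts, so a delayed
  // mount still creates the observer (unlike an effect that ran too early).
  const ref = useCallback(
    (el: Element | null) => {
      observerRef.current?.disconnect(); // detach from any previous element
      observerRef.current = null;
      if (!el || isInView) return;
      if (typeof IntersectionObserver === "undefined") {
        setIsInView(true); // test / unsupported environments: reveal immediately
        return;
      }
      const observer = new IntersectionObserver((entries) => {
        if (entries.some((entry) => entry.isIntersecting)) {
          setIsInView(true);
          observer.disconnect();
        }
      }, options);
      observer.observe(el);
      observerRef.current = observer;
    },
    [isInView, options]
  );

  return { ref, isInView };
}
```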
🧹 Nitpick comments (7)
src/repository/provider.ts (2)
988-992: The cached `expiresAt` uses a `now` captured before the query. `now` is captured at line 897 (before the query runs) but used at line 990 to compute the cache expiry. If the query is slow, the effective TTL is shortened. Consider using a fresh timestamp when writing to the cache:
Suggested change:

```diff
 providerStatisticsCache = {
   timezone,
-  expiresAt: now + PROVIDER_STATISTICS_CACHE_TTL_MS,
+  expiresAt: Date.now() + PROVIDER_STATISTICS_CACHE_TTL_MS,
   data,
 };
```
920-962: The `final_provider_id` CASE expression is duplicated in two CTEs. The `CASE WHEN provider_chain IS NULL OR provider_chain = '[]'::jsonb THEN provider_id ELSE (provider_chain->-1->>'id')::int END` in `provider_stats` and `latest_call` is identical. Consider extracting a shared CTE (e.g. `resolved_requests`) that computes `final_provider_id` once for the later CTEs, reducing the risk of the two copies drifting apart during maintenance.
src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (1)
121-127: The invalidation pattern is reasonable; consider extracting it for reuse. The new `providers-health` and `providers-statistics` invalidation logic is repeated in four places: create / delete / edit / toggle (toggle sensibly omits statistics). This is acceptable at the current scale, but the maintenance cost rises if more query keys are added later. Consider extracting an `invalidateProviderQueries(scope)` helper to manage them in one place.
Example helper:

```typescript
// At the top of the component file or in a standalone util
function invalidateProviderQueries(
  queryClient: ReturnType<typeof useQueryClient>,
  scope: "full" | "toggle" = "full",
) {
  queryClient.invalidateQueries({ queryKey: ["providers"] });
  queryClient.invalidateQueries({ queryKey: ["providers-health"] });
  queryClient.invalidateQueries({ queryKey: ["provider-vendors"] });
  if (scope === "full") {
    queryClient.invalidateQueries({ queryKey: ["providers-statistics"] });
  }
}
```

src/lib/hooks/use-in-view-once.ts (1)
53-64: Reading and writing a ref during render is a hazard under React 19 Strict Mode. `useStableIntersectionObserverOptions` reads and writes `stableOptionsRef.current` during the render phase (lines 59-60). React 19 Strict Mode expects render functions to be pure; ref reads/writes during render usually work in practice, but the React docs mark them as discouraged. If Strict Mode's double render makes the comparison produce different results between the two renders, subtle issues are possible.
Options: defer the update via a `useRef` + `useEffect` combination, or use `useMemo` with a serialized key to keep a stable reference. In the actual usage here (options almost never change) the risk is very low; marked as an optional improvement.
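A minimal sketch of the serialized-key alternative mentioned above (the hook name is hypothetical; it assumes scalar options, since `root` elements do not serialize):

```typescript
import { useMemo } from "react";

function useStableObserverOptions(options?: IntersectionObserverInit) {
  // The memo key changes only when the option *values* change, so the returned
  // reference stays stable across renders that pass fresh object literals.
  const key = JSON.stringify([options?.rootMargin, options?.threshold]);
  // eslint-disable-next-line react-hooks/exhaustive-deps
  return useMemo(() => options, [key]);
}
```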
src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (3)
93: The module-level mutable state `isBatchProbeLogsEndpointAvailable` is unrecoverable. After a 404 it is permanently set to `false` (until a page refresh). If the backend returns a transient 404 (deployment gap, blue/green switchover, etc.), users must refresh the page to restore the batch path. Consider adding a TTL-based reset, such as retrying the batch endpoint after 5 minutes.
Example with a TTL reset:

```diff
-let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let batchDisabledAt: number | undefined;
+const BATCH_RETRY_INTERVAL_MS = 5 * 60 * 1000;
+
+function isBatchDisabled(): boolean {
+  if (isBatchProbeLogsEndpointAvailable === false) {
+    if (batchDisabledAt && Date.now() - batchDisabledAt > BATCH_RETRY_INTERVAL_MS) {
+      isBatchProbeLogsEndpointAvailable = undefined;
+      batchDisabledAt = undefined;
+      return false;
+    }
+    return true;
+  }
+  return false;
+}
```

Then record the time in the 404 handler:

```diff
 if (res.status === 404) {
   isBatchProbeLogsEndpointAvailable = false;
+  batchDisabledAt = Date.now();
```
95-206: Roughly 250 lines of batch data-fetching logic deserve their own module. `tryFetchBatchProbeLogsByEndpointIds`, `fetchProbeLogsByEndpointIdsIndividually`, `fetchProbeLogsByEndpointIds`, `normalizeProbeLogsByEndpointId`, and the `ProbeLogsBatcher` class form a complete data-fetching layer, a different responsibility from the UI component (sparkline rendering). Extracting them into something like `@/lib/probe-logs-batcher.ts` would help:
- Test the batching/degradation/merging logic in isolation
- Reuse the batcher in other components
- Reduce the cognitive load of the current file
Also applies to: 208-236, 238-245, 247-307
216-232: The concurrent worker pattern is correct but relies on JS single-threaded semantics; worth a clarifying comment. The `idx` variable is shared among multiple async workers; the read and increment happen in the same synchronous segment (before any `await`), so it is safe under the JS event-loop model. The pattern is non-obvious, though; a one-line comment explaining what its safety depends on would help (a sketch with such a comment follows).
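A self-contained sketch of the pattern with the suggested comment inline (`mapWithConcurrency` is a hypothetical name; the component's actual worker code differs):

```typescript
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let idx = 0;
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    // Safe without locks: `idx++` reads and increments in one synchronous step
    // before any `await`, so two workers can never claim the same slot under
    // JavaScript's single-threaded event loop.
    while (idx < items.length) {
      const current = idx++;
      results[current] = await task(items[current]);
    }
  });
  await Promise.all(workers);
  return results;
}
```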
Resolved review thread: src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx
Followed up on the CodeRabbit review with a small round of fixes (no change to existing UX), mainly converging some corner cases and consistency issues.
Verified locally.
Followed up on CodeRabbit's latest two actionable comments, and also corrected a small cache-TTL detail along the way (no change to existing UX/API semantics).
Local and CI runs both pass (build/lint/typecheck/test all green); CodeRabbit is back to Approved.
Completed a round of #779-related work: hot-path performance, convergence of genuine bugs, and security hardening (keeping existing UX/API semantics as much as possible).
Verified locally.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/[locale]/settings/providers/_components/forms/provider-form/index.tsx (1)
370-374: ⚠️ Potential issue | 🟠 Major
Invalidating only the `provider-vendors` cache on successful creation matches the PR goal, but the edit path has a gap: edits perform no form-level cache invalidation and rely entirely on the optional `onSuccess` callback. If the callback fails (lines 378-380 only log the error), the caches are never refreshed.
Suggested improvement: either add form-level invalidation on the edit path consistent with the create path (e.g. `queryClient.invalidateQueries({ queryKey: ["providers"] })`), or make `onSuccess` a required parameter to guarantee reliable cache refreshes.
tests/unit/actions/total-usage-semantics.test.ts (1)
48-52: ⚠️ Potential issue | 🟡 Minor
Missing mock for `SessionTracker.getUserSessionCount`. `getMyQuota` now calls `getUserConcurrentSessions(userId)` → `SessionTracker.getUserSessionCount(userId)`, but the SessionTracker mock here only provides `getKeySessionCount`, not `getUserSessionCount`. The try-catch inside `getUserConcurrentSessions` falls back to 0, so the test does not fail, but that is silently swallowing an error rather than correct mock behavior.
🛡️ Suggested mock addition:

```diff
 vi.mock("@/lib/session-tracker", () => ({
   SessionTracker: {
     getKeySessionCount: (...args: unknown[]) => getKeySessionCountMock(...args),
+    getUserSessionCount: vi.fn().mockResolvedValue(0),
   },
 }));
```
🤖 Fix all issues with AI agents
In `@src/actions/provider-endpoints.ts`:
- Around line 567-583: The code calls resetEndpointCircuitState again after a
successful probe, creating redundant Redis round-trips because
probeProviderEndpointAndRecordByEndpoint already calls resetEndpointCircuit when
appropriate; remove the entire conditional block that checks result.ok and tries
to call resetEndpointCircuitState (the try/catch logging calling
resetEndpointCircuitState with endpoint.id) from
src/actions/provider-endpoints.ts so the single reset path inside
probeProviderEndpointAndRecordByEndpoint (which invokes resetEndpointCircuit) is
the only place handling circuit resets.
In `@src/repository/statistics.ts`:
- Around line 984-1019: The outer WHERE uses scanEnd as an upper bound which can
truncate the costTotal aggregate (filtered only by createdAt >= cutoffDate); fix
by computing scanEnd to include the current time so it cannot be earlier than
now — i.e. set scanEnd = new Date(Math.max(..., Date.now())) when building
scanEnd (the same pattern should be applied in the analogous
sumKeyQuotaCostsById logic); update uses of scanEnd (and any other range
end-time max calculations) so costTotal (and related totals) are not
artificially upper-bounded by stale range endTimes.
🧹 Nitpick comments (8)
src/repository/usage-logs.ts (3)
380-417: The COUNT and data queries could run in parallel. In `findUsageLogsForKeySlim`, the count query (line 380) and the paginated data query (line 388) run sequentially, but they are independent and could be parallelized with `Promise.all` to reduce latency.
♻️ Suggested Promise.all parallelization

```diff
-  const [countResult] = await db
-    .select({ totalRows: sql<number>`count(*)::double precision` })
-    .from(messageRequest)
-    .where(and(...conditions));
-
-  const total = countResult?.totalRows ?? 0;
-
-  const offset = (safePage - 1) * safePageSize;
-  const results = await db
-    .select({
-      id: messageRequest.id,
-      ...
-    })
-    .from(messageRequest)
-    .where(and(...conditions))
-    .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
-    .limit(safePageSize)
-    .offset(offset);
+  const offset = (safePage - 1) * safePageSize;
+  const [countResult, results] = await Promise.all([
+    db
+      .select({ totalRows: sql<number>`count(*)::double precision` })
+      .from(messageRequest)
+      .where(and(...conditions)),
+    db
+      .select({
+        id: messageRequest.id,
+        // ... remaining fields
+      })
+      .from(messageRequest)
+      .where(and(...conditions))
+      .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
+      .limit(safePageSize)
+      .offset(offset),
+  ]);
+
+  const total = countResult?.[0]?.totalRows ?? 0;
```
420-444: The cache write uses the caller-supplied `now` rather than the current time for `expiresAt`. Per the discussion in the PR comments, `expiresAt` should be computed from `Date.now()` at write time rather than from a `now` captured before the query, to avoid the effective TTL being shortened by slow queries. Currently `setDistinctKeyOptionsCache` uses the caller's `now` (captured before the DB query), which can reduce the effective TTL.
♻️ Suggested: use Date.now() inside setDistinctKeyOptionsCache

```diff
 function setDistinctKeyOptionsCache(
   cache: Map<string, { data: string[]; expiresAt: number }>,
   key: string,
   data: string[],
-  now: number
 ): void {
+  const now = Date.now();
   if (cache.size >= DISTINCT_KEY_OPTIONS_CACHE_MAX_SIZE) {
     for (const [k, v] of cache) {
       if (v.expiresAt <= now) {
         cache.delete(k);
       }
     }
     if (cache.size >= DISTINCT_KEY_OPTIONS_CACHE_MAX_SIZE) {
       cache.clear();
     }
   }
   cache.set(key, { data, expiresAt: now + DISTINCT_KEY_OPTIONS_CACHE_TTL_MS });
 }
```
284-301: `UsageLogSlimFilters` and `UsageLogSlimRow` are not exported. `findUsageLogsForKeySlim` is an exported function, but its parameter type `UsageLogSlimFilters` and the `UsageLogSlimRow` in its return value are not exported. TypeScript's structural typing lets callers pass literal objects, but external code that needs to reference these types (e.g. constructing type-safe arguments in test mocks) is limited. The AI summary also lists them as new public types.
♻️ Suggested: export both interfaces

```diff
-interface UsageLogSlimFilters {
+export interface UsageLogSlimFilters {
   keyString: string;
   ...
 }

-interface UsageLogSlimRow {
+export interface UsageLogSlimRow {
   id: number;
   ...
 }
```

Also applies to: 303-319
src/actions/my-usage.ts (1)
600-648: The `keyBreakdown` and `userBreakdown` queries select inconsistent fields. `keyBreakdown` includes `cacheCreation5mTokens` and `cacheCreation1hTokens` (lines 611-612), but `userBreakdown` does not (lines 627-636). These two fields are only used for summary-level statistics (reduced from keyBreakdown) and do not affect the `userModelBreakdown` mapping, but inconsistent select lists across two parallel queries invite confusion, and showing these fields in userModelBreakdown later would require another change.
src/lib/provider-endpoints/endpoint-selector.ts (2)
41-45: `findEnabledProviderEndpointsByVendorAndType` already filters `isEnabled` and `deletedAt` at the DB level, so the application-level filter is redundant. The query conditions already include `is_enabled = true AND deleted_at IS NULL` (see `src/repository/provider-endpoints.ts` lines 823-828), making the `e.isEnabled && !e.deletedAt` check at line 45 duplicate work. Only the `excludeSet.has(e.id)` filter is necessary.
♻️ Simplified filtering

```diff
-  const filtered = endpoints.filter((e) => e.isEnabled && !e.deletedAt && !excludeSet.has(e.id));
+  const filtered = excludeSet.size > 0
+    ? endpoints.filter((e) => !excludeSet.has(e.id))
+    : endpoints;
```
81-98: Enabled-endpoint filtering runs twice in `getEndpointFilterStats`. Lines 83 and 90 apply the same `e.isEnabled && !e.deletedAt` filter to the same array; the result of one pass can be reused.
♻️ Eliminating the duplicate filter

```diff
   const endpoints = await findProviderEndpointsByVendorAndType(input.vendorId, input.providerType);
   const total = endpoints.length;
-  const enabled = endpoints.filter((e) => e.isEnabled && !e.deletedAt).length;
+  const enabledEndpoints = endpoints.filter((e) => e.isEnabled && !e.deletedAt);
+  const enabled = enabledEndpoints.length;

   // When endpoint circuit breaker is disabled, no endpoints can be circuit-open
   if (!getEnvConfig().ENABLE_ENDPOINT_CIRCUIT_BREAKER) {
     return { total, enabled, circuitOpen: 0, available: enabled };
   }

-  const enabledEndpoints = endpoints.filter((e) => e.isEnabled && !e.deletedAt);
   if (enabledEndpoints.length === 0) {
     return { total, enabled: 0, circuitOpen: 0, available: 0 };
   }
```

src/repository/provider-endpoints.ts (1)
793-833: Consider reusing the `providerEndpointSelectFields` constant to reduce duplication. The `.select(...)` fields in `findEnabledProviderEndpointsByVendorAndType` are identical to those in `findProviderEndpointsByVendorAndType` (lines 762-779) and to `providerEndpointSelectFields` defined at line 209. Reusing the constant reduces the maintenance burden.
♻️ Suggested change

```diff
   const rows = await db
-    .select({
-      id: providerEndpoints.id,
-      vendorId: providerEndpoints.vendorId,
-      providerType: providerEndpoints.providerType,
-      url: providerEndpoints.url,
-      label: providerEndpoints.label,
-      sortOrder: providerEndpoints.sortOrder,
-      isEnabled: providerEndpoints.isEnabled,
-      lastProbedAt: providerEndpoints.lastProbedAt,
-      lastProbeOk: providerEndpoints.lastProbeOk,
-      lastProbeStatusCode: providerEndpoints.lastProbeStatusCode,
-      lastProbeLatencyMs: providerEndpoints.lastProbeLatencyMs,
-      lastProbeErrorType: providerEndpoints.lastProbeErrorType,
-      lastProbeErrorMessage: providerEndpoints.lastProbeErrorMessage,
-      createdAt: providerEndpoints.createdAt,
-      updatedAt: providerEndpoints.updatedAt,
-      deletedAt: providerEndpoints.deletedAt,
-    })
+    .select(providerEndpointSelectFields)
     .from(providerEndpoints)
```

tests/unit/lib/provider-endpoints/endpoint-selector.test.ts (1)
108-131: The health-state mock type definition is repeated across several test blocks. The return type of `getAllStatusMock` (a Record containing fields like `failureCount` and `circuitState`) is repeated three times at lines 108-131, 206-229, and 291-314. Consider extracting a file-level type alias or a helper function that constructs the mock, to reduce duplication. Readability comes first in test code, though; this is only an optional optimization.
Resolved review thread: src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx
Followed the latest AI reviews (CodeRabbit / Greptile / Gemini) for another round of convergence (keeping existing UX/API semantics as much as possible).
Verified locally. @coderabbitai the two points from the previous changes-requested review are now covered and fixed; please re-review and update the status when convenient.
✅ The redundant circuit-reset call has been removed, and you have completed full local verification (build, lint, typecheck, test). I will now re-review all changes in this PR. ✅ Actions performed: Review triggered.
Continued filling in and fixing several details on PR #782 (all related to #779 performance/stability, keeping existing UX as much as possible).
Local verification passed.
This update (synced dev + followed up on a few review concerns).
Verified locally.
This round additionally merges into PR #782 the fix for issue #781 (stale endpoints/types lingering in Endpoint Health, continuous probing), keeping existing UX as much as possible.
Local verification passed.
- AvailabilityDashboard: suppress overlapping/out-of-order refreshes; throttle forced refreshes on foreground/background switches
- Probe scheduler/cleanup: idle DB polling + lock renewal, reducing pointless scans and concurrent cleanup
- Endpoint circuit: throttle Redis sync (1s)
- My Usage: merge key/user breakdowns into a single aggregation
- DB: add message_request key+model/endpoint partial-index migrations; fix the journal monotonicity check and self-heal migration-table created_at
- Reuse an observer pool keyed by (root+options), reducing observer instances in long lists/large tables
- Add unit-test coverage (test-env passthrough + share/release semantics)
Force-pushed from 2b31c19 to 9680cd9.
Follow-up commits (2026-02-15):
Re-ran locally.
Resolved review thread: src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx
Resolved review thread (outdated): src/app/[locale]/dashboard/availability/_components/endpoint/endpoint-tab.tsx
Resolved review thread: src/app/[locale]/dashboard/availability/_components/endpoint/endpoint-tab.tsx
Additional follow-up (based on AI review):
This push (…):
Local verification passed.
Merged 54afd85 into ding113:refactor/provider-performance
* fix: batch endpoint statistics and probe logs on the Providers management page
* perf: optimize provider statistics and my-usage query performance
* perf: remove refresh amplifiers on the Providers management page and lazy-load endpoint sections
* fix: follow up on review to harden Providers batching and statistics
* fix: follow up on CodeRabbit to fix in-view and probe-data validation
* perf: stabilize in-view and recover batch after 404
* perf: reduce DB round-trips for my-usage quota/summary
* perf(providers): batch circuit-breaker queries on the endpoint-pool hot path plus index migrations (#779)
  - Runtime endpoint selection and strict audit statistics now batch-read endpoint circuit state, reducing Redis round-trips
  - Probe writes are silently ignored when an endpoint is concurrently deleted, so FK failures no longer abort the task
  - New index migrations: idx_provider_endpoints_pick_enabled / idx_providers_vendor_type_url_active
  - The repository batch-query module is now server-only, avoiding accidental exposure as a Server Action
* fix: dedupe circuit reset and scanEnd per review (#779)
* fix: precise circuit reset + server-only repo (#779)
* fix: add sessionId/warmup filtering to my-usage (#779)
* perf: more robust in-flight dedup for provider statistics (#779)
* fix: ProviderForm invalidates related caches consistently (#779)
* fix: Providers/Usage detail fixes and added test cases (#779)
* style: biome formatting (#779)
* fix(#779): circuit-state sync and probeLogs batch-query improvements
* fix(#781): clean up orphan endpoints and fix Endpoint Health
* perf: optimize usage logs and endpoint sync (#779/#781)
* refactor: remove redundant endpoint filtering (#779)
* fix: batch circuit-state queries cover only enabled endpoints (#779)
* fix: provider statistics tolerate dirty data; stable probe-log ordering (#779)
* perf: disable window-focus auto-refetch for heavy Providers queries (#779)
* fix: periodic multi-instance circuit-state sync; fix backfill leaving soft-deleted endpoints (#779/#781)
* perf: probe scheduler probes only endpoints of enabled providers (#781)
* perf: ProviderForm avoids duplicate refetches; stable hover circuit key (#779)
* perf: global QueryClient policy and usage/user index optimization (#779)
* perf: timezone statistics index usage and batch-delete optimization (#779)
* perf: reduce wasted recomputation on the logs/users pages
* fix(provider): endpoint pool based only on enabled providers
  - sync/backfill/delete: reference checks and backfill consider only providers with is_enabled=true, so a disabled provider cannot revive old endpoints
  - updateProvider: ensures endpoints exist when a provider goes from disabled to enabled
  - Dashboard Endpoint Health: avoids concurrent refreshes overwriting user switches; vendor/type derived only from enabled providers
  - Batch probe-logs API: partial 404s during rolling deploys no longer disable batching globally
  - endpoint-selector unit tests updated to match findEnabled* semantics
* perf: lightweight Dashboard vendor/type queries and parallel usage-log queries
* fix(migrate): serialize migrations with an advisory lock; remove emoji logs
* fix: endpoint hover fallback and normalized batch probe-logs SQL
* perf(settings/providers): reduce redundant refreshes and reuse endpoint/circuit caches
* perf(probe/statistics): fix probe lock/counting; narrow statistics and usage scans
* perf(probe/ui): optimize probe-target filter SQL; reduce sparkline flicker
* fix(db): repair the Drizzle snapshot chain
* fix(perf): strengthen Providers batching and cache consistency
  - Provider statistics: eliminate the implicit cross join; tighten in-flight cleanup; deleteProvidersBatch reduces in-transaction round-trips
  - Providers hover: micro-batching isolated per QueryClient with AbortSignal support, reducing crosstalk and potential leaks
  - Probe/circuit/cache: probe-target query switched to a join; Redis sync updates the counter fields; statistics cache keeps FIFO semantics
  - My Usage: userBreakdown adds the 5m/1h cache aggregate columns (not yet shown in the UI)
* chore: format code (issue-779-provider-performance-23b338e)
* chore: re-trigger CI
* fix(provider): backfill the endpoint pool on batch enable
  - batchUpdateProviders goes through updateProvidersBatch; when vendors are batch-enabled from disabled, missing provider_endpoints rows are inserted best-effort
  - Avoids history/races leaving an enabled provider blocked under the strict endpoint policy for lack of an available endpoint
* fix(perf): rein in Providers refresh amplification; optimize probing/pagination
* perf: rein in availability/probe polling and optimize my-usage (#779/#781)
  - AvailabilityDashboard: suppress overlapping/out-of-order refreshes; throttle forced refreshes on foreground/background switches
  - Probe scheduler/cleanup: idle DB polling + lock renewal, reducing pointless scans and concurrent cleanup
  - Endpoint circuit: throttle Redis sync (1s)
  - My Usage: merge key/user breakdowns into a single aggregation
  - DB: add message_request key+model/endpoint partial-index migrations; fix the journal monotonicity check and self-heal migration-table created_at
* fix(ui): restore global react-query defaults
* fix(availability): clear stale endpoint selection when refreshing vendors
* perf: strengthen Providers probing and Usage Logs performance
* perf(ui): useInViewOnce shares IntersectionObservers to reduce resource usage
  - Reuse an observer pool keyed by (root+options), reducing observer instances in long lists/large tables
  - Add unit-test coverage (test-env passthrough + share/release semantics)
* perf: providers batch WHERE optimization; fix sparkline fallback concurrency
* perf: my-usage breakdown adds cache fields; optimize filter caches
* perf: optimize endpoint circuit-breaker Redis load and probe candidates
* fix(#781): Endpoint Health shows only endpoints referenced by enabled providers
* fix endpoint-health filtering and harden URL-parsing tolerance
* docs(provider-endpoints): document keepPreviousWhenReferenced semantics
* perf(availability): throttle EndpointTab refreshes on foreground/background switches
* docs(availability): add EndpointTab refresh-throttling comments
* chore(review): add comments and tidy details per AI review
* fix: correct DST day boundaries in the provider statistics SQL

---------

Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* refactor(provider): improve provider page performance (#782): squash of the commit list above
* refactor: consolidate migrations, extract shared utilities, fix bugbot issues
  - Merge 6 index migrations (0068-0073) into a single idempotent migration.
  - Extract reusable utilities from duplicated code across the codebase:
    - TTLMap<K,V>: generic LRU+TTL cache replacing 3 inline implementations
    - createAbortError: shared abort error factory from 2 components
    - startLeaderLockKeepAlive: shared leader lock renewal from 2 schedulers
    - ProbeLogsBatcher: data-fetching infra extracted from the sparkline component
    - buildUsageLogConditions: shared SQL filter builder from 3 query functions
  - Additional cleanup: simplify the useInViewOnce hook (remove unused options, keep the shared observer pool); remove dead code (sumKeyTotalCostById, unexport internal types); hardcode env var defaults (ENDPOINT_CIRCUIT_HEALTH_CACHE_MAX_SIZE, ENDPOINT_PROBE_IDLE_DB_POLL_INTERVAL_MS); fix the in-flight dedup race condition in getProviderStatistics; fix the yesterday/today interval boundary inconsistency (lte -> lt); add a NaN guard for limitPerEndpoint in batch probe logs; add updatedAt to deleteProvider for audit consistency; log swallowed flush() errors in batchers instead of silently catching
* fix: resolve loading state reset and advisory lock client close errors
  - Remove the silent option guard so vendor loading state always resets when the request completes, preventing stale loading indicators.
  - Wrap advisory lock client.end() in try-catch to avoid unhandled errors during connection teardown.

---------

Co-authored-by: tesgth032 <tesgth032@hotmail.com>
Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Related: ding113/claude-code-hub#779, ding113/claude-code-hub#781

Background / Problem
Main changes (keeping existing UX unchanged as far as possible)

- `batchGetVendorTypeEndpointStats` batch-counts enabled endpoints; endpoint details and circuit state are fetched only when the tooltip opens
- `useInViewOnce` shares an IntersectionObserver pool, cutting per-observer overhead in large lists; 10ms micro-batch coalescing; the batch HTTP path is preferred
- `batchGetProviderEndpointProbeLogs` allows partially successful chunks to coexist with partial 404s/failures, degrading on demand; requests are removed from the batch on query cancel/unmount, reducing pointless flushes
- `getAllEndpointHealthStatusAsync` syncs circuit state into memory; runtime endpoint selection also batch-reads circuit state, reducing Redis round-trips
- The health cache is bounded (`ENDPOINT_CIRCUIT_HEALTH_CACHE_MAX_SIZE`) so resident memory does not balloon as the number of endpointIds grows
- `HSET`+`EXPIRE` are pipelined to reduce RTT
- `batchGetEndpointCircuitInfo` no longer forces `forceRefresh`; it relies on the 1s sync TTL plus in-flight dedup to flatten transient Redis read spikes (display semantics unchanged)
- Precise `resetEndpointCircuit`; probe records are silently ignored when the endpoint has been concurrently deleted (avoiding FK failures aborting the task)
- `syncProviderEndpointOnProviderEdit` prefers reading the active row; if revive hits 23505 it falls back to re-reading the active row, avoiding transaction rollbacks from concurrency or historical dirty data
- Removed `router.refresh()`; invalidation narrowed from global to the vendor dimension; the Vendor-view endpoint-pool section mounts lazily on first in-view
- Clearer fallback-worker concurrency boundaries in `ProviderEndpointHover`; the my-usage model breakdown exposes cacheCreation5m/1h fields; the distinct models/endpoints cache does an LRU-like bump + sliding TTL on cache hits, reducing repeated DISTINCT scans under high-frequency filtering (a sketch of this cache pattern follows this list)
- Endpoints kept in one-to-one correspondence with `providers.url` (Provider Availability Endpoint #781)
- `src/repository/provider-endpoints-batch.ts` drops the top-level `"use server"` in favor of `server-only` (avoiding accidental exposure as a Server Action)
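A minimal sketch of the "LRU-like bump + sliding TTL" cache behavior mentioned above, assuming a Map-based cache (names and constants are illustrative, not the repository's actual code):

```typescript
// Map iteration order is insertion order, so deleting and re-inserting on a
// hit moves the entry to the most-recently-used end; refreshing expiresAt
// gives hot keys a sliding TTL.
const TTL_MS = 60_000;
const MAX_ENTRIES = 500;
const cache = new Map<string, { data: string[]; expiresAt: number }>();

function getWithBump(key: string): string[] | undefined {
  const entry = cache.get(key);
  if (!entry || entry.expiresAt <= Date.now()) return undefined;
  cache.delete(key); // LRU-like bump
  entry.expiresAt = Date.now() + TTL_MS; // sliding TTL on hit
  cache.set(key, entry);
  return entry.data;
}

function setEntry(key: string, data: string[]): void {
  if (cache.size >= MAX_ENTRIES) {
    const oldest = cache.keys().next().value; // least-recently bumped entry
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(key, { data, expiresAt: Date.now() + TTL_MS });
}
```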
Compatibility / Upgrade

- idx_provider_endpoints_pick_enabled: speeds up the runtime endpoint-selection hot path (lookup by `vendor_id + provider_type + is_enabled`, ordered scan by `sort_order`)
- idx_providers_vendor_type_url_active: speeds up reference checks when editing a provider URL (exact vendor/type/url match)
- idx_providers_enabled_vendor_type: speeds up the Dashboard/Probe scheduler's enabled vendor/type DISTINCT (partial: `deleted_at IS NULL AND is_enabled = true AND provider_vendor_id > 0`)
- idx_message_request_key_created_at_id: speeds up key-scoped pagination and time-range scans for my-usage/usage logs (`key + created_at desc + id desc`, partial `deleted_at IS NULL`)
- idx_message_request_key_model_active: speeds up the my-usage filter dropdown's `DISTINCT model` (key-scoped, partial, excluding warmup)
- idx_message_request_key_endpoint_active: speeds up the my-usage filter dropdown's `DISTINCT endpoint` (key-scoped, partial, excluding warmup)
- idx_message_request_created_at_id_active: speeds up keyset pagination for admin usage logs (`created_at desc + id desc`, partial `deleted_at IS NULL`)
- idx_message_request_model_active: speeds up the admin usage-logs filter's `DISTINCT model` (partial `deleted_at IS NULL AND model IS NOT NULL`)
- idx_message_request_status_code_active: speeds up the admin usage-logs filter's `DISTINCT status_code` (partial `deleted_at IS NULL AND status_code IS NOT NULL`)
- The `message_request` table is large and write-heavy; the index migrations (especially 0069/0070/0072) use plain `CREATE INDEX` and may block writes while building. If write downtime matters, upgrade in a maintenance window, or pre-create same-named indexes manually with `CREATE INDEX CONCURRENTLY` (the migrations use `IF NOT EXISTS`, so pre-creation will not conflict)
- With `AUTO_MIGRATE=true`, migrations are applied automatically
- Without `AUTO_MIGRATE`: run `bun run db:migrate` (or the Drizzle CLI) after upgrading to apply the migrations
Local verification

- bun run lint (only pre-existing useExhaustiveDependencies warnings)
- bun run lint:fix
- bun run typecheck
- bun run test
- bun run build
- bun run db:generate (generates the index migrations; migration idempotency checks pass)

Greptile Summary
This PR implements comprehensive performance and stability optimizations for the Provider management system, addressing N+1 query patterns, request storms, and cache synchronization issues across a large-scale distributed deployment.
Major Improvements:
- `provider-endpoints-batch.ts` consolidates vendor stats and probe-log fetching into single parameterized queries with LATERAL joins, reducing round-trips from O(n) to O(1) for lists
- `useInViewOnce` viewport detection prevents request storms on scroll/mount; a shared IntersectionObserver pool reduces observer overhead in large lists
- Indexes added across `message_request`, `providers`, and `provider_endpoints` support runtime endpoint selection, usage-log pagination, and DISTINCT queries

Compatibility Notes:

- Index migrations build on the `message_request` table without CONCURRENTLY; production deployments should either apply during a maintenance window or pre-create indexes manually

Test Coverage:

- Includes unit tests for `useInViewOnce` shared observer pooling

Confidence Score: 4/5

- Index creation locks the `message_request` tables; consider CREATE INDEX CONCURRENTLY or a maintenance window

Important Files Changed

- Index migrations on `message_request` for usage-log pagination and filtering; locking noted in comments
- Index migration on `message_request` and a GIN index for users tags; locking noted
- Index migrations on `message_request` for DISTINCT model/endpoint filtering in my-usage views

Flowchart
```mermaid
flowchart TD
  A[Provider List/Detail View] -->|Mount| B{useInViewOnce}
  B -->|In View| C[10ms Micro-Batch Queue]
  C -->|Flush| D{Batch Size Check}
  D -->|>1 Endpoint| E[POST /api/batchGetProviderEndpointProbeLogs]
  D -->|Single| F[Fallback: Individual Server Action]
  E -->|404| G[Mark Batch Unavailable<br/>5min Retry Timer]
  E -->|Success| H[Batch Available]
  E -->|Partial Fail| I[Fallback: Individual Fetch]
  H --> J[Merge Results]
  I --> J
  F --> J
  G -->|After 5min| D
  K[Endpoint Selector Runtime] -->|Pick Endpoint| L{Redis Sync Needed?}
  L -->|Yes - In-flight?| M{Wait Existing}
  L -->|Yes - New| N[Batch Pipeline Load<br/>Redis HGETALL]
  M --> O[Sync to Memory]
  N --> O
  L -->|No - Use Cache| O
  O --> P{Circuit State}
  P -->|Open| Q[Skip Endpoint]
  P -->|Closed/Half-Open| R[Use Endpoint]
  S[Probe Scheduler Tick] -->|Query DB| T[INNER JOIN<br/>enabled_vendor_types CTE]
  T -->|Filter by Interval| U[Due Endpoints]
  U -->|Concurrent Probe| V[Record Success/Failure]
  V -->|Update| W[Circuit Breaker State]
  W -->|Default Closed| X[DELETE Redis Key]
  W -->|Non-Default| Y[HSET + EXPIRE Pipeline]
  Z[Usage Logs Query] -->|First Page| AA{Total Count Cached?}
  AA -->|No| AB[COUNT + Cache 30s]
  AA -->|Yes| AC[Use Cached Total]
  AB --> AD[Slim Query<br/>keyset pagination]
  AC --> AD
  AD -->|Indexed Scan| AE[Results]
```

Last reviewed commit: 6e3c10c