feat(probe): implement dynamic endpoint probe intervals#669
Conversation
- Change probe scheduling from global interval to per-endpoint dynamic intervals: - Base interval: 60s (configurable via ENDPOINT_PROBE_INTERVAL_MS) - Single-endpoint vendor: 10min (reduces unnecessary probing) - Timeout override: 10s (faster recovery for timeout errors) - Filter probes to only "due" endpoints based on lastProbedAt + effectiveInterval - Remove type tabs from VendorEndpointsSection, show all endpoints in single list - Display type icon with tooltip for each endpoint row - Sort endpoints by type order (from getAllProviderTypes) then sortOrder - Add type selector in AddEndpointButton dialog - Update i18n strings for all 5 languages - Add comprehensive unit tests for dynamic interval rules Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 WalkthroughWalkthrough本次变更:将端点探测基础间隔从 10000ms 调整为 60000ms,增加并英文化探测动态间隔注释,新增按供应商检索端点的后端 API、重构供应商端点 UI 为供应商级获取并添加类型列与选择器,扩展探测调度器以支持基于供应商的动态间隔与新字段,及相关本地化与测试更新。 Changes
Estimated code review effort🎯 3 (中等) | ⏱️ ~25 分钟 Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @ding113, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces dynamic endpoint probe intervals to optimize probing frequency based on endpoint characteristics and error status. It also enhances the provider settings UI by displaying all endpoints in a single table with type indicators and improving the endpoint creation process. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a dynamic probing interval mechanism for provider endpoints, which is a great enhancement for efficiency. The logic is well-structured with different intervals for timeouts, single-vendor endpoints, and a base configurable interval. The accompanying UI changes to remove type tabs and show all endpoints in a single table with type icons improve usability. The test coverage for the new logic is comprehensive. I have two pieces of feedback: one for improving maintainability by avoiding a hardcoded list of provider types, and another is a query about the removal of the circuit breaker UI, which seems to be a feature regression and is not documented in the PR description.
| function VendorEndpointsSection({ vendorId }: { vendorId: number }) { | ||
| const t = useTranslations("settings.providers"); | ||
| const tTypes = useTranslations("settings.providers.types"); | ||
| const [activeType, setActiveType] = useState<ProviderType>("claude"); | ||
|
|
||
| const providerTypes: ProviderType[] = ["claude", "codex", "gemini", "openai-compatible"]; | ||
|
|
||
| return ( | ||
| <div> | ||
| <div className="px-6 py-3 bg-muted/10 border-b font-medium text-sm text-muted-foreground flex items-center justify-between"> | ||
| <span>{t("endpoints")}</span> | ||
| <AddEndpointButton vendorId={vendorId} /> | ||
| </div> | ||
|
|
||
| <div className="p-6"> | ||
| <div className="flex flex-col space-y-4"> | ||
| <div className="flex items-center justify-between"> | ||
| <div className="flex items-center space-x-2 bg-muted p-1 rounded-md"> | ||
| {providerTypes.map((type) => { | ||
| const typeConfig = getProviderTypeConfig(type); | ||
| const TypeIcon = typeConfig.icon; | ||
| const typeKey = getProviderTypeTranslationKey(type); | ||
| const label = tTypes(`${typeKey}.label`); | ||
| return ( | ||
| <Button | ||
| key={type} | ||
| variant={activeType === type ? "default" : "ghost"} | ||
| size="sm" | ||
| onClick={() => setActiveType(type)} | ||
| className="h-7 text-xs capitalize" | ||
| > | ||
| <span | ||
| className={`mr-1.5 inline-flex h-5 w-5 items-center justify-center rounded ${typeConfig.bgColor}`} | ||
| > | ||
| <TypeIcon className={`h-3.5 w-3.5 ${typeConfig.iconColor}`} /> | ||
| </span> | ||
| {label} | ||
| </Button> | ||
| ); | ||
| })} | ||
| </div> | ||
|
|
||
| <AddEndpointButton vendorId={vendorId} providerType={activeType} /> | ||
| </div> | ||
|
|
||
| <VendorTypeCircuitControl vendorId={vendorId} providerType={activeType} /> | ||
|
|
||
| <EndpointsTable vendorId={vendorId} providerType={activeType} /> | ||
| </div> | ||
| <EndpointsTable vendorId={vendorId} /> | ||
| </div> | ||
| </div> | ||
| ); | ||
| } |
There was a problem hiding this comment.
This refactoring removes the VendorTypeCircuitControl component, which provided UI for monitoring and managing vendor-type circuit breakers. This is a significant change that isn't mentioned in the pull request description and appears to be a feature regression. If this was intentional, it would be good to document the reasoning. If it was an oversight, the functionality should be restored, perhaps in a new form that fits the refactored UI.
| const [providerType, setProviderType] = useState<ProviderType>("claude"); | ||
|
|
||
| // Get provider types for the selector (exclude claude-auth and gemini-cli which are internal) | ||
| const selectableTypes: ProviderType[] = ["claude", "codex", "gemini", "openai-compatible"]; |
There was a problem hiding this comment.
Hardcoding the list of selectableTypes can lead to maintenance issues if new provider types are added in the future. It would be more robust to derive this list from getAllProviderTypes() and filter out the internal types. This ensures that the list stays in sync with the available provider types automatically.
const selectableTypes: ProviderType[] = getAllProviderTypes().filter(
(type) => !["claude-auth", "gemini-cli"].includes(type)
);
There was a problem hiding this comment.
Code Review Summary
No significant issues identified in this PR.
PR Size: M (Medium)
- Lines changed: 789 (583 additions, 206 deletions)
- Files changed: 13
Changes Overview
This PR implements dynamic per-endpoint probe intervals with a three-tier priority system:
- Timeout Override (10s): Fast recovery for endpoints with timeout errors
- Single-Vendor (10min): Reduced probing for vendors with only one endpoint
- Base Interval (60s): Default interval for all other endpoints
The UI changes remove type tabs from VendorEndpointsSection, showing all endpoints with type icons and tooltips instead.
Review Coverage
- Logic and correctness - Clean
- Dynamic interval calculation logic is sound with correct priority ordering
filterDueEndpointscorrectly handles nulllastProbedAt(always due)- Vendor endpoint counting is accurate
- Security (OWASP Top 10) - Clean
- No security concerns identified
- Error handling - Clean
- New
getProviderEndpointsByVendorfollows existing error handling patterns (returns empty array with logging)
- New
- Type safety - Clean
ProviderEndpointProbeTargettype properly extended withvendorIdandlastProbeErrorType- Type export added to repository index
- Documentation accuracy - Clean
.env.exampledocumentation accurately describes the dynamic interval rules
- Test coverage - Excellent
- 7 comprehensive test cases covering all interval rules:
- Default 60s interval for multi-endpoint vendors
- 10min interval for single-endpoint vendors
- 10s timeout override
- Priority: timeout > single-vendor > base
- Recovery behavior (lastProbeOk=true reverts to normal)
- Never-probed endpoints always due
- 7 comprehensive test cases covering all interval rules:
- Code clarity - Good
- Helper functions (
countEndpointsByVendor,getEffectiveIntervalMs,filterDueEndpoints) are well-named and focused
- Helper functions (
Automated review by Claude AI
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/unit/settings/providers/provider-vendor-view-circuit-ui.test.tsx (1)
32-47: Mock 数据字段名与实际类型不匹配Mock 数据中使用了
lastOk和lastLatencyMs(第 42-43 行),但根据src/types/provider.ts中ProviderEndpoint的定义,正确的字段名应为lastProbeOk和lastProbeLatencyMs。虽然当前测试可能仍能通过,但这可能导致类型检查失败或未来的测试脆弱性。🐛 建议修复字段名
getProviderEndpointsByVendor: vi.fn(async () => [ { id: 1, vendorId: 1, providerType: "claude", url: "https://api.example.com/v1", label: null, sortOrder: 0, isEnabled: true, lastProbedAt: null, - lastOk: null, - lastLatencyMs: null, + lastProbeOk: null, + lastProbeLatencyMs: null, createdAt: "2026-01-01", updatedAt: "2026-01-01", }, ]),src/app/[locale]/settings/providers/_components/provider-vendor-view.tsx (1)
479-485:formatDistanceToNow缺少 locale 配置,影响国际化。根据编码规范要求支持 5 种语言(zh-CN, zh-TW, en, ja, ru),
formatDistanceToNow需要传入对应的date-fnslocale 才能正确显示本地化的时间文本。当前实现会始终显示英文。建议添加 locale 支持
+import { useLocale } from "next-intl"; +import { zhCN, zhTW, enUS, ja, ru } from "date-fns/locale"; +const localeMap = { "zh-CN": zhCN, "zh-TW": zhTW, en: enUS, ja: ja, ru: ru }; function EndpointRow({ ... }) { + const locale = useLocale(); + const dateLocale = localeMap[locale as keyof typeof localeMap] || enUS; ... {endpoint.lastProbedAt ? ( <span className="text-muted-foreground text-[10px] whitespace-nowrap"> - {formatDistanceToNow(new Date(endpoint.lastProbedAt), { addSuffix: true })} + {formatDistanceToNow(new Date(endpoint.lastProbedAt), { addSuffix: true, locale: dateLocale })} </span> ) : (
🧹 Nitpick comments (6)
src/actions/provider-endpoints.ts (1)
198-198: 建议:提取验证 schema 以保持一致性第 198 行内联创建了 schema,但文件中已定义了
VendorIdSchema(第 52 行)。可以考虑提取为命名常量以保持与其他函数(如GetProviderEndpointsSchema)的一致性。♻️ 建议的重构
+const GetProviderEndpointsByVendorSchema = z.object({ + vendorId: VendorIdSchema, +}); + export async function getProviderEndpointsByVendor(input: { vendorId: number; }): Promise<ProviderEndpoint[]> { try { const session = await getAdminSession(); if (!session) { return []; } - const parsed = z.object({ vendorId: VendorIdSchema }).safeParse(input); + const parsed = GetProviderEndpointsByVendorSchema.safeParse(input); if (!parsed.success) {src/lib/provider-endpoints/probe-scheduler.ts (2)
26-31: 建议:考虑将硬编码的间隔常量改为可配置
SINGLE_VENDOR_INTERVAL_MS(10 分钟) 和TIMEOUT_OVERRIDE_INTERVAL_MS(10 秒) 目前是硬编码的,而BASE_INTERVAL_MS可通过环境变量配置。为了运维灵活性,建议将这些常量也改为可配置。♻️ 建议的重构
// Base interval (default 60s) const BASE_INTERVAL_MS = Math.max( 1, parseIntWithDefault(process.env.ENDPOINT_PROBE_INTERVAL_MS, 60_000) ); // Single-vendor interval (10 minutes) -const SINGLE_VENDOR_INTERVAL_MS = 600_000; +const SINGLE_VENDOR_INTERVAL_MS = Math.max( + 1, + parseIntWithDefault(process.env.ENDPOINT_PROBE_SINGLE_VENDOR_INTERVAL_MS, 600_000) +); // Timeout override interval (10 seconds) -const TIMEOUT_OVERRIDE_INTERVAL_MS = 10_000; +const TIMEOUT_OVERRIDE_INTERVAL_MS = Math.max( + 1, + parseIntWithDefault(process.env.ENDPOINT_PROBE_TIMEOUT_OVERRIDE_INTERVAL_MS, 10_000) +);同时在
.env.example中添加对应文档。
30-31: TICK_INTERVAL_MS 导致频繁的调度器 tick 和数据库查询当前实现中,
TICK_INTERVAL_MS = Math.min(BASE_INTERVAL_MS, TIMEOUT_OVERRIDE_INTERVAL_MS)= 10 秒(默认配置)。每次 tick 时,以下操作都会执行:
- 获取 leader lock(via
ensureLeaderLock())- 查询所有启用的端点(via
findEnabledProviderEndpointsForProbing())- 计算 vendor 端点数量(via
countEndpointsByVendor())- 过滤到期端点(via
filterDueEndpoints())虽然代码有早期返回(如果没有待探测端点则返回、如果前一个周期仍在运行则返回),但上述数据库和计算操作每 10 秒仍会至少尝试一次。在端点数量较大的情况下,这可能增加数据库和计算负载。
src/app/[locale]/settings/providers/_components/provider-vendor-view.tsx (3)
308-321: 变量名t遮蔽了外层的翻译函数。Line 311 中
map((t, i) => ...)的参数t遮蔽了 Line 297 声明的useTranslations返回值t。虽然当前不会导致运行时错误,但如果后续在回调中需要使用翻译函数,会造成混淆和潜在 bug。建议重命名变量
// Sort endpoints by type order (from getAllProviderTypes) then by sortOrder const endpoints = useMemo(() => { const typeOrder = getAllProviderTypes(); - const typeIndexMap = new Map(typeOrder.map((t, i) => [t, i])); + const typeIndexMap = new Map(typeOrder.map((type, i) => [type, i])); return [...rawEndpoints].sort((a, b) => {
540-543:selectableTypes硬编码可能导致维护问题。当前直接硬编码了可选类型列表。如果将来
ProviderType添加新类型,这里需要手动同步更新。建议从getAllProviderTypes()过滤派生,或在provider-type-utils中定义可选类型配置。建议从配置派生
- const [providerType, setProviderType] = useState<ProviderType>("claude"); - - // Get provider types for the selector (exclude claude-auth and gemini-cli which are internal) - const selectableTypes: ProviderType[] = ["claude", "codex", "gemini", "openai-compatible"]; + const [providerType, setProviderType] = useState<ProviderType>("claude"); + + // Get provider types for the selector (exclude internal types) + const selectableTypes = useMemo( + () => getAllProviderTypes().filter((t) => !["claude-auth", "gemini-cli"].includes(t)), + [] + );或者在
provider-type-utils.ts中添加getSelectableProviderTypes()函数,将内部类型的判断逻辑集中管理。
383-386: Query key 不一致可能导致缓存失效问题。
EndpointsTable使用["provider-endpoints", vendorId]作为查询键(Line 301),但EndpointRow中的 mutation 使用["provider-endpoints"]进行失效(Lines 385, 408, 429)。虽然 React Query 的invalidateQueries支持前缀匹配,但建议保持一致性以提高可维护性。同样的问题也出现在
EditEndpointDialog(Line 680)。建议统一 query key 失效策略
可以选择以下方案之一:
方案 A: 将
vendorId传递给EndpointRow,使用精确的 query key:function EndpointRow({ endpoint, tTypes, + vendorId, }: { endpoint: ProviderEndpoint; tTypes: ReturnType<typeof useTranslations>; + vendorId: number; }) { ... onSuccess: () => { - queryClient.invalidateQueries({ queryKey: ["provider-endpoints"] }); + queryClient.invalidateQueries({ queryKey: ["provider-endpoints", vendorId] });方案 B: 全局失效所有 provider-endpoints 查询(当前行为,但需确认 React Query 版本支持前缀匹配)。
Address code review feedback: use getAllProviderTypes().filter() instead of hardcoded array to ensure automatic sync when new provider types are added. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Implement dynamic per-endpoint probe intervals with three-tier priority system to optimize probe frequency based on endpoint context:
ENDPOINT_PROBE_INTERVAL_MS)Additionally, simplify the provider vendor UI by removing type tabs and showing all endpoints in a unified table with type icons.
Problem
Related Issues:
Related PRs:
The original probe scheduler used a fixed 10s interval for all endpoints regardless of context. This caused:
Solution
Implement a dynamic interval calculation system with priority-based rules:
lastProbeErrorType === "timeout"ANDlastProbeOk \!== trueThe scheduler now uses a tick-based approach where it runs at the shortest interval (10s) but only probes endpoints that are "due" based on their effective interval.
Changes
Core Changes
src/lib/provider-endpoints/probe-scheduler.ts: Dynamic interval calculation withgetEffectiveIntervalMs()andfilterDueEndpoints()functionssrc/repository/provider-endpoints.ts: ExtendedProviderEndpointProbeTargettype to includevendorIdandlastProbeErrorType; addedfindProviderEndpointsByVendor()functionUI Changes
src/app/[locale]/settings/providers/_components/provider-vendor-view.tsx:Supporting Changes
src/actions/provider-endpoints.ts: AddgetProviderEndpointsByVendor()actionsrc/repository/index.ts: Export new type and function.env.example: Update default interval to 60s with documentation for dynamic rulesaddEndpointDescGenericandcolumnTypekeys for all 5 languagesBreaking Changes
ENDPOINT_PROBE_INTERVAL_MSchanged from 10s to 60sENDPOINT_PROBE_INTERVAL_MS=10000to restore old behaviorgetEndpointProbeSchedulerStatus()return type changedbaseIntervalMs,singleVendorIntervalMs,timeoutOverrideIntervalMs,tickIntervalMs)Testing
Automated Tests
Test Coverage for Dynamic Intervals
Checklist
Description enhanced by Claude AI
Greptile Overview
Greptile Summary
This PR implements a well-designed dynamic probe interval system that intelligently adjusts endpoint health check frequency based on context. The implementation uses a three-tier priority system: timeout errors trigger 10s fast recovery checks, single-endpoint vendors get 10min reduced frequency (no failover benefit), and multi-endpoint vendors use the configurable base interval (default 60s).
Key improvements:
The UI changes simplify the provider vendor interface by removing type tabs and presenting all endpoints in a unified table with type icons, making it easier to view and manage endpoints across different provider types. The change is backwards-compatible with clear migration guidance in the PR description.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram participant Scheduler as Probe Scheduler participant DB as Database participant Lock as Leader Lock participant Probe as Probe Function Note over Scheduler: Timer triggers every TICK_INTERVAL_MS (10s) Scheduler->>Lock: Acquire/renew leader lock alt Not leader Lock-->>Scheduler: Lock failed Note over Scheduler: Exit early (another instance is leader) end Scheduler->>DB: findEnabledProviderEndpointsForProbing() DB-->>Scheduler: Return all enabled endpoints with vendorId, lastProbeErrorType Note over Scheduler: countEndpointsByVendor() Note over Scheduler: Calculate vendor endpoint counts loop For each endpoint alt Never probed (lastProbedAt === null) Note over Scheduler: Always due for probing else Has timeout error (lastProbeErrorType === "timeout" && lastProbeOk !== true) Note over Scheduler: Use 10s timeout override interval else Single-endpoint vendor (count === 1) Note over Scheduler: Use 10min single-vendor interval else Multi-endpoint vendor Note over Scheduler: Use 60s base interval end Note over Scheduler: Check if (now >= lastProbedAt + effectiveInterval) end Note over Scheduler: Filter to due endpoints only alt No due endpoints Scheduler-->>Scheduler: Exit early end Note over Scheduler: Shuffle due endpoints par Concurrent probing (up to CONCURRENCY workers) Scheduler->>Probe: probeProviderEndpointAndRecordByEndpoint() Probe->>DB: Update endpoint probe results DB-->>Probe: Success Probe-->>Scheduler: Complete and Worker 2 Scheduler->>Probe: probeProviderEndpointAndRecordByEndpoint() Probe->>DB: Update endpoint probe results and Worker N Scheduler->>Probe: probeProviderEndpointAndRecordByEndpoint() Probe->>DB: Update endpoint probe results end Note over Scheduler: Wait 10s until next tick