Skip to content

fix(provider): wrap endpoint sync in transactions to prevent race conditions#730

Merged
ding113 merged 3 commits intodevfrom
fix/provider-endpoint-sync-transaction
Feb 7, 2026
Merged

fix(provider): wrap endpoint sync in transactions to prevent race conditions#730
ding113 merged 3 commits intodevfrom
fix/provider-endpoint-sync-transaction

Conversation

@ding113
Copy link
Owner

@ding113 ding113 commented Feb 7, 2026

Summary

This PR wraps provider create/update operations in database transactions to prevent race conditions that could leave orphaned or inconsistent endpoint rows in the provider_endpoints table.

Problem

Related Issues:

Root Cause:

When editing a provider's URL or type, the system previously executed these operations sequentially without transaction protection:

  1. Resolve vendor from new URL
  2. Update provider record
  3. Ensure endpoint exists for new URL

This created race conditions where:

  • Concurrent requests could insert duplicate endpoints
  • Provider updates could succeed while endpoint sync failed, leaving inconsistent state
  • Old endpoint records were never cleaned up, accumulating indefinitely in the database
  • The proxy could route to stale endpoints that should have been deleted

Solution

Core Changes

1. Transaction-wrapped provider operations (src/repository/provider.ts)

  • createProvider() now wraps vendor resolution + provider insert + endpoint seeding in a single transaction
  • updateProvider() now wraps vendor resolution + provider update + endpoint sync in a single transaction
  • Ensures atomicity: either all operations succeed or all roll back

2. New atomic endpoint sync function (src/repository/provider-endpoints.ts)

  • Added syncProviderEndpointOnProviderEdit() for intelligent endpoint migration when provider URL/type changes
  • Handles multiple scenarios:
    • In-place update: Same vendor+type, just URL changed → update existing endpoint
    • Soft-delete + create: Different URL → soft-delete old endpoint, create new one
    • Revive: New URL matches a previously soft-deleted endpoint → revive it
    • Conflict handling: Uses savepoints to handle unique constraint violations gracefully
  • Resets circuit breaker state when endpoints are revived or updated

3. Transaction-aware helper functions

  • getOrCreateProviderVendorIdFromUrls() now accepts optional { tx } parameter
  • ensureProviderEndpointExistsForUrl() now accepts optional { tx } parameter
  • tryDeleteProviderVendorIfEmpty() now accepts optional { tx } parameter

4. Graceful vendor cleanup degradation

  • Vendor cleanup failures now log warnings instead of propagating exceptions
  • Prevents cleanup errors from blocking primary operations

Supporting Changes

Type safety improvements:

  • Added QueryExecutor type for transaction-compatible database operations
  • Added isUniqueViolationError() helper for detecting PostgreSQL unique constraint violations

Test coverage:

  • Unit tests for syncProviderEndpointOnProviderEdit covering all edge cases (noop, in-place update, soft-delete + create, revive, conflict)
  • Unit tests for createProvider transaction atomicity
  • Integration test for concurrent provider endpoint sync race conditions
  • Updated existing provider/endpoint tests for transaction compatibility
  • Updated rate limit test mocks for timezone resolution

Changes by File

Core Logic

  • src/repository/provider.ts (+231/-189) - Wrap create/update in transactions
  • src/repository/provider-endpoints.ts (+454/-62) - Add syncProviderEndpointOnProviderEdit, transaction support

Actions Layer

  • src/actions/providers.ts (+24/-6) - Graceful vendor cleanup error handling
  • src/actions/provider-endpoints.ts (+9/-1) - Graceful vendor cleanup error handling

Tests

  • tests/integration/provider-endpoint-sync-race.test.ts (+147) - Race condition integration test
  • tests/unit/repository/provider-create-transaction.test.ts (+187) - Transaction atomicity tests
  • tests/unit/repository/provider-endpoint-sync-helper.test.ts (+227) - Sync function edge cases
  • tests/unit/repository/provider-endpoint-sync-on-edit.test.ts (+219) - Edit scenarios
  • tests/unit/repository/provider-endpoints.test.ts (+49/-20) - Updated for transactions
  • tests/unit/actions/providers.test.ts (+59) - Action layer endpoint sync tests
  • tests/unit/actions/providers-recluster.test.ts (+15/-7) - Recluster transaction support
  • tests/unit/lib/rate-limit/*.test.ts (+18) - Mock updates for timezone resolution

Configuration

  • vitest.integration.config.ts (+1) - Integration test config

Breaking Changes

None. All changes are backward compatible.

Test Plan

Automated Tests

  • Unit tests for syncProviderEndpointOnProviderEdit covering all edge cases (noop, in-place update, soft-delete + create, revive, conflict)
  • Unit tests for createProvider transaction atomicity
  • Integration test for concurrent provider endpoint sync race conditions
  • Existing provider/endpoint tests still pass
  • Rate limit tests unaffected by mock updates

Manual Testing (if desired)

  1. Create a provider with URL https://api-a.example.com/v1/messages
  2. Verify endpoint is created in provider_endpoints table
  3. Edit provider URL to https://api-b.example.com/v1/messages
  4. Verify old endpoint is soft-deleted and new endpoint is created
  5. Edit provider URL back to https://api-a.example.com/v1/messages
  6. Verify old endpoint is revived (not duplicated)
  7. Check that no orphaned endpoints remain in the database

Checklist

  • Code follows project conventions
  • Self-review completed
  • Tests pass locally
  • Documentation updated (if needed)

Description enhanced by Claude AI

Greptile Overview

Greptile Summary

  • Wraps provider create/update flows in a single DB transaction that also resolves/creates provider_vendors and seeds/syncs provider_endpoints, aiming to eliminate races and partial writes.
  • Adds syncProviderEndpointOnProviderEdit() to atomically migrate endpoints across URL/vendor/type changes (create/revive/update-in-place/soft-delete paths) with unique-conflict handling.
  • Makes vendor/endpoint helper APIs transaction-aware via optional { tx } executors, and downgrades vendor cleanup failures to warnings in actions/repository.
  • Extends unit/integration coverage around endpoint sync behavior (including a concurrent insert race) and updates integration vitest config to include the new test.

Confidence Score: 3/5

  • This PR is close, but there are a few correctness gaps under bad/legacy data and concurrent edits that should be addressed before merging.
  • Core transaction wrapping and endpoint sync logic look directionally correct and well-tested, but (1) endpoint sync doesn’t validate/handle invalid previousUrl inputs, (2) in-place move updates don’t verify affected rows, and (3) provider update’s pre-read isn’t locked, so concurrent edits can still derive migrations from stale state.
  • src/repository/provider-endpoints.ts, src/repository/provider.ts

Important Files Changed

Filename Overview
src/actions/provider-endpoints.ts Wrap vendor cleanup in try/catch + warn logging after endpoint removal; no behavior change to primary delete path.
src/actions/providers.ts Adds guarded vendor cleanup (warn-only on failure) and passes tx into vendor resolution during recluster; main provider action flows unchanged.
src/repository/provider-endpoints.ts Introduces transaction-aware vendor/endpoint helpers and new syncProviderEndpointOnProviderEdit() for endpoint migration; review findings include silent date fallback, missing previousUrl validation, and no rowcount check on in-place move update.
src/repository/provider.ts Wraps create/update provider flows in transactions and integrates endpoint sync; review finding: pre-read of provider state isn’t locked, so concurrent edits can compute sync from stale state.
tests/integration/provider-endpoint-sync-race.test.ts Adds integration test that simulates a lock + concurrent endpoint insert during provider URL edit; verifies transaction doesn’t break and endpoints end up consistent.
tests/unit/actions/providers-recluster.test.ts Updates recluster tests to account for transaction-aware vendor key computation and vendorId resolution with { tx }.
tests/unit/actions/providers.test.ts Extends provider action tests around async behavior and ensures url/provider_type and website_url edits are forwarded correctly; includes favicon generation cases.
tests/unit/lib/rate-limit/cost-limits.test.ts Adjusts timezone mocking for deterministic fixed/rolling reset calculations; no production behavior changes.
tests/unit/lib/rate-limit/rolling-window-cache-warm.test.ts Updates timezone mocks while validating rolling-window cache warm behavior; no production behavior changes.
tests/unit/lib/rate-limit/service-extra.test.ts Updates timezone mocks and expands quota-path assertions; no production behavior changes.
tests/unit/repository/provider-create-transaction.test.ts Adds unit tests asserting createProvider runs vendor resolve + insert + endpoint seed within one transaction and bubbles seed errors.
tests/unit/repository/provider-endpoint-sync-helper.test.ts Adds unit coverage for syncProviderEndpointOnProviderEdit edge cases including conflicts and circuit reset deferral behavior.
tests/unit/repository/provider-endpoint-sync-on-edit.test.ts Adds tests asserting updateProvider uses the new endpoint sync helper, resets circuit post-commit, and doesn’t block on vendor cleanup failures.
tests/unit/repository/provider-endpoints.test.ts Updates repository helper tests for new validation and transaction-aware semantics; focuses on ensure/backfill/vendor cleanup behaviors.
vitest.integration.config.ts Includes new provider-endpoint race integration test in the integration vitest config.

Sequence Diagram

sequenceDiagram
  autonumber
  participant UI as Vendor Settings UI
  participant A as actions/providers.ts
  participant PR as repository/provider.ts
  participant DB as Postgres (drizzle)
  participant PE as repository/provider-endpoints.ts
  participant CB as endpoint-circuit-breaker

  UI->>A: editProvider(id, {url/type/websiteUrl})
  A->>PR: updateProvider(id, data)
  PR->>DB: transaction(begin)
  alt needs vendor refresh
    PR->>PE: getOrCreateProviderVendorIdFromUrls(..., {tx})
    PE->>DB: select/insert provider_vendors
  end
  PR->>DB: update providers set ... returning provider
  alt shouldSyncEndpoint
    PR->>PE: syncProviderEndpointOnProviderEdit(prev,next,..., {tx})
    PE->>DB: select previous/next endpoints
    alt in-place move possible
      PE->>DB: update provider_endpoints (move + clear probe snapshot)
      PE-->>PR: {action, resetCircuitEndpointId}
    else create/revive next
      PE->>DB: insert on conflict do nothing
      PE->>DB: update deletedAt/isEnabled if revive
      PE-->>PR: {action}
    end
  end
  PR->>DB: transaction(commit)
  opt resetCircuitEndpointId
    PR->>CB: resetEndpointCircuit(endpointId)
  end
  opt previousVendorIdToCleanup
    PR->>PE: tryDeleteProviderVendorIfEmpty(vendorId)
    PE->>DB: transaction(begin)
    PE->>DB: check active providers/endpoints
    PE->>DB: delete vendor if safe
    PE->>DB: transaction(commit)
  end
  PR-->>A: updated provider
  A-->>UI: success response
Loading

Provider create and update operations now run vendor resolution and
endpoint sync inside database transactions to prevent race conditions
that could leave orphaned or inconsistent endpoint rows.

Key changes:
- createProvider: wrap vendor + insert + endpoint seed in a single tx
- updateProvider: wrap vendor + update + endpoint sync in a single tx
- Add syncProviderEndpointOnProviderEdit for atomic URL/type/vendor
  migration with in-place update, soft-delete, and conflict handling
- Vendor cleanup failures degrade to warnings instead of propagating
- Add comprehensive unit and integration tests for sync edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 7, 2026

📝 Walkthrough

Walkthrough

引入事务化的端点同步(syncProviderEndpointOnProviderEdit)、为若干仓库方法添加可选事务上下文({ tx }),在提供商创建/更新/删除流程中将关键步骤移入事务并增强对供应商清理与电路重置的容错日志处理,同时扩展大量单元与集成测试覆盖并发与冲突场景。

Changes

Cohort / File(s) Summary
仓库:提供商端点与类型
src/repository/provider-endpoints.ts
新增事务执行类型与唯一约束判定工具,导出并实现 syncProviderEndpointOnProviderEdit(复杂事务化同步流程),并让 getOrCreateProviderVendorIdFromUrlsensureProviderEndpointExistsForUrl 接受可选 { tx }
仓库:提供商操作
src/repository/provider.ts
将 create/update 流程移入事务,内部通过 { tx } 解析/创建 providerVendorId 并在事务内种子/同步端点;update 返回结构化结果(含 previousVendorIdToCleanup、endpointCircuitResetId),事务外进行容错的电路重置与旧供应商清理。
动作层(actions)
src/actions/provider-endpoints.ts, src/actions/providers.ts
在删除/重分配流程中将供应商清理与端点电路重置包裹在 try/catch,记录警告并继续主流程;recluster 路径中向 getOrCreateProviderVendorIdFromUrls 传入 { tx }
测试:端点同步与并发(集成/单元)
tests/integration/provider-endpoint-sync-race.test.ts, tests/unit/repository/provider-endpoint-sync-helper.test.ts, tests/unit/repository/provider-endpoint-sync-on-edit.test.ts, tests/unit/repository/provider-endpoints.test.ts
新增/扩展用例覆盖 syncProviderEndpointOnProviderEdit 的并发冲突、唯一约束冲突、回退策略、输入校验与异常路径;新增集成测试验证编辑并发锁定情形。
测试:创建事务与动作层
tests/unit/repository/provider-create-transaction.test.ts, tests/unit/actions/providers-recluster.test.ts, tests/unit/actions/providers.test.ts
新增/调整测试以断言事务上下文(tx)被传递并使用,验证端点种子在事务内执行,以及编辑时 favicon/URL 行为。
测试:时区相关 mock 修正
tests/unit/lib/rate-limit/*.test.ts
为依赖时区解析的测试注入确定性 mock(resolveSystemTimezone 返回 "Asia/Shanghai")。
测试配置
vitest.integration.config.ts
将新集成测试 tests/integration/provider-endpoint-sync-race.test.ts 添加到集成测试包含列表。

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确描述了主要变更:将端点同步操作包装在事务中以防止竞态条件,这与PR的核心目标和实现完全相符。
Description check ✅ Passed 描述详细说明了问题、根本原因、解决方案和文件变更,与PR的实现内容高度相关,包括具体的代码路径和行为变化。
Linked Issues check ✅ Passed PR实现了#722的需求:在编辑provider时,对provider_endpoints表进行原子性操作(更新/软删除+创建/复活),避免重复累积,并通过事务防止竞态条件。
Out of Scope Changes check ✅ Passed 所有变更都与修复#722相关(端点同步事务化):核心逻辑、测试、配置更新都在范围内,无与目标无关的改动。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/provider-endpoint-sync-transaction

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @ding113, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the data integrity and robustness of provider and provider endpoint management. By introducing database transactions for critical create and update flows, it eliminates potential race conditions and ensures that related data (providers, vendors, and endpoints) remains consistent. The new, sophisticated endpoint synchronization logic (syncProviderEndpointOnProviderEdit) intelligently handles various scenarios during provider modifications, such as URL changes or type migrations, ensuring that endpoints are correctly updated, soft-deleted, or created as needed. Additionally, error handling for vendor cleanup has been improved to prevent non-critical failures from disrupting core application functionality.

Highlights

  • Transactional Integrity for Provider Operations: Provider creation and update operations (createProvider, updateProvider) now wrap vendor resolution and endpoint synchronization within a single database transaction. This ensures atomicity, preventing race conditions that could lead to orphaned or inconsistent endpoint records.
  • Robust Endpoint Synchronization Logic: A new function, syncProviderEndpointOnProviderEdit, has been introduced to handle complex endpoint lifecycle management during provider edits. It supports in-place URL/type/vendor migration, soft-deletion of old endpoints, revival of existing ones, and intelligent conflict resolution using savepoints.
  • Improved Vendor Cleanup Error Handling: Failures during automatic vendor cleanup (tryDeleteProviderVendorIfEmpty) are now gracefully handled by logging warnings instead of propagating exceptions. This prevents cleanup issues from blocking primary operations like provider removal or re-clustering.
  • Enhanced Endpoint URL Validation: The ensureProviderEndpointExistsForUrl function now throws explicit errors for invalid or empty URLs, providing clearer feedback and preventing malformed data from entering the system.
Changelog
  • src/actions/provider-endpoints.ts
    • Wrapped tryDeleteProviderVendorIfEmpty calls in removeProviderEndpoint with a try-catch block to log warnings on failure instead of throwing.
  • src/actions/providers.ts
    • Wrapped tryDeleteProviderVendorIfEmpty calls in removeProvider with a try-catch block to log warnings on failure instead of throwing.
    • Modified reclusterProviderVendors to pass the transaction object (tx) to getOrCreateProviderVendorIdFromUrls and tryDeleteProviderVendorIfEmpty for transactional consistency.
    • Wrapped tryDeleteProviderVendorIfEmpty calls in reclusterProviderVendors with a try-catch block to log warnings on failure.
  • src/repository/provider-endpoints.ts
    • Imported resetEndpointCircuit for clearing circuit breaker state and ne (not equal) from drizzle-orm.
    • Added isUniqueViolationError helper function to identify unique constraint violations.
    • Modified getOrCreateProviderVendorIdFromUrls to accept an optional tx (QueryExecutor) for transactional execution.
    • Refactored tryDeleteProviderVendorIfEmpty to accept an optional tx and execute within a transaction if tx is not provided, or use the provided tx. Removed internal try-catch as error handling is now external.
    • Modified ensureProviderEndpointExistsForUrl to accept an optional tx and use it. Changed validation for invalid URLs to throw errors instead of returning false.
    • Introduced syncProviderEndpointOnProviderEdit function to atomically manage provider endpoint changes (create, update, soft-delete, revive) during provider edits, including conflict resolution and circuit breaker resets.
  • src/repository/provider.ts
    • Refactored createProvider to execute vendor resolution, provider insertion, and endpoint seeding within a single database transaction, ensuring atomicity.
    • Refactored updateProvider to wrap the entire update process in a database transaction, including fetching current provider data, updating vendor information, and synchronizing endpoints.
    • Integrated syncProviderEndpointOnProviderEdit into updateProvider to handle endpoint changes based on URL or provider type modifications.
    • Implemented external try-catch for tryDeleteProviderVendorIfEmpty calls in updateProvider to log cleanup failures without blocking the main update.
  • tests/integration/provider-endpoint-sync-race.test.ts
    • Added a new integration test to verify that concurrent next-url inserts do not break provider edit transactions, specifically targeting race conditions.
  • tests/unit/actions/providers-recluster.test.ts
    • Updated unit tests to reflect the change in reclusterProviderVendors where getOrCreateProviderVendorIdFromUrls now receives a transaction object.
  • tests/unit/actions/providers.test.ts
    • Added unit tests for editProvider to confirm correct forwarding of URL/provider_type edits to the repository.
    • Added unit tests for editProvider to verify automatic favicon URL generation when website_url is updated and clearing when website_url is null.
  • tests/unit/lib/rate-limit/cost-limits.test.ts
    • Updated mocking for resolveSystemTimezone to ensure consistent test environments.
  • tests/unit/lib/rate-limit/rolling-window-cache-warm.test.ts
    • Updated mocking for resolveSystemTimezone to ensure consistent test environments.
  • tests/unit/lib/rate-limit/service-extra.test.ts
    • Updated mocking for resolveSystemTimezone to ensure consistent test environments.
  • tests/unit/repository/provider-create-transaction.test.ts
    • Added new unit tests to verify the transactional atomicity of createProvider, ensuring vendor resolution, provider insertion, and endpoint seeding occur within a single transaction.
  • tests/unit/repository/provider-endpoint-sync-helper.test.ts
    • Added new unit tests for syncProviderEndpointOnProviderEdit, covering scenarios like invalid URLs, no-op conditions, in-place URL moves (including clearing stale probe data), handling concurrent insert conflicts, and fallback behavior for unique constraint violations during in-place updates.
  • tests/unit/repository/provider-endpoint-sync-on-edit.test.ts
    • Added new unit tests for updateProvider's endpoint synchronization logic, including scenarios for URL changes, vendor changes, and error handling during endpoint sync and vendor cleanup.
  • tests/unit/repository/provider-endpoints.test.ts
    • Updated unit tests for ensureProviderEndpointExistsForUrl to expect errors for invalid URLs instead of false.
    • Updated unit tests for tryDeleteProviderVendorIfEmpty to expect exceptions when the transaction fails, aligning with the new error handling.
    • Added a unit test for ensureProviderEndpointExistsForUrl to confirm it maintains insert-only semantics when not provided with a transaction context.
  • vitest.integration.config.ts
    • Added tests/integration/provider-endpoint-sync-race.test.ts to the integration test suite.
Activity
  • The author, ding113, has implemented the changes described in the pull request.
  • A comprehensive test plan has been provided, including unit tests for new synchronization logic, transaction atomicity, and an integration test for concurrent race conditions.
  • New unit and integration tests have been added to cover the introduced transactional behavior and endpoint synchronization logic.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions bot added bug Something isn't working area:provider size/XL Extra Large PR (> 1000 lines) labels Feb 7, 2026
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +894 to +897
};

const ensureNextEndpointActive = async (options?: {
reactivateDisabled?: boolean;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circuit reset in tx

syncProviderEndpointOnProviderEdit calls resetEndpointCircuit(previousEndpoint.id) inside the DB transaction after an in-place move. If the outer transaction later rolls back (e.g. provider update fails), the circuit state will already be reset even though the endpoint row change didn’t commit.

Move the circuit reset outside the transaction boundary (or gate it on commit) to keep circuit state consistent with persisted endpoint state.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 894:897

Comment:
**Circuit reset in tx**

`syncProviderEndpointOnProviderEdit` calls `resetEndpointCircuit(previousEndpoint.id)` inside the DB transaction after an in-place move. If the outer transaction later rolls back (e.g. provider update fails), the circuit state will already be reset even though the endpoint row change didn’t commit.

Move the circuit reset outside the transaction boundary (or gate it on commit) to keep circuit state consistent with persisted endpoint state.

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link

greptile-apps bot commented Feb 7, 2026

Additional Comments (1)

src/repository/provider-endpoints.ts
Inconsistent error contract

ensureProviderEndpointExistsForUrl used to return false for empty/invalid URLs, but now it throws (e.g. throw new Error("[ProviderEndpointEnsure] url is required")). Any existing callers that rely on a boolean return will now hard-fail and potentially abort their surrounding flow/transaction.

Please either restore the previous boolean behavior or update all call sites to handle the thrown errors explicitly and adjust tests accordingly.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 744:746

Comment:
**Inconsistent error contract**

`ensureProviderEndpointExistsForUrl` used to return `false` for empty/invalid URLs, but now it throws (e.g. `throw new Error("[ProviderEndpointEnsure] url is required")`). Any existing callers that rely on a boolean return will now hard-fail and potentially abort their surrounding flow/transaction.

Please either restore the previous boolean behavior or update *all* call sites to handle the thrown errors explicitly and adjust tests accordingly.

How can I resolve this? If you propose a fix, please make it concise.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 7, 2026

🧪 测试结果

测试类型 状态
代码质量
单元测试
集成测试
API 测试

总体结果: ✅ 所有测试通过

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors provider and provider endpoint management to enhance data consistency and error handling. It introduces try...catch blocks around tryDeleteProviderVendorIfEmpty calls in removeProviderEndpoint, removeProvider, and reclusterProviderVendors to log cleanup failures without blocking main operations. A new syncProviderEndpointOnProviderEdit function is added to src/repository/provider-endpoints.ts, which intelligently manages provider endpoints when a provider's URL or type changes, handling in-place updates, creation, revival, or soft-deletion of endpoints, including specific logic for unique constraint violations and clearing circuit breaker states. The createProvider and updateProvider functions are updated to be fully transactional, ensuring that vendor creation, provider updates, and endpoint synchronization occur atomically. The review comment highlights a type safety concern in syncProviderEndpointOnProviderEdit regarding the QueryExecutor type, suggesting that the transaction method should be explicitly included in the type definition to avoid a hacky cast and ensure proper handling of nested transactions.

Comment on lines +1045 to +1065
const executorWithSavepoint = tx as QueryExecutor & {
transaction?: <T>(runInTx: (nestedTx: TransactionExecutor) => Promise<T>) => Promise<T>;
};

if (typeof executorWithSavepoint.transaction === "function") {
try {
await executorWithSavepoint.transaction(async (nestedTx) => {
await updatePreviousEndpointInPlace(nestedTx);
});
movedInPlace = true;
} catch (error) {
if (!isUniqueViolationError(error)) {
throw error;
}
}
} else {
// No savepoint support means we cannot safely continue after unique violations.
await updatePreviousEndpointInPlace(tx);
movedInPlace = true;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for handling unique violations during in-place updates relies on executorWithSavepoint.transaction being a function to create a savepoint. While Drizzle's transaction object (tx) typically supports nested transactions in PostgreSQL, the QueryExecutor type is defined as a Pick that excludes the transaction method. If options.tx is provided by a caller using the QueryExecutor type, this check might fail or behave unexpectedly. Since syncProviderEndpointOnProviderEdit is always wrapped in a transaction at the top level if options.tx is missing, tx will always be a transaction object. It would be cleaner to include transaction in the QueryExecutor type definition to avoid the hacky cast and ensure type safety for nested transactions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/unit/repository/provider-endpoint-sync-on-edit.test.ts`:
- Around line 116-131: 在测试文件中将对模块的动态替换与导入改为使用项目路径别名而不是相对路径;具体地,将所有
vi.doMock("../../../src/…") 调用和后续 await import("../../../src/…") 改为使用 "@/…"
别名(例如替换 vi.doMock("../../../src/drizzle/db", () => ({ db
}))、vi.doMock("../../../src/repository/provider-endpoints", () => (…)) 以及 await
import("../../../src/repository/provider")),以保持与代码库的路径别名规范一致并避免重构时的路径断裂,同时保留原有的
mock 名称如
getOrCreateProviderVendorIdFromUrlsMock、ensureProviderEndpointExistsForUrlMock、tryDeleteProviderVendorIfEmptyMock
和 syncProviderEndpointOnProviderEditMock 不变。
🧹 Nitpick comments (5)
tests/unit/lib/rate-limit/rolling-window-cache-warm.test.ts (1)

34-36: 与其他测试文件的 mock 模式不一致。

cost-limits.test.tsservice-extra.test.ts 均使用 vi.hoisted 创建 mock 引用,并在 beforeEach 中(resetAllMocks 之后)重新设置返回值。此文件使用了内联 vi.fn,且 beforeEach 中仅调用了 clearAllMocks,没有重置 mock 实现。

当前 clearAllMocks 不会清除实现,因此功能上没有问题。但如果未来有人将其改为 resetAllMocks,时区 mock 会静默失效。建议统一为 vi.hoisted 模式以保持一致性。

建议的修改
+const resolveSystemTimezoneMock = vi.hoisted(() => vi.fn(async () => "Asia/Shanghai"));
+
 vi.mock("@/lib/utils/timezone", () => ({
-  resolveSystemTimezone: vi.fn(async () => "Asia/Shanghai"),
+  resolveSystemTimezone: resolveSystemTimezoneMock,
 }));

并在 beforeEach 中添加重置:

 beforeEach(() => {
   pipelineCommands.length = 0;
   vi.clearAllMocks();
+  resolveSystemTimezoneMock.mockResolvedValue("Asia/Shanghai");
   vi.useFakeTimers();
   vi.setSystemTime(new Date(nowMs));
 });
src/repository/provider-endpoints.ts (1)

1066-1069: 建议将 resetEndpointCircuit 失败降级,避免事务被外部副作用影响
该调用位于事务内,若 resetEndpointCircuit 依赖外部存储并抛错,会导致 endpoint 同步与 provider 更新整体回滚。可考虑捕获并降级为告警或移出事务。

建议修改
-          await resetEndpointCircuit(previousEndpoint.id);
+          try {
+            await resetEndpointCircuit(previousEndpoint.id);
+          } catch (error) {
+            logger.warn("syncProviderEndpointOnProviderEdit:reset_endpoint_circuit_failed", {
+              endpointId: previousEndpoint.id,
+              error: error instanceof Error ? error.message : String(error),
+            });
+          }
tests/unit/repository/provider-create-transaction.test.ts (1)

81-103: createDbMock 中的 tx 缺少 select 方法。

当前 tx mock 只有 insert,但 createProvider 内部调用的 getOrCreateProviderVendorIdFromUrlsensureProviderEndpointExistsForUrl 在事务内可能需要 select 操作。如果生产代码路径恰好只走了被 mock 的函数(这些函数本身也是 mock 的),那目前不会出错。但如果未来生产代码在事务内新增了直接的 tx.select 调用,测试会因缺少方法而失败。

当前由于 getOrCreateProviderVendorIdFromUrlsensureProviderEndpointExistsForUrl 都已被完全 mock,所以实际不会触发 tx.select,测试能通过。可以考虑在 tx 上补充 select mock 以增强健壮性。

tests/integration/provider-endpoint-sync-race.test.ts (1)

19-28: createDeferredresolve 变量的类型安全可以改进。

let resolve: () => void; 声明后未初始化,虽然 Promise executor 同步执行保证了赋值,但 TypeScript 无法静态验证这一点,需要在第 26 行使用 resolve! 非空断言。可以考虑初始化为空函数以消除断言。

建议的改进
 function createDeferred() {
-  let resolve: () => void;
+  let resolve: () => void = () => {};
   const promise = new Promise<void>((res) => {
     resolve = res;
   });
   return {
     promise,
-    resolve: resolve!,
+    resolve,
   };
 }
tests/unit/repository/provider-endpoint-sync-helper.test.ts (1)

188-226: 唯一约束冲突回退测试值得注意一个细节。

第 200 行 updateWhereMock.mockRejectedValueOnce(...) 会让第一次 where() 调用拒绝,但 updateSetMock(第 15-18 行)在调用 where 之前已将 payload 推入 updatePayloads。这意味着 updatePayloads[0] 包含了失败的 in-place move 的 payload,updatePayloads[1] 才是 soft-delete 的 payload。

当前断言只检查了 updatePayloads[1](第 220 行),逻辑上是正确的。建议补充对 updatePayloads[0] 的断言,验证 in-place move 尝试确实发生过,使测试意图更明确。

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

经过全面的多视角代码审查,本 PR 的实现质量很高,未发现需要修复的严重问题。

PR Size: XL

  • Lines changed: 1925 (1640 additions + 285 deletions)
  • Files changed: 15

建议: 此 PR 规模较大,建议在合并后密切监控生产环境的事务性能和数据库连接池使用情况。

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 0 0
Simplification 0 0 0 0

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Excellent (7 new test files, integration + unit tests)
  • Code clarity - Good

Key Strengths

  1. 事务完整性

    • createProviderupdateProvider 正确包装在事务中
    • 确保 vendor 解析、provider 更新和 endpoint 同步的原子性
  2. 错误处理

    • 所有 tryDeleteProviderVendorIfEmpty 调用点都添加了 try-catch 包装
    • Vendor cleanup 失败不会阻塞主要操作(graceful degradation)
  3. 并发安全

    • 使用 onConflictDoNothing 处理并发插入
    • 使用 savepoint 处理 unique constraint violations
    • 集成测试验证了竞态条件场景
  4. 类型安全

    • 添加了 QueryExecutor 类型用于事务兼容的数据库操作
    • 移除了 any 类型使用(src/repository/provider.ts:881
    • 添加了 isUniqueViolationError 辅助函数
  5. 测试覆盖

    • 新增 7 个测试文件,覆盖所有边缘情况
    • 包含集成测试验证竞态条件
    • 单元测试覆盖事务原子性、endpoint 同步逻辑

Architecture Highlights

核心改进: syncProviderEndpointOnProviderEdit 函数实现了智能 endpoint 迁移:

  • In-place update: 相同 vendor+type,仅 URL 变更 → 更新现有 endpoint
  • Soft-delete + create: 不同 URL → soft-delete 旧 endpoint,创建新 endpoint
  • Revive: 新 URL 匹配已 soft-deleted 的 endpoint → 恢复它
  • Conflict handling: 使用 savepoint 优雅处理 unique constraint violations
  • Circuit breaker reset: endpoint URL 变更时重置熔断器状态

Automated review by Claude AI

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

No significant issues identified in this PR.

PR Size: XL

  • Lines changed: 1925
  • Files changed: 15
  • Split suggestion: consider splitting into (1) repository transactional changes, (2) action-layer wiring, (3) tests/config

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Adequate
  • Code clarity - Good

Automated review by Codex AI

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Reviewed PR #730 (fix(provider): wrap endpoint sync in transactions to prevent race conditions).
  • PR size computed as XL (1640 additions + 285 deletions = 1925 lines; 15 files) and applied label size/XL.
  • Submitted a PR review summary comment via gh pr review --comment (no inline comments posted since no diff-scoped issues met the reporting threshold).

Avoid running endpoint circuit reset side effects inside DB transactions to prevent rollback inconsistency. Run resets only after commit and add regression tests for deferred reset behavior in helper and provider update flows.
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link

greptile-apps bot commented Feb 7, 2026

Additional Comments (1)

src/repository/provider-endpoints.ts
Invalid URL now throws

ensureProviderEndpointExistsForUrl() changed from returning false on blank/invalid url to throwing (e.g. throw new Error("[ProviderEndpointEnsure] url is required")). Existing callers outside this PR that rely on the previous boolean contract will now crash instead of treating it as a no-op. Either restore the old return-false behavior (and let transaction-wrapped callers decide to throw), or update all call sites to handle these exceptions consistently.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 714:716

Comment:
**Invalid URL now throws**

`ensureProviderEndpointExistsForUrl()` changed from returning `false` on blank/invalid `url` to throwing (e.g. `throw new Error("[ProviderEndpointEnsure] url is required")`). Existing callers outside this PR that rely on the previous boolean contract will now crash instead of treating it as a no-op. Either restore the old return-`false` behavior (and let transaction-wrapped callers decide to throw), or update *all* call sites to handle these exceptions consistently.

How can I resolve this? If you propose a fix, please make it concise.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/repository/provider-endpoints.ts`:
- Around line 1097-1104: The returned action incorrectly labels the outcome as
"kept-previous-and-created-next" when ensureNextEndpointActive() returns "noop"
(concurrent transaction already created an active next endpoint); update the
branching after const ensureResult = await ensureNextEndpointActive() to
explicitly handle ensureResult === "noop" and return a distinct action (e.g.,
"kept-previous-and-kept-next") instead of the "created-next" label, and also
update the ProviderEndpointSyncAction type to include the new action value so
callers and logs reflect the true outcome; ensure the code paths that use
previousEndpoint, nextEndpoint, ensureNextEndpointActive(), and the returned
action are adjusted accordingly.
🧹 Nitpick comments (3)
src/repository/provider-endpoints.ts (2)

21-25: QueryExecutor 缺少 query 方法,可能在部分场景下不够用。

当前 QueryExecutor 仅 Pick 了 select | insert | update | delete | execute。如果后续有调用者在 tx 上下文中需要使用 db.query(Drizzle 的 relational query API),类型上会报错。目前使用场景看起来够用,先记录一下。


1047-1067: savepoint 的类型探测方式较脆弱,建议添加注释说明。

通过 as QueryExecutor & { transaction?: ... } 进行类型断言,然后用 typeof ... === "function" 检查 savepoint 支持。这依赖于 Drizzle 事务对象的内部实现细节(嵌套 .transaction() 映射为 SAVEPOINT)。如果 Drizzle 后续版本改变了 API,此处会静默降级到 else 分支(不支持 savepoint → 唯一约束冲突直接抛出)。

当前方案可行,建议在注释中记录此依赖,方便后续升级时排查。

src/repository/provider.ts (1)

86-145: .returning() 的字段列表在 createProviderupdateProvider 中大量重复。

createProvider(Line 92-145)和 updateProvider(Line 535-588)的 .returning() 块几乎完全相同,且与 findProviderListfindAllProvidersFreshfindProviderById 中的 .select() 字段列表也高度重复。

建议抽取为共享常量(如 providerSelectColumns),减少维护负担和遗漏新增字段的风险。

提取共享字段列表
// 在文件顶部或 _shared 模块中定义
const providerColumns = {
  id: providers.id,
  name: providers.name,
  url: providers.url,
  key: providers.key,
  providerVendorId: providers.providerVendorId,
  // ... 其余字段
  createdAt: providers.createdAt,
  updatedAt: providers.updatedAt,
  deletedAt: providers.deletedAt,
};

// 使用时
const [provider] = await tx
  .insert(providers)
  .values({ ...dbData, providerVendorId })
  .returning(providerColumns);

…tion label

When ensureNextEndpointActive() returns "noop" (concurrent transaction
already created an active next endpoint), the action was incorrectly
labelled "kept-previous-and-created-next". Add a new
"kept-previous-and-kept-next" action to ProviderEndpointSyncAction and
use a three-way branch so callers and logs reflect the true outcome.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ding113 ding113 merged commit e99502e into dev Feb 7, 2026
7 of 9 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 7, 2026
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +827 to +829
export async function syncProviderEndpointOnProviderEdit(
input: SyncProviderEndpointOnProviderEditInput,
options?: { tx?: QueryExecutor }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous URL not validated

syncProviderEndpointOnProviderEdit() validates nextUrl but not previousUrl. If previousUrl is empty/invalid (e.g., legacy bad data), loadEndpoint()/reference checks will query with an empty string and the function can silently fall back to creating/reviving the nextUrl endpoint without ever cleaning up the old endpoint row. This breaks the “prevent stale/orphaned endpoints” goal for providers that already contain invalid url values.

Consider validating previousUrl similarly (or explicitly treating invalid previous URLs as a distinct case and ensuring old endpoints can still be cleaned up).

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 827:829

Comment:
**Previous URL not validated**

`syncProviderEndpointOnProviderEdit()` validates `nextUrl` but not `previousUrl`. If `previousUrl` is empty/invalid (e.g., legacy bad data), `loadEndpoint()`/reference checks will query with an empty string and the function can silently fall back to creating/reviving the `nextUrl` endpoint without ever cleaning up the old endpoint row. This breaks the “prevent stale/orphaned endpoints” goal for providers that already contain invalid `url` values.

Consider validating `previousUrl` similarly (or explicitly treating invalid previous URLs as a distinct case and ensuring old endpoints can still be cleaned up).

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1028 to +1036
const updatePreviousEndpointInPlace = async (executor: QueryExecutor): Promise<void> => {
await executor
.update(providerEndpoints)
.set({
vendorId: input.vendorId,
providerType: input.providerType,
url: nextUrl,
deletedAt: null,
isEnabled: true,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No rowcount check

In updatePreviousEndpointInPlace(), the .update(...).where(id=...) result isn’t checked. If the row was concurrently hard-deleted or otherwise not updated, movedInPlace still becomes true and the function returns updated-previous-in-place, but the endpoint row may not actually have moved/been enabled. This can leave the provider pointing at nextUrl while the endpoint table wasn’t updated as assumed.

Consider using .returning({ id: ... }) (or checking affected rows) before setting movedInPlace = true.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 1028:1036

Comment:
**No rowcount check**

In `updatePreviousEndpointInPlace()`, the `.update(...).where(id=...)` result isn’t checked. If the row was concurrently hard-deleted or otherwise not updated, `movedInPlace` still becomes `true` and the function returns `updated-previous-in-place`, but the endpoint row may not actually have moved/been enabled. This can leave the provider pointing at `nextUrl` while the endpoint table wasn’t updated as assumed.

Consider using `.returning({ id: ... })` (or checking affected rows) before setting `movedInPlace = true`.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +491 to +499
const updateResult = await db.transaction(async (tx) => {
let previousVendorId: number | null = null;
let previousUrl: string | null = null;
let previousProviderType: Provider["providerType"] | null = null;
let endpointCircuitResetId: number | null = null;

if (shouldSyncEndpoint) {
const [current] = await tx
.select({
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale pre-read in tx

updateProvider() reads the “current” provider row and later updates it, but the initial SELECT doesn’t lock the provider row (FOR UPDATE). If two updates race, both transactions can read the same previousUrl/previousVendorId and then each run endpoint sync based on stale previous state, causing extra soft-deletes or missed cleanup.

If correctness under concurrent edits is a goal here, the initial read should lock the provider row (or the update should use a single UPDATE ... RETURNING that also returns the previous values where possible) so only one edit computes the migration from a given previous state.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider.ts
Line: 491:499

Comment:
**Stale pre-read in tx**

`updateProvider()` reads the “current” provider row and later updates it, but the initial SELECT doesn’t lock the provider row (`FOR UPDATE`). If two updates race, both transactions can read the same `previousUrl/previousVendorId` and then each run endpoint sync based on stale previous state, causing extra soft-deletes or missed cleanup.

If correctness under concurrent edits is a goal here, the initial read should lock the provider row (or the update should use a single `UPDATE ... RETURNING` that also returns the previous values where possible) so only one edit computes the migration from a given previous state.

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link

greptile-apps bot commented Feb 7, 2026

Additional Comments (1)

src/repository/provider-endpoints.ts
Silent date fallback

toDate() returns new Date() when it can’t parse the DB value. That will silently turn invalid timestamps into “now”, which can corrupt semantics for createdAt/updatedAt/deletedAt in callers (e.g., a malformed deletedAt becomes non-null “now” and the endpoint looks freshly deleted). Since this transformer is used broadly for provider endpoints/vendors, it can produce incorrect state rather than failing fast.

It’s safer to throw (or at least return a deterministic sentinel like new Date(0)) when the DB value is not parseable.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 56:60

Comment:
**Silent date fallback**

`toDate()` returns `new Date()` when it can’t parse the DB value. That will silently turn invalid timestamps into “now”, which can corrupt semantics for `createdAt/updatedAt/deletedAt` in callers (e.g., a malformed `deletedAt` becomes non-null “now” and the endpoint looks freshly deleted). Since this transformer is used broadly for provider endpoints/vendors, it can produce incorrect state rather than failing fast.

It’s safer to throw (or at least return a deterministic sentinel like `new Date(0)`) when the DB value is not parseable.

How can I resolve this? If you propose a fix, please make it concise.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/repository/provider-endpoints.ts`:
- Around line 1048-1068: The else branch calling
updatePreviousEndpointInPlace(tx) when tx lacks .transaction() can cause the
whole transaction to become aborted on a unique constraint violation; wrap that
call in a try/catch, catch the error and if isUniqueViolationError(error) throw
a new/relevant error (or rethrow with added context) so callers immediately see
the unique-violation abort risk instead of silently leaving the transaction in
an aborted state; reference executorWithSavepoint, tx,
updatePreviousEndpointInPlace, transaction, and isUniqueViolationError to locate
and update the logic.
- Around line 900-997: ensureNextEndpointActive currently revives soft-deleted
endpoints but does not clear stale probe metadata or reset the circuit breaker;
update the revive branches in ensureNextEndpointActive (the tx.update calls that
set deletedAt:null / isEnabled:true) to also clear all lastProbe* fields (e.g.,
lastProbeOk, lastProbeMs/latency, lastProbeAt) and any circuit-breaker state
fields, and ensure you schedule the same post-transaction circuit-breaker reset
used elsewhere (mirror the post-commit reset logic used in the in-place move
code path) so the revived endpoint starts with fresh probe state; modify the
tx.update calls that reference providerEndpoints and invoke the same
reset-after-commit helper used elsewhere.
🧹 Nitpick comments (4)
src/repository/provider-endpoints.ts (2)

27-54: isUniqueViolationError 健壮性良好,建议考虑收窄字符串匹配。

当前实现同时检查了 codemessagecause 的嵌套结构,覆盖了 Drizzle ORM 包装的 PostgreSQL 错误的多种形态,逻辑合理。

一个小建议:message.includes("duplicate key value") 是基于 PostgreSQL 默认英文错误消息的硬编码匹配。如果数据库 locale 配置为非英文,该匹配会失效。不过在实际使用中,code === "23505" 的检查会先命中,所以 message 匹配只是兜底,风险较低。


827-850: syncProviderEndpointOnProviderEdit 函数体过长(~340行),包含多个嵌套闭包。

整个 runInTx 闭包内定义了 loadEndpointhasActiveReferencesOnPreviousUrlensureNextEndpointActiveupdatePreviousEndpointInPlace 四个内部函数,加上多条件分支逻辑,认知复杂度较高。

考虑将 ensureNextEndpointActiveupdatePreviousEndpointInPlace 提取为文件级别的私有函数(接受 tx 和必要参数),可以降低嵌套层级,也便于单独测试。

tests/unit/repository/provider-endpoint-sync-helper.test.ts (2)

5-55: Mock 设计基于有序队列,与生产代码的查询顺序强耦合。

createTxMock 通过 queue.shift() 按顺序消费 select 结果,任何对 syncProviderEndpointOnProviderEdit 内部查询顺序的调整(如增加一次 select 或调换顺序)都会静默导致测试结果错误而非报错。

建议考虑两种改进方向:

  1. selectLimitMock 中增加队列耗尽时的显式报错(如 throw new Error("unexpected select call")),至少能在查询顺序变化时快速定位。
  2. 长远来看,可按 where 条件做基于参数的匹配分发,而非固定顺序。
最小改进:队列耗尽时报错
-  const selectLimitMock = vi.fn(async () => queue.shift() ?? []);
+  const selectLimitMock = vi.fn(async () => {
+    if (queue.length === 0) {
+      throw new Error("selectLimitMock: unexpected select call - queue exhausted");
+    }
+    return queue.shift()!;
+  });

1-292: 建议补充对 revive 场景(kept-previous-and-revived-next)的测试用例。

当前测试覆盖了 noopcreated-next(隐含在 in-place move 中)、kept-nextsoft-deleted-previous-and-kept-next 等 action,但缺少对 revived-next 相关 action(如 kept-previous-and-revived-nextsoft-deleted-previous-and-revived-next)的直接测试。这些路径涉及对软删除端点的恢复,是核心业务逻辑之一。

Comment on lines +900 to +997
const ensureNextEndpointActive = async (options?: {
reactivateDisabled?: boolean;
}): Promise<"created-next" | "revived-next" | "noop"> => {
const reactivateDisabled = options?.reactivateDisabled ?? true;
const nextEndpoint = await loadEndpoint({
vendorId: input.vendorId,
providerType: input.providerType,
url: nextUrl,
});

if (!nextEndpoint) {
const inserted = await tx
.insert(providerEndpoints)
.values({
vendorId: input.vendorId,
providerType: input.providerType,
url: nextUrl,
label: null,
updatedAt: now,
})
.onConflictDoNothing({
target: [
providerEndpoints.vendorId,
providerEndpoints.providerType,
providerEndpoints.url,
],
})
.returning({ id: providerEndpoints.id });

if (inserted[0]) {
return "created-next";
}

const concurrentEndpoint = await loadEndpoint({
vendorId: input.vendorId,
providerType: input.providerType,
url: nextUrl,
});

if (!concurrentEndpoint) {
throw new Error("[ProviderEndpointSync] failed to load next endpoint after conflict");
}

if (concurrentEndpoint.deletedAt !== null) {
await tx
.update(providerEndpoints)
.set({
deletedAt: null,
isEnabled: true,
updatedAt: now,
})
.where(eq(providerEndpoints.id, concurrentEndpoint.id));

return "revived-next";
}

if (reactivateDisabled && !concurrentEndpoint.isEnabled) {
await tx
.update(providerEndpoints)
.set({
isEnabled: true,
updatedAt: now,
})
.where(eq(providerEndpoints.id, concurrentEndpoint.id));

return "revived-next";
}

return "noop";
}

if (nextEndpoint.deletedAt !== null) {
await tx
.update(providerEndpoints)
.set({
deletedAt: null,
isEnabled: true,
updatedAt: now,
})
.where(eq(providerEndpoints.id, nextEndpoint.id));

return "revived-next";
}

if (reactivateDisabled && !nextEndpoint.isEnabled) {
await tx
.update(providerEndpoints)
.set({
isEnabled: true,
updatedAt: now,
})
.where(eq(providerEndpoints.id, nextEndpoint.id));

return "revived-next";
}

return "noop";
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

ensureNextEndpointActive 在 revive 软删除端点时未重置 probe 快照字段和断路器状态。

对比 in-place move 的逻辑(Lines 1031-1043),会清除所有 lastProbe* 字段并在事务提交后重置断路器。但 ensureNextEndpointActive 在 revive 场景(Lines 943-954, 971-982)只设置了 deletedAt: nullisEnabled: true,未清除可能已过时的 probe 数据,也没有触发断路器重置。

这意味着一个被软删除的端点被恢复后,可能携带旧的 probe 状态(如 lastProbeOk: false)和可能处于 open 状态的断路器,导致路由层认为该端点不健康而跳过它。

如果这是有意为之(保留历史数据供参考),建议添加注释说明。否则建议在 revive 时同样重置相关状态。

🤖 Prompt for AI Agents
In `@src/repository/provider-endpoints.ts` around lines 900 - 997,
ensureNextEndpointActive currently revives soft-deleted endpoints but does not
clear stale probe metadata or reset the circuit breaker; update the revive
branches in ensureNextEndpointActive (the tx.update calls that set
deletedAt:null / isEnabled:true) to also clear all lastProbe* fields (e.g.,
lastProbeOk, lastProbeMs/latency, lastProbeAt) and any circuit-breaker state
fields, and ensure you schedule the same post-transaction circuit-breaker reset
used elsewhere (mirror the post-commit reset logic used in the in-place move
code path) so the revived endpoint starts with fresh probe state; modify the
tx.update calls that reference providerEndpoints and invoke the same
reset-after-commit helper used elsewhere.

Comment on lines +1048 to +1068
let movedInPlace = false;
const executorWithSavepoint = tx as QueryExecutor & {
transaction?: <T>(runInTx: (nestedTx: TransactionExecutor) => Promise<T>) => Promise<T>;
};

if (typeof executorWithSavepoint.transaction === "function") {
try {
await executorWithSavepoint.transaction(async (nestedTx) => {
await updatePreviousEndpointInPlace(nestedTx);
});
movedInPlace = true;
} catch (error) {
if (!isUniqueViolationError(error)) {
throw error;
}
}
} else {
// No savepoint support means we cannot safely continue after unique violations.
await updatePreviousEndpointInPlace(tx);
movedInPlace = true;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Savepoint 回退路径中,无 savepoint 支持时直接执行更新会在唯一约束冲突时导致事务不可用。

Line 1064-1067 的 else 分支在 tx 不支持 .transaction() 方法时,直接执行 updatePreviousEndpointInPlace(tx)。如果此时发生唯一约束冲突,PostgreSQL 会将当前事务标记为 aborted(current transaction is aborted, commands ignored until end of transaction block),后续所有 SQL 语句都会失败,整个事务最终会回滚。

注释说明了这一点("No savepoint support means we cannot safely continue after unique violations"),但调用方可能不清楚这种隐含的事务中止风险。实际运行中 Drizzle ORM 的事务对象通常支持 .transaction()(即 savepoint),所以这个分支可能极少触发。

🤖 Prompt for AI Agents
In `@src/repository/provider-endpoints.ts` around lines 1048 - 1068, The else
branch calling updatePreviousEndpointInPlace(tx) when tx lacks .transaction()
can cause the whole transaction to become aborted on a unique constraint
violation; wrap that call in a try/catch, catch the error and if
isUniqueViolationError(error) throw a new/relevant error (or rethrow with added
context) so callers immediately see the unique-violation abort risk instead of
silently leaving the transaction in an aborted state; reference
executorWithSavepoint, tx, updatePreviousEndpointInPlace, transaction, and
isUniqueViolationError to locate and update the logic.

@github-actions github-actions bot mentioned this pull request Feb 7, 2026
10 tasks
ding113 added a commit that referenced this pull request Feb 7, 2026
* fix(proxy): add 'cannot be modified' error detection to thinking signature rectifier

Extend the thinking signature rectifier to detect and handle the
Anthropic API error when thinking/redacted_thinking blocks have been
modified from their original response. This error occurs when clients
inadvertently modify these blocks in multi-turn conversations.

The rectifier will now remove these blocks and retry the request,
similar to how it handles other thinking-related signature errors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(deps): bump jspdf in the npm_and_yarn group across 1 directory

Bumps the npm_and_yarn group with 1 update in the / directory: [jspdf](https://github.com/parallax/jsPDF).


Updates `jspdf` from 3.0.4 to 4.1.0
- [Release notes](https://github.com/parallax/jsPDF/releases)
- [Changelog](https://github.com/parallax/jsPDF/blob/master/RELEASE.md)
- [Commits](parallax/jsPDF@v3.0.4...v4.1.0)

---
updated-dependencies:
- dependency-name: jspdf
  dependency-version: 4.1.0
  dependency-type: direct:production
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: Hot-reload cache invalidation for Request Filters and Sensitive Words (#710)

* fix: hot-reload request filters via globalThis singleton pattern

EventEmitter and RequestFilterEngine now use globalThis caching to ensure
the same instance is shared across different Next.js worker contexts.
This fixes the issue where filter changes required Docker restart.

Added diagnostic logging for event subscription and propagation.

* fix(redis): wait for subscriber connection ready before subscribe

- ensureSubscriber now returns Promise<Redis>, waits for 'ready' event
- subscribeCacheInvalidation returns null on failure instead of noop
- RequestFilterEngine checks cleanup !== null before logging success
- Fixes false "Subscribed" log when Redis connection fails

* feat(sensitive-words): add hot-reload via Redis pub/sub

Enable real-time cache invalidation for sensitive words detector,
matching the pattern used by request-filter-engine and error-rule-detector.

* fix(redis): harden cache invalidation subscriptions

Ensure sensitive-words CRUD emits update events so hot-reload propagates across workers. Roll back failed pub/sub subscriptions, add retry/timeout coverage, and avoid sticky provider-cache subscription state.

* fix(codex): bump default User-Agent fallback

Update the hardcoded Codex UA used when requests lack an effective user-agent (e.g. filtered out). Keep unit tests in sync with the new default.

* fix(redis): resubscribe cache invalidation after reconnect

Clear cached subscription state on disconnect and resubscribe on ready so cross-worker cache invalidation survives transient Redis reconnects. Add unit coverage, avoid misleading publish logs, track detector cleanup handlers, and translate leftover Russian comments to English.

* fix(sensitive-words): use globalThis singleton detector

Align SensitiveWordDetector with existing __CCH_* singleton pattern to avoid duplicate instances across module reloads. Extend singleton unit tests to cover the detector.

* chore: format code (req-fix-dda97fd)

* fix: address PR review comments

- pubsub.ts: use `once` instead of `on` for ready event to prevent
  duplicate resubscription handlers on reconnect
- forwarder.ts: extract DEFAULT_CODEX_USER_AGENT constant
- provider-cache.ts: wrap subscribeCacheInvalidation in try/catch
- tests: use exported constant instead of hardcoded UA string

* fix(redis): resubscribe across repeated reconnects

Ensure pub/sub resubscribe runs on every reconnect, extend unit coverage, and keep emitRequestFiltersUpdated resilient when logger import fails.

---------

Co-authored-by: John Doe <johndoe@example.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* feat(logs): make cost column toggleable with improved type safety (#715)

close #713

* fix(proxy): add OpenAI chat completion format support in usage extraction (#705) (#716)

The `extractUsageMetrics` function was missing support for OpenAI chat
completion format fields (`prompt_tokens`/`completion_tokens`), causing
token statistics to not be recorded for OpenAI-compatible providers.

Changes:
- Add `prompt_tokens` -> `input_tokens` mapping
- Add `completion_tokens` -> `output_tokens` mapping
- Preserve priority: Claude > Gemini > OpenAI format
- Add 5 unit tests for OpenAI format handling

Closes #705

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix(currency): respect system currencyDisplay setting in UI (#717)

Fixes #678 - Currency display unit configuration was not applied.

Root cause:
- `users-page-client.tsx` hardcoded `currencyCode="USD"`
- `UserLimitBadge` and `LimitStatusIndicator` had hardcoded `unit="$"` default
- `big-screen/page.tsx` used hardcoded "$" in multiple places

Changes:
- Add `getCurrencySymbol()` helper function to currency.ts
- Fetch system settings in `users-page-client.tsx` and pass to table
- Pass `currencySymbol` from `user-key-table-row.tsx` to limit badges
- Remove hardcoded "$" defaults from badge components
- Update big-screen page to fetch settings and use dynamic symbol
- Add unit tests for `getCurrencySymbol`

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* feat(gemini): add Google Search web access preference for Gemini providers (#721)

* feat(gemini): add Google Search web access preference for Gemini providers

Add provider-level preference for Gemini API type providers to control
Google Search (web access) tool injection:

- inherit: Follow client request (default)
- enabled: Force inject googleSearch tool into request
- disabled: Force remove googleSearch tool from request

Changes:
- Add geminiGoogleSearchPreference field to provider schema
- Add GeminiGoogleSearchPreference type and validation
- Implement applyGeminiGoogleSearchOverride with audit trail
- Add UI controls in provider form (Gemini Overrides section)
- Add i18n translations for 5 languages (en, zh-CN, zh-TW, ja, ru)
- Integrate override logic in proxy forwarder for Gemini requests
- Add 22 unit tests for the override logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(gemini): address code review feedback

- Use explicit else-if for disabled preference check (gemini-code-assist)
- Use i18n for SelectValue placeholder instead of hardcoded string (coderabbitai)
- Sync overridden body back to session.request.message for log consistency (coderabbitai)
- Persist Gemini special settings immediately, matching Anthropic pattern (coderabbitai)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(gemini): use strict types for provider config and audit

- Narrow preference type to "enabled" | "disabled" (exclude unreachable "inherit")
- Use ProviderType and GeminiGoogleSearchPreference types instead of string

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix(api): 透传 /api/actions 认证会话以避免 getUsers 返回空数据 (#720)

* fix(api): 透传 /api/actions 认证会话以避免 getUsers 返回空数据

* fix(auth): 让 scoped session 继承 allowReadOnlyAccess 语义并支持内部降权校验

* chore: format code (dev-93585fa)

* fix: bound SessionTracker active_sessions zsets by env TTL (#718)

* fix(session): bound active_sessions zsets by env ttl

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(rate-limit): pass session ttl to lua cleanup

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(session): validate SESSION_TTL env and prevent ZSET leak on invalid values

- Add input validation for SESSION_TTL (reject NaN, 0, negative; default 300)
- Guard against invalid TTL in Lua script to prevent clearing all sessions
- Use dynamic EXPIRE based on SESSION_TTL instead of hardcoded 3600s
- Add unit tests for TTL validation and dynamic expiry behavior

---------

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(provider): stop standard-path fallback to legacy provider url

* feat(providers): expose vendor endpoint pools in settings UI (#719)

* feat(providers): add endpoint status mapping

* feat(providers): add endpoint pool hover

* feat(providers): show vendor endpoints in list rows

* feat(providers): extract vendor endpoint CRUD table

* chore(i18n): add provider endpoint UI strings

* fix(providers): integrate endpoint pool into provider form

* fix(provider): wrap endpoint sync in transactions to prevent race conditions (#730)

* fix(provider): wrap provider create/update endpoint sync in transactions

Provider create and update operations now run vendor resolution and
endpoint sync inside database transactions to prevent race conditions
that could leave orphaned or inconsistent endpoint rows.

Key changes:
- createProvider: wrap vendor + insert + endpoint seed in a single tx
- updateProvider: wrap vendor + update + endpoint sync in a single tx
- Add syncProviderEndpointOnProviderEdit for atomic URL/type/vendor
  migration with in-place update, soft-delete, and conflict handling
- Vendor cleanup failures degrade to warnings instead of propagating
- Add comprehensive unit and integration tests for sync edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(provider): defer endpoint circuit reset until transaction commit

Avoid running endpoint circuit reset side effects inside DB transactions to prevent rollback inconsistency. Run resets only after commit and add regression tests for deferred reset behavior in helper and provider update flows.

* fix(provider): distinguish noop from created-next in endpoint sync action label

When ensureNextEndpointActive() returns "noop" (concurrent transaction
already created an active next endpoint), the action was incorrectly
labelled "kept-previous-and-created-next". Add a new
"kept-previous-and-kept-next" action to ProviderEndpointSyncAction and
use a three-way branch so callers and logs reflect the true outcome.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments from PR #731

- fix(auth): prevent scoped session access widening via ?? -> && guard
- fix(i18n): standardize zh-CN provider terminology to "服务商"
- fix(i18n): use consistent Russian translations for circuit status
- fix(i18n): replace raw formatDistanceToNow with locale-aware RelativeTime
- fix(gemini): log warning for unknown google search preference values
- fix(error-rules): check subscribeCacheInvalidation return value
- fix(test): correct endpoint hover sort test to assert URLs not labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: export auth session storage and fix test mock types

- Export authSessionStorage from auth-session-storage.node.ts to prevent
  undefined on named imports; remove duplicate declare global block
- Fix mockEndpoints in provider-endpoint-hover test: remove nonexistent
  lastOk/lastLatencyMs fields, add missing lastProbe* fields, use Date
  objects for createdAt/updatedAt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: miraserver <20286838+miraserver@users.noreply.github.com>
Co-authored-by: John Doe <johndoe@example.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: 泠音 <im@ling.plus>
Co-authored-by: Longlone <41830147+WAY29@users.noreply.github.com>
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider bug Something isn't working size/XL Extra Large PR (> 1000 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant