fix(proxy): prevent stuck requesting on upstream non-ok body hang #751

Merged
ding113 merged 6 commits into ding113:dev from tesgth032:fix/hang-stuck-requesting
Feb 10, 2026

Conversation

@tesgth032 tesgth032 commented Feb 10, 2026

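Summary

Fixes an issue that could leave proxied requests stuck in the requesting state indefinitely: when the upstream returns 4xx/5xx but the response body never ends, the server waits forever while reading the error body, so the whole request chain (including the client) hangs. Restarting the process restores service only temporarily.

Also fixes a timestamptz parsing failure in the probe-log cleanup task under some runtime/driver combinations, so that a failed cleanup no longer lets logs pile up.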

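Problem and root cause

In ProxyForwarder.doForward():

  • When the upstream returns a non-2xx status, error handling reads response.text() to build a ProxyError;
  • If the upstream keeps the connection open after returning 403/5xx and never ends the body, response.text() waits forever;
  • If the response timeout is cleared before the error body is read, the abort signal never fires and the request never finishes. The frontend shows it as permanently "requesting", and connections/resources may gradually be exhausted.

In addition, when a failed proxy degrades to a direct connection (proxyFallbackToDirect) or HTTP/2 falls back to HTTP/1.1, the catch branch clears the response timeout. If the subsequent direct/fallback request gets a non-2xx response whose body still never ends, it enters the same "read the error body with no timeout" path and reproduces the hang.

On the probe-log cleanup side, when a Date is passed directly as a SQL parameter, some runtimes/drivers serialize it to a non-ISO string such as "Mon Feb ... GMT+0800 (China Standard Time)". Postgres cannot parse that as timestamptz, so the cleanup task errors out and cannot continue.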
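The serialization difference described above can be illustrated with a small sketch. The helper name `toIsoParam` is hypothetical and not taken from the PR; it only shows why an ISO-8601 string is a safer bind value than a driver-stringified Date.

```typescript
// Hypothetical helper (not from the PR): turn a Date into an unambiguous
// ISO-8601 string before binding it as a SQL parameter.
function toIsoParam(date: Date): string {
  if (Number.isNaN(date.getTime())) {
    throw new RangeError("beforeDate is not a valid Date");
  }
  return date.toISOString();
}

// Month index 1 is February, so this is 2026-02-10 02:30:00 UTC.
const beforeDate = new Date(Date.UTC(2026, 1, 10, 2, 30, 0));

// String(date) depends on the machine's locale and timezone; on a UTC+8 host
// it looks like "Tue Feb 10 2026 10:30:00 GMT+0800 (China Standard Time)".
console.log(String(beforeDate));

// toISOString() is always UTC: "2026-02-10T02:30:00.000Z"
console.log(toIsoParam(beforeDate));
```

Rejecting an invalid Date up front is a design choice of this sketch; `toISOString()` would throw a RangeError on its own in that case.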

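Solution

  1. Keep timeout monitoring on the non-2xx error-body read path
  • Do not clear the response timeout before reading the error body;
  • Wrap ProxyError.fromUpstreamResponse() in try/finally and clear the timeout timer in finally;
  • When the upstream error response body stalls, the existing response timeout still aborts the request and interrupts response.text(), so the chain no longer hangs forever.
  2. Restore the response timeout after proxy degradation/protocol fallback
  • Restart the response timeout once the fallback request succeeds, so that a later non-2xx error-body read is still protected by the timeout. This avoids the sequence "proxy fails -> direct connection gets 403 -> error-body read hangs".
  3. timestamptz compatibility fix for probe-log cleanup
  • Always convert beforeDate with toISOString();
  • Add an explicit ::timestamptz cast in the SQL to avoid ambiguous or failed parsing in Postgres.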
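The first item can be sketched in isolation. This is a minimal illustration under stated assumptions, not the forwarder's actual code: `hangingResponse`, `UpstreamError`, and `forwardOnce` are hypothetical names, and the hanging upstream is simulated with an in-memory stream instead of a real fetch.

```typescript
// Hypothetical stand-in for an upstream that sends a 403 with a partial body
// and never closes the stream. Aborting `signal` errors the stream, mirroring
// how a pending body read is rejected when a fetch signal aborts.
function hangingResponse(signal: AbortSignal): Response {
  const body = new ReadableStream<Uint8Array>({
    start(streamController) {
      streamController.enqueue(new TextEncoder().encode('{"error":"forbidden"}'));
      signal.addEventListener("abort", () => streamController.error(signal.reason), {
        once: true,
      });
    },
  });
  return new Response(body, { status: 403 });
}

// Hypothetical error type standing in for ProxyError.
class UpstreamError extends Error {
  constructor(
    readonly statusCode: number,
    readonly bodyText: string | null,
  ) {
    super(`upstream returned ${statusCode}`);
  }
}

async function forwardOnce(timeoutMs: number): Promise<never> {
  const controller = new AbortController();
  const timeoutId = setTimeout(
    () => controller.abort(new Error("response_timeout")),
    timeoutMs,
  );
  const response = hangingResponse(controller.signal);

  // The timer stays armed while the error body is read; it is cleared only
  // in finally, after the read has settled one way or the other.
  try {
    let text: string | null = null;
    try {
      text = await response.text();
    } catch {
      // The read was aborted by the timeout: keep the status, drop the body.
    }
    throw new UpstreamError(response.status, text);
  } finally {
    clearTimeout(timeoutId);
  }
}
```

If the timer were cleared before `response.text()`, nothing would ever abort the stream and `forwardOnce` would never settle, which is the hang described above.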

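Changes

  • src/app/v1/_lib/proxy/forwarder.ts
    • Non-2xx response handling: make sure the response timeout covers the error-body read (so response.text() cannot hang)
    • Proxy failure degrading to direct connection: restore the response timeout (so the request cannot hang again after fallback)
    • (Minor) Fix one garbled comment (comment only, no behavior change)
  • src/repository/provider-endpoints.ts
    • Probe-log cleanup: beforeDate now uses an ISO-8601 string + ::timestamptz
  • tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts
    • New regression test: simulate an upstream that returns 403 and never ends the body, and verify that a ProxyError is thrown within the timeout window instead of hanging
    • Cover the proxyFallbackToDirect degradation scenario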

Testing

  • npm run test
  • npm run typecheck
  • npm run build

Greptile Overview

Greptile Summary

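This PR fixes a case where proxy forwarding could wait forever while reading the error body when the upstream returns a non-2xx status and the response body never ends. It makes sure the response timeout covers the error-body read in ProxyError.fromUpstreamResponse() and restores the timeout after fallback (proxy->direct / h2->h1), so the request chain no longer stays stuck in requesting.

In addition, the probe-log cleanup task now passes an ISO-8601 string parameter with an explicit timestamptz cast. This resolves the failure where some runtimes/drivers serialize Date to a non-ISO string that Postgres cannot parse, which interrupted the cleanup task.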

Confidence Score: 4/5

  • This PR is reasonably safe to merge once the remaining timeout-timer overlap risk is confirmed/removed.
  • Core fix (keeping response timeout active during non-OK body reads and restoring it after fallback) addresses the described hang, and the SQL cleanup change is now robust via CAST. The only concrete merge-blocker risk is potential overlapping/stale response-timeout timers when restarting after fallback, which could abort later attempts unexpectedly.
  • src/app/v1/_lib/proxy/forwarder.ts

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/app/v1/_lib/proxy/forwarder.ts | Keeps response timeout active while reading non-OK bodies and restarts timeout after direct fallback; fixes one garbled comment. Main remaining concern is ensuring timeout timers can’t overlap across attempts when restarting after fallback. |
| src/repository/provider-endpoints.ts | Changes probe-log cleanup to bind an ISO timestamp string and explicitly cast to timestamptz via CAST(... AS timestamptz), improving compatibility with drivers that stringify Date non-ISO. |
| tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts | Adds regression tests for hanging non-OK response bodies and for proxyFallbackToDirect; includes socket tracking and forced teardown to reduce suite hangs. |

Sequence Diagram

sequenceDiagram
  participant Client
  participant Forwarder as ProxyForwarder.doForward()
  participant Upstream
  participant Timeout as responseTimeout

  Client->>Forwarder: request (with responseController)
  Forwarder->>Timeout: start response timeout
  Forwarder->>Upstream: fetch/forward request
  Upstream-->>Forwarder: response (status 4xx/5xx)
  alt response.ok
    Forwarder-->>Client: stream response
  else !response.ok
    Forwarder->>Forwarder: ProxyError.fromUpstreamResponse()
    Note over Forwarder,Upstream: error body may never end
    Timeout-->>Forwarder: timer fires
    Forwarder->>Forwarder: responseController.abort()
    Forwarder-->>Client: throw ProxyError (timeout-protected)
    Forwarder->>Timeout: clear timeout
  end

  opt proxyFallbackToDirect / h2->h1 fallback
    Forwarder->>Timeout: restart response timeout
    Forwarder->>Upstream: retry request
  end

coderabbitai bot commented Feb 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

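Adjusts ProxyForwarder's timeout cleanup and fallback control flow (including direct fallback and HTTP/1.1 fallback), corrects the order of timeout cleanup for non-OK upstream responses, makes database time-parameter serialization compatible across drivers, and adds unit tests covering an upstream that returns non-OK with a hanging response body.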

Changes

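| Cohort / File(s) | Summary |
| --- | --- |
| Proxy forwarding and timeout/fallback logic: src/app/v1/_lib/proxy/forwarder.ts | Restarts the response timeout timer after a successful direct proxy fallback and HTTP/1.1 fallback; moves the throw for non-OK upstream responses into try/finally so the timer is cleared in finally; updates the cleanup comment and related control flow on the gzip path to avoid timer leaks. |
| Database parameter serialization compatibility: src/repository/provider-endpoints.ts | In deleteProviderEndpointProbeLogsBeforeDateBatch, serializes input.beforeDate to an ISO string and compares it as timestamptz, so the time parameter is parsed correctly across runtimes/drivers. |
| Unit tests (new): tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts | Adds tests that simulate an upstream returning 403 with a response body that never terminates, verifying that ProxyForwarder throws the corresponding ProxyError when the read hangs and that the test/run does not hang. |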

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

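🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: fixing the hang that occurs when the upstream returns a non-OK response whose body never ends. It matches the core goal of the PR. |
| Description check | ✅ Passed | The pull request description is closely tied to the change set and explains the root cause of the stuck requests, the solution, and the specific changes in detail. |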

@gemini-code-assist

Summary of Changes

Hello @tesgth032, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

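This pull request addresses key issues in proxy request stability and database cleanup. By adjusting the timeout handling logic, it resolves a case where proxy requests could hang indefinitely when the upstream service returns an error status (4xx/5xx) but never terminates the response body. It also fixes a date-format incompatibility in the probe-log cleanup process so that log deletion is reliable. A new test case is added to verify the proxy fix.

Highlights

  • Stuck proxy request fix: Modifies ProxyForwarder so that requests no longer hang when the upstream service returns a non-2xx HTTP status (e.g. 4xx/5xx) without properly closing the response body. The response timeout timer is now cleared in a finally block after the attempt to read the response body, so the response.text() call cannot wait forever.
  • Probe-log cleanup date format fix: Ensures the beforeDate parameter used to delete provider_endpoint_probe_logs is always converted to ISO-8601 format and explicitly cast to PostgreSQL's timestamptz type, resolving parsing problems under some runtime/driver combinations.
  • New test case: Adds a unit test that simulates an upstream server returning a 403 status with a hanging response body, to verify the fix.
  • Import order adjustment: Reorders the import statements in src/lib/security/api-key-vacuum-filter.ts for consistency.
Changelog
  • src/app/v1/_lib/proxy/forwarder.ts
    • Changed the response-timeout clearing logic for non-2xx HTTP responses.
    • Wrapped the ProxyError.fromUpstreamResponse call in a try...finally block so that clearTimeout is always called.
    • Added comments explaining the rationale for this change, which is to prevent requests from hanging.
  • src/lib/security/api-key-vacuum-filter.ts
    • Reordered import statements.
  • src/repository/provider-endpoints.ts
    • Modified deleteProviderEndpointProbeLogsBeforeDateBatch to convert input.beforeDate to an ISO-8601 string and explicitly cast it to timestamptz in the SQL query.
    • Added comments explaining the compatibility fix for Postgres date parsing.
  • tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts
    • Added a new test file that simulates an upstream server returning a 403 status with a hanging response body.
    • Verifies that ProxyForwarder handles the timeout correctly in this case and rejects the Promise with a ProxyError.
Activity
  • The author, tesgth032, created this pull request to fix the proxy issue and the database date-parsing issue.
  • This pull request contains code changes to src/app/v1/_lib/proxy/forwarder.ts, src/lib/security/api-key-vacuum-filter.ts, and src/repository/provider-endpoints.ts.
  • A new test file, tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts, was added to cover the proxy fix.
  • The author listed npm run test, npm run typecheck, and npm run build as checks.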

@github-actions github-actions bot added bug Something isn't working area:core area:provider size/M Medium PR (< 500 lines) labels Feb 10, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

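This pull request fixes a critical issue where the proxy could get stuck when the upstream returns a non-OK status and the response body hangs. The core logic change in ProxyForwarder, which defers clearing the response timeout until after the response body has been read, is correct and effectively resolves the problem. The change in provider-endpoints.ts that standardizes the date format for database queries is also a good robustness and compatibility improvement. However, a significant SSRF vulnerability was found in the proxy's URL construction logic: if a client supplies a malicious path, upstream provider API keys could be leaked.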

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

Comment on lines 133 to 146
async function startServer(): Promise<{ server: Server; baseUrl: string }> {
  const server = createServer((req, res) => {
    // Simulate a misbehaving upstream: return 403 but never end the body (so response.text() waits forever)
    res.writeHead(403, { "content-type": "application/json" });
    res.write(JSON.stringify({ error: { message: "forbidden" } }));

    // When the client aborts, destroy the connection so the test process is not left with a hanging connection
    req.on("aborted", () => {
      try {
        res.destroy();
      } catch {
        // ignore
      }
    });

Flaky test cleanup

This test’s server tries to clean up the never-ending response only on req.on("aborted"), but the client-side abort here is driven by undici’s AbortController/timeout, which won’t reliably emit IncomingMessage’s aborted event in all cases. If that event doesn’t fire, the response/socket can remain open and make the unit test suite hang/flap. Consider adding a deterministic server-side teardown (e.g., destroy the socket on a timer or on res.close/req.close) so the test process can always exit cleanly.

Path: tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts
Line: 133:146
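
The deterministic teardown suggested in this comment could look roughly like the sketch below. It assumes Node's built-in `node:http` server; the `startHangingServer` and `stop` names are hypothetical and this is not the test file's actual implementation.

```typescript
import { createServer, type Server } from "node:http";
import type { Socket } from "node:net";

// Starts a server that answers 403 and never ends the body, and returns a
// stop() that force-closes every open socket so the process can always exit.
function startHangingServer(): Promise<{
  server: Server;
  baseUrl: string;
  stop: () => Promise<void>;
}> {
  const sockets = new Set<Socket>();

  const server = createServer((_req, res) => {
    res.writeHead(403, { "content-type": "application/json" });
    res.write(JSON.stringify({ error: { message: "forbidden" } }));
    // Intentionally no res.end(): the body never finishes.
  });

  // Track sockets instead of relying on request-level abort events.
  server.on("connection", (socket) => {
    sockets.add(socket);
    socket.on("close", () => sockets.delete(socket));
  });

  return new Promise((resolve) => {
    server.listen(0, "127.0.0.1", () => {
      const address = server.address();
      const port = typeof address === "object" && address ? address.port : 0;
      const stop = () =>
        new Promise<void>((done) => {
          for (const socket of sockets) socket.destroy();
          server.close(() => done());
        });
      resolve({ server, baseUrl: `http://127.0.0.1:${port}`, stop });
    });
  });
}
```

Calling `await stop()` in the test's finally block does not depend on which events the client-side abort happens to emit.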

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/repository/provider-endpoints.ts`:
- Around line 274-282: The comment above the beforeDateIso conversion contains
an emoji (⚠️) which violates the no-emoji rule; update the comment near the
input.beforeDate.toISOString() and the SQL block (referencing beforeDateIso,
db.execute, sql`...`, provider_endpoint_probe_logs and created_at) to remove the
emoji and rephrase the explanatory text in plain ASCII (e.g., "Note:" or
"Warning:") while keeping the existing explanation about runtime drivers
serializing Date and the justification for using ISO-8601 + ::timestamptz; do
not change the implementation (beforeDateIso or the SQL) — only edit the comment
text to eliminate emoji characters.

In `@tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts`:
- Around line 33-87: The factory createProvider returns a value typed as
Provider but omits three required fields from the Provider interface; add
anthropicMaxTokensPreference, anthropicThinkingBudgetPreference, and
geminiGoogleSearchPreference to the returned default object (set each to null)
so the object conforms to Provider and TypeScript compiles; update the
createProvider(...) return object to include these three keys with null
defaults.
🧹 Nitpick comments (1)
tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts (1)

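178-182: The way the private method doForward is accessed is fragile; consider adding a comment explaining why.

Accessing the private static method through the double type assertion ProxyForwarder as unknown as { doForward: ... } fails silently if the method signature changes. Consider adding a one-line comment explaining why the private method must be called directly (for example: "Test doForward directly to isolate single-forward behavior and avoid interference from send()'s retry/provider-switching logic").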

@github-actions github-actions bot left a comment

Code Review Summary

No significant issues identified in this PR. The core fix correctly addresses a real production hang scenario where upstream servers return 4xx/5xx status codes but never close the response body, causing response.text() to block indefinitely. The solution of deferring clearTimeout to a finally block after body reading is sound and well-tested.

PR Size: M

  • Lines changed: 239 (230 additions, 9 deletions)
  • Files changed: 4

Review Coverage

  • Logic and correctness - Clean. The try { throw await ... } finally { clearTimeout } pattern correctly keeps the response timeout active during body reading, then cleans up regardless of outcome.
  • Security (OWASP Top 10) - Clean. SQL change uses parameterized queries via Drizzle sql template literal; no injection risk.
  • Error handling - Clean. ProxyError.fromUpstreamResponse already handles response.text() rejection internally (errors.ts:50-54), so timeout-induced abort errors are gracefully converted to a ProxyError with the correct status code.
  • Type safety - Clean. No any usage in new code; type assertions in test file are necessary for accessing private static method.
  • Documentation accuracy - Clean. Comments accurately describe the problem and the fix rationale.
  • Test coverage - Adequate. New test (proxy-forwarder-nonok-body-hang.test.ts) directly reproduces the hang scenario with a mock server that sends headers but never closes the body, and verifies the request terminates with a ProxyError(403) within the timeout window. A 2-second safety timeout prevents test suite hangs on regression.
  • Code clarity - Good. The ancillary fix in provider-endpoints.ts (Date -> ISO-8601 string with explicit ::timestamptz cast) is a defensive improvement for cross-runtime compatibility.

Automated review by Claude AI

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 2 comments

Comment on lines +2453 to 2462
try {
  throw await ProxyError.fromUpstreamResponse(response, {
    id: provider.id,
    name: provider.name,
  });
} finally {
  if (responseTimeoutId) {
    clearTimeout(responseTimeoutId);
  }
}

Timeout timer not cleared

In the !response.ok branch, clearTimeout(responseTimeoutId) only runs if ProxyError.fromUpstreamResponse() completes. If that call hangs (the exact scenario this PR fixes), the finally never executes, so the timer is left running and may fire later during subsequent retries/requests, aborting via the shared responseController and producing confusing logs/behavior. This should clear the timer once the timeout fires (e.g., inside the setTimeout callback) so it can’t linger past the request lifecycle.

Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2453:2462
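
One way to address the lingering-timer concern raised here is a small wrapper that owns the timer handle. The `ResponseTimeout` class below is a hypothetical sketch, not code from forwarder.ts: the handle is dropped as soon as the timer fires, and restarting always clears the previous timer first so two timers can never overlap.

```typescript
// Hypothetical wrapper around a single response-timeout timer.
class ResponseTimeout {
  private timerId: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly onTimeout: () => void,
    private readonly ms: number,
  ) {}

  // Arms the timer; any previously armed timer is cleared first.
  start(): void {
    this.clear();
    this.timerId = setTimeout(() => {
      this.timerId = null; // drop the handle once the timer has fired
      this.onTimeout();
    }, this.ms);
  }

  // Safe to call from finally blocks, whether or not the timer already fired.
  clear(): void {
    if (this.timerId !== null) {
      clearTimeout(this.timerId);
      this.timerId = null;
    }
  }

  get armed(): boolean {
    return this.timerId !== null;
  }
}
```

In a forwarder-like setting, `onTimeout` would typically be something like `() => responseController.abort()`, and `start()` would be called again after a fallback request succeeds.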

Comment on lines 279 to 283
 const result = await db.execute(sql`
   WITH ids_to_delete AS (
     SELECT id FROM provider_endpoint_probe_logs
-    WHERE created_at < ${input.beforeDate}
+    WHERE created_at < ${beforeDateIso}::timestamptz
     ORDER BY created_at ASC

Raw SQL cast usage

`WHERE created_at < ${beforeDateIso}::timestamptz` relies on the template interpolation to produce a single SQL parameter followed by a cast. With some SQL builders this can get split such that the cast is treated as text, yielding invalid SQL or skipping the cast. Prefer applying the cast on the SQL side (e.g., `sql`${beforeDateIso}::timestamptz`` or using a typed parameter) to ensure the cast is part of the SQL fragment, not the bound value.

Path: src/repository/provider-endpoints.ts
Line: 279:283
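
To make the two cast spellings concrete, here is a toy tagged template. It is explicitly not Drizzle and does not reproduce its internals; `toySql` is a hypothetical function that only mimics the general idea that interpolated values become numbered placeholders while the surrounding text stays literal SQL.

```typescript
// Toy tagged template, NOT Drizzle: each interpolated value becomes $1, $2, ...
// and everything else stays as literal SQL text.
function toySql(
  strings: TemplateStringsArray,
  ...values: unknown[]
): { text: string; params: unknown[] } {
  let text = strings[0];
  values.forEach((_, i) => {
    text += `$${i + 1}${strings[i + 1]}`;
  });
  return { text, params: values };
}

const beforeDateIso = new Date(Date.UTC(2026, 1, 10)).toISOString();

// Postfix cast written directly after the placeholder.
const shorthand = toySql`WHERE created_at < ${beforeDateIso}::timestamptz`;
// shorthand.text === "WHERE created_at < $1::timestamptz"

// Standard SQL CAST wrapped around the placeholder.
const explicit = toySql`WHERE created_at < CAST(${beforeDateIso} AS timestamptz)`;
// explicit.text === "WHERE created_at < CAST($1 AS timestamptz)"
```

Under this toy model both forms keep the cast in the SQL text and the ISO string in the parameter list. The `CAST(... AS timestamptz)` spelling avoids depending on how a given builder treats text that directly follows a placeholder.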

@tesgth032 tesgth032 force-pushed the fix/hang-stuck-requesting branch from 0f5c66e to 1a600db Compare February 10, 2026 10:39

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 3 comments

Comment on lines 378 to 382
 const result = await db.execute(sql`
   WITH ids_to_delete AS (
     SELECT id FROM provider_endpoint_probe_logs
-    WHERE created_at < ${input.beforeDate}
+    WHERE created_at < ${beforeDateIso}::timestamptz
     ORDER BY created_at ASC

Invalid SQL parameter cast

WHERE created_at < ${beforeDateIso}::timestamptz is not valid Drizzle SQL interpolation: the ::timestamptz ends up outside the parameter placeholder, which produces invalid SQL / skips the cast depending on the driver. You need to include the cast in the sql fragment (cast the parameter inside the template), otherwise this cleanup query can fail at runtime.

Also appears in this function only (no other occurrences found).

Path: src/repository/provider-endpoints.ts
Line: 378:382

Comment on lines +2453 to 2462
try {
  throw await ProxyError.fromUpstreamResponse(response, {
    id: provider.id,
    name: provider.name,
  });
} finally {
  if (responseTimeoutId) {
    clearTimeout(responseTimeoutId);
  }
}

Leaked timeout after abort

In the !response.ok path you await ProxyError.fromUpstreamResponse(response, ...) inside a try/finally and only clear responseTimeoutId in finally. If the timeout fires while response.text() is pending, the abort will happen but response.text() may still not resolve promptly, so the finally might not run until much later (or ever if undici never settles), leaving responseTimeoutId live past the request lifecycle. That can cause the timer to fire later and abort via the same responseController, affecting subsequent retries/requests and producing misleading logs. Clearing the timer when it fires (inside the setTimeout callback) avoids this lingering timer.

Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2453:2462

Comment on lines 2832 to 2836

-  // 移�� content-encoding 和 content-length(避免下游再解压或使用错误长度)
+  // 移除 content-encoding 和 content-length(避免下游再解压或使用错误长度)
   responseHeaders.delete("content-encoding");
   responseHeaders.delete("content-length");
 } else {

Garbled comment encoding

This comment appears to contain mojibake (// 移�� content-encoding ...). It’s likely an encoding/copy artifact introduced in this PR and should be fixed to readable text to avoid confusing future maintainers.

Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2832:2836

@tesgth032
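Contributor Author

Updated per CodeRabbit's change requests:

  • src/repository/provider-endpoints.ts: removed the emoji from the comment (implementation unchanged)
  • tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts: filled in default values for the required Provider fields and added a note on why doForward is called directly
  • Also closed one gap: after proxyFallbackToDirect degrades to a direct connection, the response timeout is restored so that reading a non-2xx error body cannot hang again; a regression test now covers this

Local verification: npm run test / npm run typecheck / npm run build all pass.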

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts`:
- Around line 182-186: The test extracts the private static method by casting
and calling it as a bare function, which loses class context; change both usages
where you grab doForward (the one using ProxyForwarder as unknown ... .doForward
around the session/provider/baseUrl calls and the second occurrence later) to
invoke the method with the class bound as this (e.g., call or bind the function
with ProxyForwarder as the thisArg) so that inside doForward any references to
this (static helpers) remain valid; locate the symbol ProxyForwarder and its
doForward reference and replace the direct invocation with a bound/call
invocation preserving the ProxyForwarder context for both occurrences.
🧹 Nitpick comments (3)
tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts (3)

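167-260: The two test cases are highly repetitive in structure; consider extracting the shared logic.

Test 1 (lines 168-212) and test 2 (lines 214-260) share almost identical Promise.race + timeout guard + ProxyError assertion + server cleanup logic; only the overrides passed to createProvider differ. A helper function (such as runForwardAndExpect403) could be extracted to reduce duplication and improve maintainability.

Example refactoring sketch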
async function runForwardAndExpect403(providerOverrides: Partial<Provider>) {
  const { server, baseUrl } = await startServer();
  const clientAbortController = new AbortController();

  try {
    const provider = createProvider({ url: baseUrl, ...providerOverrides });
    const session = createSession({ clientAbortSignal: clientAbortController.signal });
    session.setProvider(provider);

    const doForward = (
      ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown }
    ).doForward.bind(ProxyForwarder);

    const forwardPromise = doForward(session, provider, baseUrl) as Promise<Response>;

    const result = await Promise.race([
      forwardPromise.then(
        () => ({ type: "resolved" as const }),
        (error) => ({ type: "rejected" as const, error }),
      ),
      new Promise<{ type: "timeout" }>((resolve) =>
        setTimeout(() => resolve({ type: "timeout" as const }), 2_000),
      ),
    ]);

    if (result.type === "timeout") {
      clientAbortController.abort(new Error("test_timeout"));
      throw new Error("doForward timed out — possible non-ok body hang regression");
    }

    expect(result.type).toBe("rejected");
    expect(result.type === "rejected" ? result.error : null).toBeInstanceOf(ProxyError);

    const err = (result as { type: "rejected"; error: unknown }).error as ProxyError;
    expect(err.statusCode).toBe(403);
  } finally {
    await new Promise<void>((resolve) => server.close(() => resolve()));
  }
}

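92-134: createSession builds the mock session with Object.create + Object.assign, bypassing the constructor. This works but is fragile.

When the constructor logic or internal fields of ProxySession change, the mock will not be updated along with them, so tests may pass silently while the actual behavior differs. Consider adding a short comment at the top of the file describing the limitations of this pattern, to help future maintainers.

136-165: req.on("aborted") is deprecated in Node.js ≥ 16.12 / v17+; consider using the close event instead.

The aborted event of IncomingMessage has been marked as deprecated (DEP0156) and is no longer guaranteed to fire in newer Node.js versions. Consider listening for the close event and checking req.destroyed to clean up the connection. Although server.close() in finally acts as a fallback, not destroying connections promptly may leak resources in CI environments that run newer Node versions.

Suggested fix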
-    req.on("aborted", () => {
-      try {
-        res.destroy();
-      } catch {
-        // ignore
-      }
-    });
+    req.on("close", () => {
+      if (req.destroyed) {
+        try {
+          res.destroy();
+        } catch {
+          // ignore
+        }
+      }
+    });

Comment on lines 182 to 186
const doForward = (
  ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown }
).doForward;

const forwardPromise = doForward(session, provider, baseUrl) as Promise<Response>;

⚠️ Potential issue | 🟡 Minor

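Accessing the private static doForward by bypassing the type system with as unknown loses the this context.

const doForward = (ProxyForwarder as unknown as ...).doForward extracts a reference to the static method, which is then called as a bare function: doForward(session, provider, baseUrl). If doForward calls other static methods through this (such as this.someHelper()), this is undefined in strict mode and a runtime exception is thrown.

A safer approach is to use .call or .bind to preserve the class context:

Suggested fix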
-      const doForward = (
-        ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown }
-      ).doForward;
-
-      const forwardPromise = doForward(session, provider, baseUrl) as Promise<Response>;
+      const doForward = (
+        ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown }
+      ).doForward.bind(ProxyForwarder);
+
+      const forwardPromise = doForward(session, provider, baseUrl) as Promise<Response>;

Both call sites (lines 182-186 and 230-234) need the same change.

Also applies to: 230-234
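
The loss of `this` described in this comment can be reproduced in isolation. `DemoForwarder` and its methods below are hypothetical stand-ins for ProxyForwarder, used only to show the difference between a bare reference and a bound one.

```typescript
// Hypothetical stand-in for a class whose private static method relies on `this`.
class DemoForwarder {
  private static helper(): string {
    return "forwarded";
  }

  private static doForward(): string {
    // Resolves only when `this` is the class itself.
    return this.helper();
  }

  static send(): string {
    return this.doForward();
  }
}

type WithDoForward = { doForward: () => string };

// Extracting the method through a cast yields a bare function with no `this`.
const unbound = (DemoForwarder as unknown as WithDoForward).doForward;

// Binding restores the class as `this`, so static helpers resolve again.
const bound = unbound.bind(DemoForwarder);

let unboundFailed = false;
try {
  unbound();
} catch {
  unboundFailed = true; // `this` is not the class here, so this.helper() fails
}

console.log(unboundFailed, bound());
```

Using `.call(DemoForwarder)` at the call site would work equally well; `.bind` is convenient when the reference is reused across several calls.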

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 3 comments

Comment on lines 378 to 382
 const result = await db.execute(sql`
   WITH ids_to_delete AS (
     SELECT id FROM provider_endpoint_probe_logs
-    WHERE created_at < ${input.beforeDate}
+    WHERE created_at < ${beforeDateIso}::timestamptz
     ORDER BY created_at ASC

Broken timestamptz cast

`WHERE created_at < ${beforeDateIso}::timestamptz` is not valid Drizzle interpolation: the `::timestamptz` ends up outside the parameter placeholder, which can generate invalid SQL at runtime (or skip the cast depending on driver). Wrap the cast inside the `sql` fragment (e.g. `WHERE created_at < (${sql`${beforeDateIso}`})::timestamptz` or bind a `Date`/typed parameter in a way Drizzle will cast correctly) so the cast is part of the SQL fragment rather than concatenated after a bound value.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 378:382

Comment:
**Broken timestamptz cast**

`WHERE created_at < ${beforeDateIso}::timestamptz` is not valid Drizzle interpolation: the `::timestamptz` ends up outside the parameter placeholder, which can generate invalid SQL at runtime (or skip the cast depending on driver). Wrap the cast inside the `sql` fragment (e.g. `WHERE created_at < (${sql`${beforeDateIso}`})::timestamptz` or bind a `Date`/typed parameter in a way Drizzle will cast correctly) so the cast is part of the SQL fragment rather than concatenated after a bound value.

How can I resolve this? If you propose a fix, please make it concise.
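
To make the two cast spellings concrete, here is a small sketch using a toy tagged template. `toySql` is a hypothetical stand-in written for this illustration; it is not Drizzle's `sql` implementation, and the assumption that interpolated values become positional placeholders (`$1`, `$2`, ...) should be checked against the actual driver. It also shows the ISO-8601 conversion that the PR description relies on.

```typescript
// Toy tagged template (hypothetical, NOT Drizzle): each interpolated value
// becomes a positional placeholder and is collected as a bound parameter.
type ToyQuery = { text: string; params: unknown[] };

function toySql(strings: TemplateStringsArray, ...values: unknown[]): ToyQuery {
  const text = strings.reduce(
    (acc, chunk, i) => acc + (i === 0 ? "" : `$${i}`) + chunk,
    "",
  );
  return { text, params: values };
}

// A fixed date so the ISO string is predictable.
const beforeDate = new Date(Date.UTC(2026, 1, 10, 0, 0, 0));
const beforeDateIso = beforeDate.toISOString(); // ISO-8601, always UTC

// Postgres-style suffix cast written directly after the interpolation.
const suffixCast = toySql`WHERE created_at < ${beforeDateIso}::timestamptz`;

// Standard SQL CAST wrapped around the interpolation.
const explicitCast = toySql`WHERE created_at < CAST(${beforeDateIso} AS timestamptz)`;

console.log(suffixCast.text);
console.log(explicitCast.text);
```

Under that assumption both forms cast the bound parameter; the `CAST(...)` form simply makes the scope of the cast visible at a glance.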

Comment on lines 2467 to 2483
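       if (!response.ok) {
-        // HTTP error: clear the response timeout timer
-        if (responseTimeoutId) {
-          clearTimeout(responseTimeoutId);
+        // ⚠️ HTTP error: do not clear the response timeout timer before reading the response body.
+        // Reason: some upstreams return 4xx/5xx and then stall without ever ending the body.
+        // Clearing the timeout early lets response.text() inside ProxyError.fromUpstreamResponse() wait forever,
+        // which hangs the whole request chain (client included); the frontend stays in "requesting".
+        //
+        // Correct strategy: keep the response timeout watching the body read, and clear the timer in finally.
+        try {
+          throw await ProxyError.fromUpstreamResponse(response, {
+            id: provider.id,
+            name: provider.name,
+          });
+        } finally {
+          if (responseTimeoutId) {
+            clearTimeout(responseTimeoutId);
+          }
+        }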

Timeout cleanup never runs

In the !response.ok path you rely on the finally to clearTimeout(responseTimeoutId), but the whole point of this change is that ProxyError.fromUpstreamResponse() can hang on response.text(). If that promise never settles (even after responseController.abort()), the finally won’t execute and the timer remains live past the request lifecycle, potentially firing later and aborting via the same responseController. Consider clearing the timeout when it fires (inside the setTimeout callback) so it can’t linger if the awaited body read never resolves.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2467:2483

Comment:
**Timeout cleanup never runs**

In the `!response.ok` path you rely on the `finally` to `clearTimeout(responseTimeoutId)`, but the whole point of this change is that `ProxyError.fromUpstreamResponse()` can hang on `response.text()`. If that promise never settles (even after `responseController.abort()`), the `finally` won’t execute and the timer remains live past the request lifecycle, potentially firing later and aborting via the same `responseController`. Consider clearing the timeout when it fires (inside the `setTimeout` callback) so it can’t linger if the awaited body read never resolves.

How can I resolve this? If you propose a fix, please make it concise.
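
The pattern under discussion can be sketched on its own. This is a simplified illustration under stated assumptions, not the code in `forwarder.ts`: `readErrorBody` and `hangingBody` are hypothetical names, and the body reader is assumed to reject once its `AbortSignal` fires. The timer clears its own handle when it fires, and `finally` clears it on every path where the read settles first.

```typescript
// Hypothetical helper: guard a body read with a one-shot timeout.
async function readErrorBody(
  readBody: (signal: AbortSignal) => Promise<string>,
  timeoutMs: number,
): Promise<string> {
  const controller = new AbortController();
  let timeoutId: ReturnType<typeof setTimeout> | undefined = setTimeout(() => {
    timeoutId = undefined; // one-shot: nothing is left to clear after firing
    controller.abort();
  }, timeoutMs);

  try {
    // Rejects on abort, provided the reader honors the signal.
    return await readBody(controller.signal);
  } finally {
    // Runs whether the read resolved, rejected, or was aborted.
    if (timeoutId !== undefined) {
      clearTimeout(timeoutId);
      timeoutId = undefined;
    }
  }
}

// Simulates an upstream body that never ends; settles only on abort.
function hangingBody(signal: AbortSignal): Promise<string> {
  return new Promise<string>((_resolve, reject) => {
    signal.addEventListener("abort", () => reject(new Error("aborted")), {
      once: true,
    });
  });
}
```

The sketch only helps when the awaited read actually settles after `abort()`. If a reader ignored the signal, the `await` would still hang and `finally` would not run, which is the scenario the comment above raises.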

Comment on lines 136 to 149
async function startServer(): Promise<{ server: Server; baseUrl: string }> {
  const server = createServer((req, res) => {
    // Simulate a misbehaving upstream: return 403 but never end the body (so response.text() waits forever)
    res.writeHead(403, { "content-type": "application/json" });
    res.write(JSON.stringify({ error: { message: "forbidden" } }));

    // When the client aborts, destroy the connection so the test process is not left with a hanging connection
    req.on("aborted", () => {
      try {
        res.destroy();
      } catch {
        // ignore
      }
    });

Non-deterministic test teardown

The test server only destroys the hanging response on req.on("aborted"), but the abort here is driven by undici/AbortController and doesn’t reliably emit Node’s IncomingMessage aborted event in all cases. If it doesn’t fire, the socket can stay open and make the test suite hang/flap. Add a deterministic teardown (e.g. destroy on res.close/req.close, or force-destroy the socket on a short timer) so the server always releases resources even when aborted isn’t emitted.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts
Line: 136:149

Comment:
**Non-deterministic test teardown**

The test server only destroys the hanging response on `req.on("aborted")`, but the abort here is driven by undici/`AbortController` and doesn’t reliably emit Node’s `IncomingMessage` `aborted` event in all cases. If it doesn’t fire, the socket can stay open and make the test suite hang/flap. Add a deterministic teardown (e.g. destroy on `res.close`/`req.close`, or force-destroy the socket on a short timer) so the server always releases resources even when `aborted` isn’t emitted.

How can I resolve this? If you propose a fix, please make it concise.
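
One way to get a deterministic teardown is to track every accepted socket and destroy them all on close, independent of whether `aborted` is ever emitted. The following is a rough sketch of that idea rather than the test file's actual code; `startHangingServer` and the shape of its return value are assumptions made for illustration.

```typescript
import { createServer } from "node:http";
import type { AddressInfo, Socket } from "node:net";

// Hypothetical test helper: a server that answers 403 and never ends the body.
async function startHangingServer(): Promise<{
  baseUrl: string;
  close: () => Promise<void>;
}> {
  const sockets = new Set<Socket>();

  const server = createServer((_req, res) => {
    res.writeHead(403, { "content-type": "application/json" });
    res.write(JSON.stringify({ error: { message: "forbidden" } }));
    // Intentionally no res.end(): the body stays open.
  });

  // Track every connection so teardown does not depend on client behavior.
  server.on("connection", (socket: Socket) => {
    sockets.add(socket);
    socket.on("close", () => sockets.delete(socket));
  });

  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address() as AddressInfo;

  const close = (): Promise<void> =>
    new Promise<void>((resolve, reject) => {
      // Stop accepting connections; the callback fires once all are gone.
      server.close((err) => (err ? reject(err) : resolve()));
      // Force-close whatever is still open, including hanging responses.
      for (const socket of sockets) socket.destroy();
    });

  return { baseUrl: `http://127.0.0.1:${port}`, close };
}
```

A test would call `close()` in its cleanup hook; because the sockets are destroyed explicitly, `server.close()` cannot wait indefinitely on a response that never ends.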

@tesgth032
Contributor Author

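I have evaluated and addressed each AI review comment (touching only code added or modified in this PR):

  • Flaky test teardown (Greptile): `tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts` now releases resources deterministically:
    • Sockets are tracked via `server.on('connection')` and all destroyed with `socket.destroy()` in `close()`, so `server.close()` cannot get stuck on a hanging connection
    • The request side also listens on `req.on('close')`/`req.on('aborted')` as a fallback cleanup
  • Calling context of the private static method (CodeRabbit): both `doForward` calls now use `doForward.call(ProxyForwarder, ...)`, so `this` is not lost (and the tests stay safe if the method later uses `this.*` internally)
  • Readability/robustness of the Drizzle cast (Greptile): `src/repository/provider-endpoints.ts` changed from `${beforeDateIso}::timestamptz` to `CAST(${beforeDateIso} AS timestamptz)`, which is more explicit and avoids the misreading that the cast is concatenated onto the parameter placeholder

On Greptile's "Timeout cleanup never runs" suggestion: the current timeout is a one-shot timer. If `response.text()` really hangs because the upstream never ends the body, the timer firing and aborting is exactly what breaks that hang, and the timer does not run again after it has fired. To avoid extra changes to the global timeout logic (out of scope for this PR), this is left as is for now.

Local verification: `npm run test` / `npm run typecheck` / `npm run build` all pass.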

@greptile-apps greptile-apps bot left a comment


3 files reviewed, 1 comment

Comment on lines 2346 to 2348
const fallbackInit = { ...init };
delete fallbackInit.dispatcher;
try {

Timeout not cleared on retry
When re-starting responseTimeoutId after direct fallback, the previous timer (created before the proxy attempt) may still be active. If responseTimeoutId wasn’t cleared in all proxy-failure paths, this introduces multiple timers sharing the same responseController, so a stale timer can abort a later attempt unexpectedly. Ensure you always clear any existing responseTimeoutId before assigning a new one (or ensure it’s definitively cleared on every path that reaches this block).

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2346:2348

Comment:
**Timeout not cleared on retry**
When re-starting `responseTimeoutId` after direct fallback, the previous timer (created before the proxy attempt) may still be active. If `responseTimeoutId` wasn’t cleared in all proxy-failure paths, this introduces multiple timers sharing the same `responseController`, so a stale timer can abort a later attempt unexpectedly. Ensure you always clear any existing `responseTimeoutId` before assigning a new one (or ensure it’s definitively cleared on every path that reaches this block).

How can I resolve this? If you propose a fix, please make it concise.
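
The "clear before re-arm" rule suggested above can be captured in a small wrapper. This is a sketch under assumptions, not the forwarder's implementation: `createRearmableTimeout`, `arm`, `clear`, and `isArmed` are hypothetical names introduced here.

```typescript
// Hypothetical wrapper that guarantees at most one live timer at a time.
function createRearmableTimeout(onTimeout: () => void) {
  let id: ReturnType<typeof setTimeout> | undefined;

  const clear = (): void => {
    if (id !== undefined) {
      clearTimeout(id);
      id = undefined;
    }
  };

  const arm = (ms: number): void => {
    clear(); // drop any stale timer from an earlier attempt first
    id = setTimeout(() => {
      id = undefined; // one-shot: mark as fired before notifying
      onTimeout();
    }, ms);
  };

  const isArmed = (): boolean => id !== undefined;

  return { arm, clear, isArmed };
}

// Usage: the proxy attempt arms the timer, the fallback re-arms it.
const controller = new AbortController();
const responseTimeout = createRearmableTimeout(() => controller.abort());
responseTimeout.arm(30_000); // before the proxy attempt
responseTimeout.arm(30_000); // after falling back to a direct connection
responseTimeout.clear(); // once the response has been fully handled
```

Because `arm` always clears first, a fallback path cannot leave two timers pointing at the same controller, whichever proxy-failure branch led to it.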

@ding113 ding113 merged commit 2db3e7a into ding113:dev Feb 10, 2026
9 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 10, 2026

Labels

area:core area:provider bug Something isn't working size/M Medium PR (< 500 lines)

Projects

Status: Done


2 participants