fix(proxy): prevent stuck requesting on upstream non-ok body hang#751
fix(proxy): prevent stuck requesting on upstream non-ok body hang#751ding113 merged 6 commits intoding113:devfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthrough调整了 ProxyForwarder 的超时清理与回退路径控制流(包括直接回退与 HTTP/1.1 回退),修正非 OK 上游响应的超时清理顺序;对数据库时间参数序列化做兼容处理;新增单元测试覆盖上游返回非 OK 且响应体挂起的场景。 Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @tesgth032, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! 此拉取请求解决了代理请求稳定性和数据库清理方面的关键问题。它通过调整超时处理逻辑,解决了当上游服务返回错误状态(4xx/5xx)但未能终止响应体时,代理请求可能无限期卡住的问题。此外,它修复了探针日志清理过程中日期格式不兼容的问题,确保了日志删除的可靠性。还添加了一个新的测试用例来验证代理修复。 Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
这个拉取请求修复了一个关键问题,即代理在处理上游返回非 OK 状态且响应体挂起时可能卡住。ProxyForwarder 中的核心逻辑变更——将清除响应超时的操作延迟到响应体读取之后——是正确的,并有效地解决了问题。此外,provider-endpoints.ts 中为数据库查询标准化日期格式的更改,是一个很好的健壮性和兼容性改进。然而,在代理的 URL construction logic 中发现了一个显著的 SSRF vulnerability,如果客户端提供恶意路径,这可能导致 upstream provider API keys 泄露。
| async function startServer(): Promise<{ server: Server; baseUrl: string }> { | ||
| const server = createServer((req, res) => { | ||
| // 模拟上游异常:返回 403,但永远不结束 body(导致 response.text() 无限等待) | ||
| res.writeHead(403, { "content-type": "application/json" }); | ||
| res.write(JSON.stringify({ error: { message: "forbidden" } })); | ||
|
|
||
| // 当客户端中断时,主动销毁连接,避免测试进程残留挂起连接 | ||
| req.on("aborted", () => { | ||
| try { | ||
| res.destroy(); | ||
| } catch { | ||
| // ignore | ||
| } | ||
| }); |
There was a problem hiding this comment.
Flaky test cleanup
This test’s server tries to clean up the never-ending response only on req.on("aborted"), but the client-side abort here is driven by undici’s AbortController/timeout, which won’t reliably emit IncomingMessage’s aborted event in all cases. If that event doesn’t fire, the response/socket can remain open and make the unit test suite hang/flap. Consider adding a deterministic server-side teardown (e.g., destroy the socket on a timer or on res.close/req.close) so the test process can always exit cleanly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts
Line: 133:146
Comment:
**Flaky test cleanup**
This test’s server tries to clean up the never-ending response only on `req.on("aborted")`, but the client-side abort here is driven by undici’s `AbortController`/timeout, which won’t reliably emit `IncomingMessage`’s `aborted` event in all cases. If that event doesn’t fire, the response/socket can remain open and make the unit test suite hang/flap. Consider adding a deterministic server-side teardown (e.g., destroy the socket on a timer or on `res.close`/`req.close`) so the test process can always exit cleanly.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/repository/provider-endpoints.ts`:
- Around line 274-282: The comment above the beforeDateIso conversion contains
an emoji (⚠️) which violates the no-emoji rule; update the comment near the
input.beforeDate.toISOString() and the SQL block (referencing beforeDateIso,
db.execute, sql`...`, provider_endpoint_probe_logs and created_at) to remove the
emoji and rephrase the explanatory text in plain ASCII (e.g., "Note:" or
"Warning:") while keeping the existing explanation about runtime drivers
serializing Date and the justification for using ISO-8601 + ::timestamptz; do
not change the implementation (beforeDateIso or the SQL) — only edit the comment
text to eliminate emoji characters.
In `@tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts`:
- Around line 33-87: The factory createProvider returns a value typed as
Provider but omits three required fields from the Provider interface; add
anthropicMaxTokensPreference, anthropicThinkingBudgetPreference, and
geminiGoogleSearchPreference to the returned default object (set each to null)
so the object conforms to Provider and TypeScript compiles; update the
createProvider(...) return object to include these three keys with null
defaults.
🧹 Nitpick comments (1)
tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts (1)
178-182: 访问私有方法doForward的方式较脆弱,建议添加注释说明原因。通过双重类型断言
ProxyForwarder as unknown as { doForward: ... }访问私有静态方法,如果方法签名变更会静默失败。考虑添加一行注释说明为何需要直接调用私有方法(例如:"直接测试 doForward 以隔离单次转发行为,避免 send() 的重试/供应商切换逻辑干扰")。
There was a problem hiding this comment.
Code Review Summary
No significant issues identified in this PR. The core fix correctly addresses a real production hang scenario where upstream servers return 4xx/5xx status codes but never close the response body, causing response.text() to block indefinitely. The solution of deferring clearTimeout to a finally block after body reading is sound and well-tested.
PR Size: M
- Lines changed: 239 (230 additions, 9 deletions)
- Files changed: 4
Review Coverage
- Logic and correctness - Clean. The
try { throw await ... } finally { clearTimeout }pattern correctly keeps the response timeout active during body reading, then cleans up regardless of outcome. - Security (OWASP Top 10) - Clean. SQL change uses parameterized queries via Drizzle
sqltemplate literal; no injection risk. - Error handling - Clean.
ProxyError.fromUpstreamResponsealready handlesresponse.text()rejection internally (errors.ts:50-54), so timeout-induced abort errors are gracefully converted to a ProxyError with the correct status code. - Type safety - Clean. No
anyusage in new code; type assertions in test file are necessary for accessing private static method. - Documentation accuracy - Clean. Comments accurately describe the problem and the fix rationale.
- Test coverage - Adequate. New test (
proxy-forwarder-nonok-body-hang.test.ts) directly reproduces the hang scenario with a mock server that sends headers but never closes the body, and verifies the request terminates with a ProxyError(403) within the timeout window. A 2-second safety timeout prevents test suite hangs on regression. - Code clarity - Good. The ancillary fix in
provider-endpoints.ts(Date -> ISO-8601 string with explicit::timestamptzcast) is a defensive improvement for cross-runtime compatibility.
Automated review by Claude AI
| try { | ||
| throw await ProxyError.fromUpstreamResponse(response, { | ||
| id: provider.id, | ||
| name: provider.name, | ||
| }); | ||
| } finally { | ||
| if (responseTimeoutId) { | ||
| clearTimeout(responseTimeoutId); | ||
| } | ||
| } |
There was a problem hiding this comment.
Timeout timer not cleared
In the !response.ok branch, clearTimeout(responseTimeoutId) only runs if ProxyError.fromUpstreamResponse() completes. If that call hangs (the exact scenario this PR fixes), the finally never executes, so the timer is left running and may fire later during subsequent retries/requests, aborting via the shared responseController and producing confusing logs/behavior. This should clear the timer once the timeout fires (e.g., inside the setTimeout callback) so it can’t linger past the request lifecycle.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2453:2462
Comment:
**Timeout timer not cleared**
In the `!response.ok` branch, `clearTimeout(responseTimeoutId)` only runs if `ProxyError.fromUpstreamResponse()` completes. If that call hangs (the exact scenario this PR fixes), the `finally` never executes, so the timer is left running and may fire later during subsequent retries/requests, aborting via the shared `responseController` and producing confusing logs/behavior. This should clear the timer once the timeout fires (e.g., inside the `setTimeout` callback) so it can’t linger past the request lifecycle.
How can I resolve this? If you propose a fix, please make it concise.| const result = await db.execute(sql` | ||
| WITH ids_to_delete AS ( | ||
| SELECT id FROM provider_endpoint_probe_logs | ||
| WHERE created_at < ${input.beforeDate} | ||
| WHERE created_at < ${beforeDateIso}::timestamptz | ||
| ORDER BY created_at ASC |
There was a problem hiding this comment.
Raw SQL cast usage
WHERE created_at < ${beforeDateIso}::timestamptz relies on the template interpolation to produce a single SQL parameter followed by a cast. With some SQL builders this can get split such that the cast is treated as text, yielding invalid SQL or skipping the cast. Prefer applying the cast on the SQL side (e.g., sql${beforeDateIso}::timestamptz`` or using a typed parameter) to ensure the cast is part of the SQL fragment, not the bound value.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 279:283
Comment:
**Raw SQL cast usage**
`WHERE created_at < ${beforeDateIso}::timestamptz` relies on the template interpolation to produce a single SQL parameter followed by a cast. With some SQL builders this can get split such that the cast is treated as text, yielding invalid SQL or skipping the cast. Prefer applying the cast on the SQL side (e.g., `sql`${beforeDateIso}::timestamptz`` or using a typed parameter) to ensure the cast is part of the SQL fragment, not the bound value.
How can I resolve this? If you propose a fix, please make it concise.0f5c66e to
1a600db
Compare
| const result = await db.execute(sql` | ||
| WITH ids_to_delete AS ( | ||
| SELECT id FROM provider_endpoint_probe_logs | ||
| WHERE created_at < ${input.beforeDate} | ||
| WHERE created_at < ${beforeDateIso}::timestamptz | ||
| ORDER BY created_at ASC |
There was a problem hiding this comment.
Invalid SQL parameter cast
WHERE created_at < ${beforeDateIso}::timestamptz is not valid Drizzle SQL interpolation: the ::timestamptz ends up outside the parameter placeholder, which produces invalid SQL / skips the cast depending on the driver. You need to include the cast in the sql fragment (cast the parameter inside the template), otherwise this cleanup query can fail at runtime.
Also appears in this function only (no other occurrences found).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 378:382
Comment:
**Invalid SQL parameter cast**
`WHERE created_at < ${beforeDateIso}::timestamptz` is not valid Drizzle SQL interpolation: the `::timestamptz` ends up outside the parameter placeholder, which produces invalid SQL / skips the cast depending on the driver. You need to include the cast in the `sql` fragment (cast the parameter inside the template), otherwise this cleanup query can fail at runtime.
Also appears in this function only (no other occurrences found).
How can I resolve this? If you propose a fix, please make it concise.| try { | ||
| throw await ProxyError.fromUpstreamResponse(response, { | ||
| id: provider.id, | ||
| name: provider.name, | ||
| }); | ||
| } finally { | ||
| if (responseTimeoutId) { | ||
| clearTimeout(responseTimeoutId); | ||
| } | ||
| } |
There was a problem hiding this comment.
Leaked timeout after abort
In the !response.ok path you await ProxyError.fromUpstreamResponse(response, ...) inside a try/finally and only clear responseTimeoutId in finally. If the timeout fires while response.text() is pending, the abort will happen but response.text() may still not resolve promptly, so the finally might not run until much later (or ever if undici never settles), leaving responseTimeoutId live past the request lifecycle. That can cause the timer to fire later and abort via the same responseController, affecting subsequent retries/requests and producing misleading logs. Clearing the timer when it fires (inside the setTimeout callback) avoids this lingering timer.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2453:2462
Comment:
**Leaked timeout after abort**
In the `!response.ok` path you `await ProxyError.fromUpstreamResponse(response, ...)` inside a `try/finally` and only clear `responseTimeoutId` in `finally`. If the timeout fires while `response.text()` is pending, the abort will happen but `response.text()` may still not resolve promptly, so the `finally` might not run until much later (or ever if undici never settles), leaving `responseTimeoutId` live past the request lifecycle. That can cause the timer to fire later and abort via the same `responseController`, affecting subsequent retries/requests and producing misleading logs. Clearing the timer when it fires (inside the `setTimeout` callback) avoids this lingering timer.
How can I resolve this? If you propose a fix, please make it concise.|
|
||
| // 移�� content-encoding 和 content-length(避免下游再解压或使用错误长度) | ||
| // 移除 content-encoding 和 content-length(避免下游再解压或使用错误长度) | ||
| responseHeaders.delete("content-encoding"); | ||
| responseHeaders.delete("content-length"); | ||
| } else { |
There was a problem hiding this comment.
Garbled comment encoding
This comment appears to contain mojibake (// 移�� content-encoding ...). It’s likely an encoding/copy artifact introduced in this PR and should be fixed to readable text to avoid confusing future maintainers.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2832:2836
Comment:
**Garbled comment encoding**
This comment appears to contain mojibake (`// 移�� content-encoding ...`). It’s likely an encoding/copy artifact introduced in this PR and should be fixed to readable text to avoid confusing future maintainers.
How can I resolve this? If you propose a fix, please make it concise.|
已根据 CodeRabbit 的变更请求更新:
本地验证:npm run test / npm run typecheck / npm run build 均通过。 |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts`:
- Around line 182-186: The test extracts the private static method by casting
and calling it as a bare function, which loses class context; change both usages
where you grab doForward (the one using ProxyForwarder as unknown ... .doForward
around the session/provider/baseUrl calls and the second occurrence later) to
invoke the method with the class bound as this (e.g., call or bind the function
with ProxyForwarder as the thisArg) so that inside doForward any references to
this (static helpers) remain valid; locate the symbol ProxyForwarder and its
doForward reference and replace the direct invocation with a bound/call
invocation preserving the ProxyForwarder context for both occurrences.
🧹 Nitpick comments (3)
tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts (3)
167-260: 两个测试用例结构高度重复,可考虑提取共享逻辑。
test 1(Line 168-212)与test 2(Line 214-260)的Promise.race+ timeout 守卫 +ProxyError断言 + server cleanup 逻辑几乎完全一致,仅createProvider的 overrides 不同。可以抽取一个辅助函数(如runForwardAndExpect403)来减少重复,提高可维护性。示例重构思路
async function runForwardAndExpect403(providerOverrides: Partial<Provider>) { const { server, baseUrl } = await startServer(); const clientAbortController = new AbortController(); try { const provider = createProvider({ url: baseUrl, ...providerOverrides }); const session = createSession({ clientAbortSignal: clientAbortController.signal }); session.setProvider(provider); const doForward = ( ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown } ).doForward.bind(ProxyForwarder); const forwardPromise = doForward(session, provider, baseUrl) as Promise<Response>; const result = await Promise.race([ forwardPromise.then( () => ({ type: "resolved" as const }), (error) => ({ type: "rejected" as const, error }), ), new Promise<{ type: "timeout" }>((resolve) => setTimeout(() => resolve({ type: "timeout" as const }), 2_000), ), ]); if (result.type === "timeout") { clientAbortController.abort(new Error("test_timeout")); throw new Error("doForward timed out — possible non-ok body hang regression"); } expect(result.type).toBe("rejected"); expect(result.type === "rejected" ? result.error : null).toBeInstanceOf(ProxyError); const err = (result as { type: "rejected"; error: unknown }).error as ProxyError; expect(err.statusCode).toBe(403); } finally { await new Promise<void>((resolve) => server.close(() => resolve())); } }
92-134:createSession通过Object.create+Object.assign绕过构造函数创建 mock session——可行但脆弱。这种方式能工作,但当
ProxySession的构造函数逻辑或内部字段变更时,mock 不会同步更新,可能导致测试静默通过但实际行为不一致。建议在文件顶部添加简短注释说明此模式的局限性,方便后续维护者理解。
136-165:req.on("aborted")在 Node.js ≥ 16.12 / v17+ 中已被弃用,建议改用close事件。
IncomingMessage的aborted事件已标记为 deprecated(DEP0156),在较新的 Node.js 版本中不再保证触发。建议改为监听close事件并配合req.destroyed检查来清理连接。虽然finally中的server.close()能作为兜底,但在使用较新 Node 版本的 CI 环境上,不及时销毁连接可能导致资源泄漏。建议的修复
- req.on("aborted", () => { - try { - res.destroy(); - } catch { - // ignore - } - }); + req.on("close", () => { + if (req.destroyed) { + try { + res.destroy(); + } catch { + // ignore + } + } + });
| const doForward = ( | ||
| ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown } | ||
| ).doForward; | ||
|
|
||
| const forwardPromise = doForward(session, provider, baseUrl) as Promise<Response>; |
There was a problem hiding this comment.
通过 as unknown 绕过类型系统访问 private static doForward 会丢失 this 上下文。
const doForward = (ProxyForwarder as unknown as ...).doForward 提取了静态方法的引用,随后以裸函数方式调用 doForward(session, provider, baseUrl)。如果 doForward 内部通过 this 调用其他静态方法(如 this.someHelper()),则在 strict mode 下 this 为 undefined,会导致运行时异常。
更安全的写法是使用 .call 或 .bind 保留类上下文:
建议的修复
- const doForward = (
- ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown }
- ).doForward;
-
- const forwardPromise = doForward(session, provider, baseUrl) as Promise<Response>;
+ const doForward = (
+ ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown }
+ ).doForward.bind(ProxyForwarder);
+
+ const forwardPromise = doForward(session, provider, baseUrl) as Promise<Response>;两处调用(Line 182-186 和 Line 230-234)均需同步修改。
Also applies to: 230-234
🤖 Prompt for AI Agents
In `@tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts` around lines 182 -
186, The test extracts the private static method by casting and calling it as a
bare function, which loses class context; change both usages where you grab
doForward (the one using ProxyForwarder as unknown ... .doForward around the
session/provider/baseUrl calls and the second occurrence later) to invoke the
method with the class bound as this (e.g., call or bind the function with
ProxyForwarder as the thisArg) so that inside doForward any references to this
(static helpers) remain valid; locate the symbol ProxyForwarder and its
doForward reference and replace the direct invocation with a bound/call
invocation preserving the ProxyForwarder context for both occurrences.
| const result = await db.execute(sql` | ||
| WITH ids_to_delete AS ( | ||
| SELECT id FROM provider_endpoint_probe_logs | ||
| WHERE created_at < ${input.beforeDate} | ||
| WHERE created_at < ${beforeDateIso}::timestamptz | ||
| ORDER BY created_at ASC |
There was a problem hiding this comment.
Broken timestamptz cast
WHERE created_at < ${beforeDateIso}::timestamptz is not valid Drizzle interpolation: the ::timestamptz ends up outside the parameter placeholder, which can generate invalid SQL at runtime (or skip the cast depending on driver). Wrap the cast inside the sql fragment (e.g. WHERE created_at < (${sql${beforeDateIso}})::timestamptz or bind a Date/typed parameter in a way Drizzle will cast correctly) so the cast is part of the SQL fragment rather than concatenated after a bound value.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider-endpoints.ts
Line: 378:382
Comment:
**Broken timestamptz cast**
`WHERE created_at < ${beforeDateIso}::timestamptz` is not valid Drizzle interpolation: the `::timestamptz` ends up outside the parameter placeholder, which can generate invalid SQL at runtime (or skip the cast depending on driver). Wrap the cast inside the `sql` fragment (e.g. `WHERE created_at < (${sql`${beforeDateIso}`})::timestamptz` or bind a `Date`/typed parameter in a way Drizzle will cast correctly) so the cast is part of the SQL fragment rather than concatenated after a bound value.
How can I resolve this? If you propose a fix, please make it concise.| if (!response.ok) { | ||
| // HTTP 错误:清除响应超时定时器 | ||
| if (responseTimeoutId) { | ||
| clearTimeout(responseTimeoutId); | ||
| // ⚠️ HTTP 错误:不要在读取响应体之前清除响应超时定时器 | ||
| // 原因:某些上游会在返回 4xx/5xx 后“卡住不结束 body”, | ||
| // 若提前 clearTimeout,会导致 ProxyError.fromUpstreamResponse() 的 response.text() 无限等待, | ||
| // 从而让整条请求链路(含客户端)悬挂,前端表现为一直“请求中”。 | ||
| // | ||
| // 正确策略:保留 response timeout 继续监控 body 读取,并在 finally 里清理定时器。 | ||
| try { | ||
| throw await ProxyError.fromUpstreamResponse(response, { | ||
| id: provider.id, | ||
| name: provider.name, | ||
| }); | ||
| } finally { | ||
| if (responseTimeoutId) { | ||
| clearTimeout(responseTimeoutId); | ||
| } | ||
| } |
There was a problem hiding this comment.
Timeout cleanup never runs
In the !response.ok path you rely on the finally to clearTimeout(responseTimeoutId), but the whole point of this change is that ProxyError.fromUpstreamResponse() can hang on response.text(). If that promise never settles (even after responseController.abort()), the finally won’t execute and the timer remains live past the request lifecycle, potentially firing later and aborting via the same responseController. Consider clearing the timeout when it fires (inside the setTimeout callback) so it can’t linger if the awaited body read never resolves.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2467:2483
Comment:
**Timeout cleanup never runs**
In the `!response.ok` path you rely on the `finally` to `clearTimeout(responseTimeoutId)`, but the whole point of this change is that `ProxyError.fromUpstreamResponse()` can hang on `response.text()`. If that promise never settles (even after `responseController.abort()`), the `finally` won’t execute and the timer remains live past the request lifecycle, potentially firing later and aborting via the same `responseController`. Consider clearing the timeout when it fires (inside the `setTimeout` callback) so it can’t linger if the awaited body read never resolves.
How can I resolve this? If you propose a fix, please make it concise.| async function startServer(): Promise<{ server: Server; baseUrl: string }> { | ||
| const server = createServer((req, res) => { | ||
| // 模拟上游异常:返回 403,但永远不结束 body(导致 response.text() 无限等待) | ||
| res.writeHead(403, { "content-type": "application/json" }); | ||
| res.write(JSON.stringify({ error: { message: "forbidden" } })); | ||
|
|
||
| // 当客户端中断时,主动销毁连接,避免测试进程残留挂起连接 | ||
| req.on("aborted", () => { | ||
| try { | ||
| res.destroy(); | ||
| } catch { | ||
| // ignore | ||
| } | ||
| }); |
There was a problem hiding this comment.
Non-deterministic test teardown
The test server only destroys the hanging response on req.on("aborted"), but the abort here is driven by undici/AbortController and doesn’t reliably emit Node’s IncomingMessage aborted event in all cases. If it doesn’t fire, the socket can stay open and make the test suite hang/flap. Add a deterministic teardown (e.g. destroy on res.close/req.close, or force-destroy the socket on a short timer) so the server always releases resources even when aborted isn’t emitted.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/proxy/proxy-forwarder-nonok-body-hang.test.ts
Line: 136:149
Comment:
**Non-deterministic test teardown**
The test server only destroys the hanging response on `req.on("aborted")`, but the abort here is driven by undici/`AbortController` and doesn’t reliably emit Node’s `IncomingMessage` `aborted` event in all cases. If it doesn’t fire, the socket can stay open and make the test suite hang/flap. Add a deterministic teardown (e.g. destroy on `res.close`/`req.close`, or force-destroy the socket on a short timer) so the server always releases resources even when `aborted` isn’t emitted.
How can I resolve this? If you propose a fix, please make it concise.|
已逐条评估并处理 AI 审核意见(仅改动本 PR 新增/修改的代码):
关于 Greptile 的 “Timeout cleanup never runs” 建议:当前 timeout 是一次性定时器;若 本地验证: |
| const fallbackInit = { ...init }; | ||
| delete fallbackInit.dispatcher; | ||
| try { |
There was a problem hiding this comment.
Timeout not cleared on retry
When re-starting responseTimeoutId after direct fallback, the previous timer (created before the proxy attempt) may still be active. If responseTimeoutId wasn’t cleared in all proxy-failure paths, this introduces multiple timers sharing the same responseController, so a stale timer can abort a later attempt unexpectedly. Ensure you always clear any existing responseTimeoutId before assigning a new one (or ensure it’s definitively cleared on every path that reaches this block).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 2346:2348
Comment:
**Timeout not cleared on retry**
When re-starting `responseTimeoutId` after direct fallback, the previous timer (created before the proxy attempt) may still be active. If `responseTimeoutId` wasn’t cleared in all proxy-failure paths, this introduces multiple timers sharing the same `responseController`, so a stale timer can abort a later attempt unexpectedly. Ensure you always clear any existing `responseTimeoutId` before assigning a new one (or ensure it’s definitively cleared on every path that reaches this block).
How can I resolve this? If you propose a fix, please make it concise.
概要
修复一个会导致代理请求长期卡在 requesting 的问题:当上游返回 4xx/5xx,但响应 body 永远不结束时,服务端在读取错误体阶段会无限等待,从而让整条请求链路(含客户端)一直悬挂(重启进程后暂时恢复)。
同时修复探针日志清理任务在部分运行时/驱动组合下的
timestamptz解析失败问题,避免清理失败导致日志持续堆积。问题与根因
在
ProxyForwarder.doForward()中:response.text()来构造ProxyError;response.text()会无限等待;另外,在代理失败降级到直连(
proxyFallbackToDirect)或 HTTP/2 -> HTTP/1.1 回退时,catch分支会清除 response timeout。若随后直连/回退请求拿到非 2xx 且 body 仍不结束,会再次进入同样的“无 timeout 读取错误体”路径,从而复现 hang。探针日志清理方面,直接把
Date作为 SQL 参数传入时,部分运行时/驱动会把它序列化成类似"Mon Feb ... GMT+0800 (China Standard Time)"的非 ISO 字符串,Postgres 无法解析为timestamptz,导致清理任务报错并无法继续清理。解决方案
try/finally包裹ProxyError.fromUpstreamResponse(),在finally中统一清理超时定时器;response.text(),避免整条链路永久悬挂。timestamptz兼容性修复beforeDate统一转为toISOString();::timestamptzcast,避免 Postgres 解析歧义/失败。变更点
src/app/v1/_lib/proxy/forwarder.tsresponse.text()悬挂)src/repository/provider-endpoints.tsbeforeDate改用 ISO-8601 字符串 +::timestamptztests/unit/proxy/proxy-forwarder-nonok-body-hang.test.tsProxyError,不会挂死proxyFallbackToDirect降级场景测试
npm run testnpm run typechecknpm run buildGreptile Overview
Greptile Summary
该 PR 修复代理转发在上游返回非 2xx 且响应 body 永不结束时,服务端在读取错误体阶段可能永久等待的问题:通过保证 response timeout 覆盖
ProxyError.fromUpstreamResponse()的错误体读取,并在 fallback(proxy->direct / h2->h1)后恢复 timeout,避免请求链路长期卡在 requesting。另外,探针日志清理任务改为用 ISO-8601 字符串传参并显式
timestamptzcast,解决部分运行时/驱动把Date序列化为非 ISO 字符串导致 Postgres 解析失败、清理任务中断的问题。Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram participant Client participant Forwarder as ProxyForwarder.doForward() participant Upstream participant Timeout as responseTimeout Client->>Forwarder: request (with responseController) Forwarder->>Timeout: start response timeout Forwarder->>Upstream: fetch/forward request Upstream-->>Forwarder: response (status 4xx/5xx) alt response.ok Forwarder-->>Client: stream response else !response.ok Forwarder->>Forwarder: ProxyError.fromUpstreamResponse() Note over Forwarder,Upstream: error body may never end Timeout-->>Forwarder: timer fires Forwarder->>Forwarder: responseController.abort() Forwarder-->>Client: throw ProxyError (timeout-protected) Forwarder->>Timeout: clear timeout end opt proxyFallbackToDirect / h2->h1 fallback Forwarder->>Timeout: restart response timeout Forwarder->>Upstream: retry request end