fix(billing): use last-wins for Gemini SSE usageMetadata extraction #691
Gemini SSE streams return usageMetadata in every event, but only the final event contains complete token counts (candidatesTokenCount, thoughtsTokenCount). The existing first-wins strategy in applyUsageValue caused output tokens to be missed, since early events only have promptTokenCount. This fix introduces a last-wins strategy specifically for Gemini SSE usageMetadata while preserving first-wins for other formats (Claude, Codex), where usage is returned complete in a single event.

Fixes: Gemini streaming responses showing 0 output tokens in billing
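The difference between the two strategies can be illustrated with a minimal sketch. The `UsageMetadata` type and the `firstWins` / `lastWins` helpers are hypothetical names used only for illustration; they are not the actual `applyUsageValue` implementation. The event values mirror the ones quoted in this PR's description.

```typescript
// Hypothetical shape of a Gemini usageMetadata payload (illustration only).
interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
  totalTokenCount?: number;
}

// First-wins: keep the first usage object seen and ignore later ones.
function firstWins(events: UsageMetadata[]): UsageMetadata | null {
  let picked: UsageMetadata | null = null;
  for (const usage of events) {
    if (picked === null) picked = usage;
  }
  return picked;
}

// Last-wins: every usage object overwrites the previous one.
function lastWins(events: UsageMetadata[]): UsageMetadata | null {
  let picked: UsageMetadata | null = null;
  for (const usage of events) {
    picked = usage;
  }
  return picked;
}

// Early event carries only prompt tokens; the final event is complete.
const stream: UsageMetadata[] = [
  { promptTokenCount: 349, totalTokenCount: 349 },
  { promptTokenCount: 349, candidatesTokenCount: 2269, thoughtsTokenCount: 346, totalTokenCount: 2964 },
];

console.log(firstWins(stream)?.candidatesTokenCount); // undefined
console.log(lastWins(stream)?.candidatesTokenCount); // 2269
```

With first-wins the output token field is never observed, which matches the reported symptom of 0 output tokens.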
Summary of Changes
Hello @sususu98, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in billing for Gemini SSE streaming responses, where output tokens were incorrectly reported as zero. The core problem stemmed from Gemini providing complete token usage metadata only in the final event of a stream, while the system was using a 'first-wins' approach. The solution implements a 'last-wins' strategy specifically for Gemini SSE's
Highlights
📝 Walkthrough
Overview
This change modifies the response handler logic to implement last-wins semantics for Gemini usageMetadata in SSE events. Usage is accumulated in lastGeminiUsage and applied later rather than immediately, so the final usage metrics come from the most recent valid Gemini usage data.
Changes
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Code Review
This pull request correctly addresses a billing issue with Gemini SSE streaming by implementing a "last-wins" strategy for usageMetadata extraction. This ensures that the final, complete token counts are used, fixing the problem of zero output tokens being reported. The changes are logical and well-contained. I have one suggestion to improve code maintainability by refactoring a small piece of duplicated logic.
// Gemini usageMetadata - switched to a last-wins strategy
// Skip applyUsageValue (it is first-wins) and update directly
if (data.usageMetadata && typeof data.usageMetadata === "object") {
  const extracted = extractUsageMetrics(data.usageMetadata);
  if (extracted) {
    // Keep updating; the last valid value overwrites earlier ones
    lastGeminiUsage = extracted;
    lastGeminiUsageRecord = data.usageMetadata as Record<string, unknown>;
  }
}

// Handle response wrapping in SSE
if (!usageMetrics && data.response && typeof data.response === "object") {
  const responseObj = data.response as Record<string, unknown>;
  applyUsageValue(responseObj.usage, `sse.${event.event}.response.usage`);
  applyUsageValue(responseObj.usageMetadata, `sse.${event.event}.response.usageMetadata`);

  // response.usageMetadata also uses the last-wins strategy
  if (responseObj.usageMetadata && typeof responseObj.usageMetadata === "object") {
    const extracted = extractUsageMetrics(responseObj.usageMetadata);
    if (extracted) {
      lastGeminiUsage = extracted;
      lastGeminiUsageRecord = responseObj.usageMetadata as Record<string, unknown>;
    }
  }
}
The logic for extracting and updating lastGeminiUsage from usageMetadata is duplicated for data.usageMetadata and responseObj.usageMetadata. This can be refactored to improve code clarity and maintainability.
Specifically, the following pattern is repeated:
if (metadataSource && typeof metadataSource === 'object') {
  const extracted = extractUsageMetrics(metadataSource);
  if (extracted) {
    lastGeminiUsage = extracted;
    lastGeminiUsageRecord = metadataSource as Record<string, unknown>;
  }
}
Consider extracting this into a helper function within parseUsageFromResponseText to avoid repetition. For example:
const applyLastGeminiUsage = (value: unknown) => {
  if (value && typeof value === 'object') {
    const extracted = extractUsageMetrics(value);
    if (extracted) {
      lastGeminiUsage = extracted;
      lastGeminiUsageRecord = value as Record<string, unknown>;
    }
  }
};
This would make the code in the loop more concise and easier to maintain.
// Note: in a Gemini SSE stream, usageMetadata may be present in every event,
// but only the last event contains the complete token counts (candidatesTokenCount, thoughtsTokenCount, etc.),
// so we keep updating and use the last valid value
if (!messageStartUsage && !messageDeltaUsage) {
The condition `!messageStartUsage && !messageDeltaUsage` may prevent the Gemini last-wins logic from running if a provider returns both Claude-style events and Gemini-style `usageMetadata` in the same response.
If a malformed response contains both `message_start` events and `usageMetadata` fields, the Gemini SSE handling (lines 1647-1671) will be skipped, causing Gemini usage data to be ignored.
Path: src/app/v1/_lib/proxy/response-handler.ts
Line: 1643
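One possible way to address this concern, sketched under assumptions: record Gemini `usageMetadata` unconditionally while iterating, independently of the Claude-style fields, and leave the precedence decision to a single place after the stream ends. The `SseEvent` and `CollectedUsage` types and the `collectUsage` function are hypothetical and are not taken from `response-handler.ts`.

```typescript
type Usage = Record<string, unknown>;

// Hypothetical, simplified view of a parsed SSE event (not the real type).
interface SseEvent {
  event: string;
  data: { usageMetadata?: Usage; message?: { usage?: Usage } };
}

interface CollectedUsage {
  messageStartUsage: Usage | null; // Claude-style, first-wins
  lastGeminiUsage: Usage | null; // Gemini-style, last-wins
}

function collectUsage(events: SseEvent[]): CollectedUsage {
  const collected: CollectedUsage = { messageStartUsage: null, lastGeminiUsage: null };
  for (const e of events) {
    const startUsage = e.data.message?.usage;
    if (e.event === "message_start" && startUsage && !collected.messageStartUsage) {
      collected.messageStartUsage = startUsage;
    }
    // Not gated on the Claude-style fields, so a mixed stream still records it.
    if (e.data.usageMetadata) {
      collected.lastGeminiUsage = e.data.usageMetadata;
    }
  }
  return collected;
}

// A malformed stream that mixes both styles.
const mixed: SseEvent[] = [
  { event: "message_start", data: { message: { usage: { input_tokens: 10 } } } },
  { event: "message", data: { usageMetadata: { promptTokenCount: 10 } } },
  { event: "message", data: { usageMetadata: { promptTokenCount: 10, candidatesTokenCount: 5 } } },
];

const result = collectUsage(mixed);
console.log(result.lastGeminiUsage?.candidatesTokenCount); // 5
```

Which source takes precedence when both are present remains a policy decision for the caller; the sketch only ensures that neither is silently dropped.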
Code Review Summary
This PR introduces a targeted fix for a billing-critical bug where Gemini SSE streaming responses were incorrectly showing 0 output tokens. The implementation correctly switches from a first-wins to a last-wins strategy for Gemini's usageMetadata extraction, as Gemini streams return incomplete token counts in early events and only provide complete counts in the final event.
PR Size: XS
- Lines changed: 42 (38 additions + 4 deletions)
- Files changed: 1
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 1 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
Medium Priority Issues (Should Fix)
1. [TEST-MISSING-CRITICAL] Missing test coverage for Gemini SSE streaming usage extraction
Location: src/app/v1/_lib/proxy/response-handler.ts:1563-1704
Why this is a problem: This PR fixes a billing-critical bug where Gemini SSE streams were showing 0 output tokens. However, there are no unit tests covering the new "last-wins" strategy for Gemini SSE usageMetadata extraction. The existing tests in tests/unit/proxy/extract-usage-metrics.test.ts cover:
- Gemini format (lines 311-508) - but only for single JSON responses
- SSE streaming (lines 545-605) - but only for Claude format
Per CLAUDE.md: "All new features must have unit test coverage of at least 80%"
Without tests, there's a risk of regression if future changes inadvertently break this fix.
Suggested fix:
// Add to tests/unit/proxy/extract-usage-metrics.test.ts
describe("Gemini SSE streaming usage extraction", () => {
  it("should use last event's usageMetadata (last-wins strategy)", () => {
    // Simulate Gemini SSE stream where early events have incomplete usage
    const sseResponse = [
      'data: {"usageMetadata":{"promptTokenCount":349,"totalTokenCount":349}}',
      '',
      'data: {"usageMetadata":{"promptTokenCount":349,"candidatesTokenCount":500,"totalTokenCount":849}}',
      '',
      'data: {"usageMetadata":{"promptTokenCount":349,"candidatesTokenCount":2269,"thoughtsTokenCount":346,"totalTokenCount":2964}}',
      '',
    ].join("\n");
    const result = parseUsageFromResponseText(sseResponse, "gemini");
    expect(result.usageMetrics).not.toBeNull();
    expect(result.usageMetrics?.input_tokens).toBe(349);
    // output_tokens should be from last event: 2269 + 346 = 2615
    expect(result.usageMetrics?.output_tokens).toBe(2615);
  });
  it("should handle response.usageMetadata wrapping in SSE", () => {
    const sseResponse = [
      'data: {"response":{"usageMetadata":{"promptTokenCount":100,"totalTokenCount":100}}}',
      '',
      'data: {"response":{"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":200,"totalTokenCount":300}}}',
      '',
    ].join("\n");
    const result = parseUsageFromResponseText(sseResponse, "gemini");
    expect(result.usageMetrics).not.toBeNull();
    expect(result.usageMetrics?.input_tokens).toBe(100);
    expect(result.usageMetrics?.output_tokens).toBe(200);
  });
});
Review Coverage
- Logic and correctness - Clean
- Security (OWASP Top 10) - Clean
- Error handling - Clean
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - Missing Gemini SSE streaming tests
- Code clarity - Good
Additional Notes
The implementation is sound:
- Correctly preserves first-wins for Claude/Codex formats
- Properly implements last-wins for Gemini SSE by tracking `lastGeminiUsage` separately
- Includes appropriate debug logging
- Comments accurately explain the behavior
The only concern is the lack of test coverage for this critical billing path.
Automated review by Claude AI
Summary
Fix Gemini SSE streaming responses showing 0 output tokens in billing by implementing a last-wins strategy for `usageMetadata` extraction. Gemini SSE streams return `usageMetadata` in every event, but only the final event contains complete token counts (`candidatesTokenCount`, `thoughtsTokenCount`). The existing first-wins strategy caused output tokens to be missed since early events only contain `promptTokenCount`.
Related Issues:
Problem
Gemini SSE streaming responses include `usageMetadata` in every event chunk, but the token counts are incomplete until the final event:
First event (incomplete):
{
  "promptTokenCount": 349,
  "totalTokenCount": 349
  // candidatesTokenCount: missing
  // thoughtsTokenCount: missing
}
Final event (complete):
{
  "promptTokenCount": 349,
  "candidatesTokenCount": 2269,
  "thoughtsTokenCount": 346,
  "totalTokenCount": 2964
}
The existing first-wins extraction strategy (appropriate for Claude/Codex, where usage appears once) captured only the first incomplete event, resulting in:
Solution
Introduce a last-wins strategy specifically for Gemini SSE `usageMetadata` while preserving first-wins for other formats:
- `lastGeminiUsage` and `lastGeminiUsageRecord` instead of immediately applying values
- `usageMetadata` overwrites the previous value
- `applyUsageValue` both failed to find usage
Changes
Core Changes:
`src/app/v1/_lib/proxy/response-handler.ts` (+38/-4):
- `lastGeminiUsage` and `lastGeminiUsageRecord` tracking variables
- `data.usage` (Claude/Codex compatibility)
Example
Before fix (user report):
→ Result: input=349, output=0 ❌
After fix:
→ Stream ends → Use last event data
→ Result: input=349, output=2615 ✅
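The arithmetic behind the "after" result can be reproduced with a small, self-contained sketch. It assumes, as the suggested unit test in this thread does, that output tokens are `candidatesTokenCount + thoughtsTokenCount` taken from the last event. The `TokenUsage` type and `lastGeminiUsageFromSse` function are hypothetical stand-ins, not the project's `parseUsageFromResponseText` or `extractUsageMetrics`.

```typescript
// Hypothetical result shape; the real extractUsageMetrics may differ.
interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
}

// Walk the "data:" lines of an SSE body and keep the last usageMetadata seen.
function lastGeminiUsageFromSse(sseText: string): TokenUsage | null {
  let last: Record<string, unknown> | null = null;
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (!payload) continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(payload);
    } catch {
      continue; // skip chunks that are not valid JSON
    }
    if (parsed && typeof parsed === "object") {
      const meta = (parsed as Record<string, unknown>).usageMetadata;
      if (meta && typeof meta === "object") {
        last = meta as Record<string, unknown>; // last-wins overwrite
      }
    }
  }
  if (!last) return null;
  // Missing or non-numeric counts are treated as 0.
  const num = (v: unknown): number => (typeof v === "number" ? v : 0);
  return {
    input_tokens: num(last.promptTokenCount),
    output_tokens: num(last.candidatesTokenCount) + num(last.thoughtsTokenCount),
  };
}

const sse = [
  'data: {"usageMetadata":{"promptTokenCount":349,"totalTokenCount":349}}',
  "",
  'data: {"usageMetadata":{"promptTokenCount":349,"candidatesTokenCount":2269,"thoughtsTokenCount":346,"totalTokenCount":2964}}',
  "",
].join("\n");

console.log(lastGeminiUsageFromSse(sse)); // { input_tokens: 349, output_tokens: 2615 }
```

Here 2269 + 346 = 2615, matching the output count shown above.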
Testing
Manual Testing
- stream: true)
- `candidatesTokenCount + thoughtsTokenCount` from final SSE event
Expected Behavior
- message_start.usage)
- data.usage directly)
Checklist
Description enhanced by Claude AI