fix(billing): use last-wins for Gemini SSE usageMetadata extraction by sususu98 · Pull Request #691 · ding113/claude-code-hub

sususu98 · 2026-01-31T14:28:51Z

Summary

Fix Gemini SSE streaming responses showing 0 output tokens in billing by implementing a last-wins strategy for usageMetadata extraction. Gemini SSE streams return usageMetadata in every event, but only the final event contains complete token counts (candidatesTokenCount, thoughtsTokenCount). The existing first-wins strategy caused output tokens to be missed since early events only contain promptTokenCount.

Related Issues:

Related to fix(billing): fix IMAGE modality token billing for Gemini image generation models #664 - Part of ongoing Gemini billing accuracy improvements (IMAGE modality tokens)
Related to Fix/gemini thoughts token support #327 - Continues Gemini token extraction enhancements (thoughts token support)
Related to fix(billing): fix duplicate billing of Gemini cache tokens #338 - Follows similar pattern for Gemini-specific usage parsing (cache token deduplication)
Follow-up to fix: fix usage parsing for Gemini and OpenAI Chat Completions streaming responses #153 - Extends SSE usage parsing improvements

Problem

Gemini SSE streaming responses include usageMetadata in every event chunk, but the token counts are incomplete until the final event:

First event (incomplete):

{
  "promptTokenCount": 349,
  "totalTokenCount": 349
  // candidatesTokenCount: missing
  // thoughtsTokenCount: missing
}

Final event (complete):

{
  "promptTokenCount": 349,
  "candidatesTokenCount": 2269,
  "thoughtsTokenCount": 346,
  "totalTokenCount": 2964
}

The existing first-wins extraction strategy (appropriate for Claude/Codex where usage appears once) captured only the first incomplete event, resulting in:

Input: 349 tokens ✅ (correct)
Output: 0 tokens ❌ (should be 2269 + 346 = 2615)
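
The arithmetic above can be reproduced with a small sketch. The `GeminiUsageMetadata` type and the `toBillingTokens` helper are hypothetical names used only for illustration; they are not the project's `extractUsageMetrics`. The one assumption taken from this description is that output tokens equal `candidatesTokenCount + thoughtsTokenCount`, with missing fields counted as 0.

```typescript
// Hypothetical shape of a Gemini usageMetadata object (fields taken from the events above).
interface GeminiUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
  totalTokenCount?: number;
}

interface BillingTokens {
  input: number;
  output: number;
}

// Illustrative helper (an assumption, not project code): missing counts are treated as 0.
function toBillingTokens(meta: GeminiUsageMetadata): BillingTokens {
  return {
    input: meta.promptTokenCount ?? 0,
    output: (meta.candidatesTokenCount ?? 0) + (meta.thoughtsTokenCount ?? 0),
  };
}

const firstEvent: GeminiUsageMetadata = { promptTokenCount: 349, totalTokenCount: 349 };
const finalEvent: GeminiUsageMetadata = {
  promptTokenCount: 349,
  candidatesTokenCount: 2269,
  thoughtsTokenCount: 346,
  totalTokenCount: 2964,
};

const fromFirst = toBillingTokens(firstEvent); // what a first-wins reader would bill
const fromFinal = toBillingTokens(finalEvent); // what a last-wins reader would bill
console.log(fromFirst, fromFinal);
```

Reading only the first event yields an output of 0, while the final event yields 2269 + 346 = 2615.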

Solution

Introduce a last-wins strategy specifically for Gemini SSE usageMetadata while preserving first-wins for other formats:

Track Gemini usage separately: Store lastGeminiUsage and lastGeminiUsageRecord instead of immediately applying values
Continuous updates: Each SSE event with usageMetadata overwrites the previous value
Final application: After SSE stream ends, use the last recorded value (which contains complete counts)
Backward compatibility: Only apply Gemini last-wins when Claude SSE usage and standard applyUsageValue both failed to find usage
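
The four steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in response-handler.ts: `SseData`, `UsageMetrics`, and `resolveUsage` are hypothetical names, and the real handler also deals with Claude `message_start`/`message_delta` events, which are omitted here.

```typescript
// Hypothetical, simplified types; the real handler is more involved.
type UsageMetrics = { input_tokens: number; output_tokens: number };
type SseData = {
  usage?: UsageMetrics;
  usageMetadata?: {
    promptTokenCount?: number;
    candidatesTokenCount?: number;
    thoughtsTokenCount?: number;
  };
};

function resolveUsage(events: SseData[]): UsageMetrics | null {
  let usageMetrics: UsageMetrics | null = null; // first-wins slot (data.usage)
  let lastGeminiUsage: UsageMetrics | null = null; // last-wins slot (usageMetadata)

  for (const data of events) {
    // First-wins: only the first data.usage seen is kept.
    if (!usageMetrics && data.usage) {
      usageMetrics = data.usage;
    }
    // Last-wins: every usageMetadata overwrites the previously stored one.
    if (data.usageMetadata) {
      const m = data.usageMetadata;
      lastGeminiUsage = {
        input_tokens: m.promptTokenCount ?? 0,
        output_tokens: (m.candidatesTokenCount ?? 0) + (m.thoughtsTokenCount ?? 0),
      };
    }
  }
  // After the stream ends, Gemini usage applies only if nothing else was found.
  return usageMetrics ?? lastGeminiUsage;
}

const geminiResult = resolveUsage([
  { usageMetadata: { promptTokenCount: 349 } },
  { usageMetadata: { promptTokenCount: 349, candidatesTokenCount: 200, thoughtsTokenCount: 50 } },
  { usageMetadata: { promptTokenCount: 349, candidatesTokenCount: 2269, thoughtsTokenCount: 346 } },
]);
const firstWinsResult = resolveUsage([
  { usage: { input_tokens: 10, output_tokens: 5 } },
  { usage: { input_tokens: 99, output_tokens: 99 } },
]);
console.log(geminiResult, firstWinsResult);
```

Keeping the two slots separate is what lets the first-wins path stay untouched for providers that report usage once.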

Changes

Core Changes:

src/app/v1/_lib/proxy/response-handler.ts (+38/-4):
- Added lastGeminiUsage and lastGeminiUsageRecord tracking variables
- Modified SSE event loop to update Gemini usage on each event instead of first-only
- Applied last Gemini usage after stream completion if no other usage found
- Preserved first-wins behavior for data.usage (Claude/Codex compatibility)

Example

Before fix (user report):

Event 1:  promptTokenCount: 349, totalTokenCount: 349 (used by first-wins)
Event 2:  promptTokenCount: 349, candidatesTokenCount: 200, thoughtsTokenCount: 50
...
Event N:  promptTokenCount: 349, candidatesTokenCount: 2269, thoughtsTokenCount: 346

→ Result: input=349, output=0 ❌

After fix:

Event 1:  promptTokenCount: 349 (stored, not applied)
Event 2:  promptTokenCount: 349, candidatesTokenCount: 200, thoughtsTokenCount: 50 (stored, overwrites)
...
Event N:  promptTokenCount: 349, candidatesTokenCount: 2269, thoughtsTokenCount: 346 (stored, overwrites)

→ Stream ends → Use last event data
→ Result: input=349, output=2615 ✅

Testing

Manual Testing

Configure a Gemini provider (gemini-2.5-flash, gemini-2.5-pro)
Send a streaming request (stream: true)
Check usage logs/billing records
Verify output tokens correctly reflect candidatesTokenCount + thoughtsTokenCount from final SSE event

Expected Behavior

✅ Gemini SSE requests show correct output token counts
✅ Claude SSE requests unchanged (use first-wins for message_start.usage)
✅ Non-SSE requests unchanged (use data.usage directly)
✅ Backward compatible with existing usage extraction logic

Checklist

Code follows project conventions
Self-review completed
Logic validated against real Gemini SSE response structure
Preserves existing behavior for non-Gemini providers

Description enhanced by Claude AI

Gemini SSE streams return usageMetadata in every event, but only the final event contains complete token counts (candidatesTokenCount, thoughtsTokenCount). The existing first-wins strategy in applyUsageValue caused output tokens to be missed since early events only have promptTokenCount. This fix introduces last-wins strategy specifically for Gemini SSE usageMetadata while preserving first-wins for other formats (Claude, Codex) where usage is returned complete in a single event. Fixes: Gemini streaming responses showing 0 output tokens in billing

gemini-code-assist · 2026-01-31T14:29:04Z

Summary of Changes

Hello @sususu98, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in billing for Gemini SSE streaming responses, where output tokens were incorrectly reported as zero. The core problem stemmed from Gemini providing complete token usage metadata only in the final event of a stream, while the system was using a 'first-wins' approach. The solution implements a 'last-wins' strategy specifically for Gemini SSE's usageMetadata to ensure that the full and accurate token counts from the last event are captured, thereby correcting billing discrepancies. This change is isolated to Gemini SSE processing, preserving the existing logic for other providers.

Highlights

Billing Accuracy Fix: Resolves an issue where Gemini Server-Sent Events (SSE) streaming responses incorrectly reported 0 output tokens for billing purposes.
Last-Wins Strategy for Gemini SSE: Introduces a 'last-wins' strategy specifically for extracting usageMetadata from Gemini SSE streams, as complete token counts are only available in the final event.
Preservation of Existing Logic: Maintains the 'first-wins' strategy for other providers like Claude and Codex, where usage information is complete in a single event.
New State Variables: Adds lastGeminiUsage and lastGeminiUsageRecord to store the most recent usageMetadata encountered during Gemini SSE processing.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-01-31T14:29:10Z

📝 Walkthrough

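Overview

This change modifies the response handler logic to implement last-wins semantics for Gemini usageMetadata in SSE events. Instead of being applied immediately, the usage is stored in lastGeminiUsage and applied later, ensuring the final usage metrics come from the most recent valid Gemini usage data.

Changes

Group/File(s)	Summary
Gemini SSE event handling `src/app/v1/_lib/proxy/response-handler.ts`	Introduces the lastGeminiUsage and lastGeminiUsageRecord variables to implement last-wins semantics. Adjusts the message_start and message_delta event parsing flow to defer applying Gemini usageMetadata. Updates the final usage resolution path to prefer the latest Gemini metrics. Applies the same change in both the streaming and non-streaming handling paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes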

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
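Title check	✅ Passed	The title accurately summarizes the main change: adopting a last-wins strategy for Gemini SSE usageMetadata extraction, fully consistent with the core modification in the changeset.
Description check	✅ Passed	The PR description explains the problem, the solution, and concrete examples in detail, relates to the code changes, and describes the last-wins strategy for usageMetadata in Gemini SSE streaming responses.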

gemini-code-assist

Code Review

This pull request correctly addresses a billing issue with Gemini SSE streaming by implementing a "last-wins" strategy for usageMetadata extraction. This ensures that the final, complete token counts are used, fixing the problem of zero output tokens being reported. The changes are logical and well-contained. I have one suggestion to improve code maintainability by refactoring a small piece of duplicated logic.

gemini-code-assist · 2026-01-31T14:30:15Z

src/app/v1/_lib/proxy/response-handler.ts

+        // Gemini usageMetadata: switched to a last-wins strategy
+        // Skip applyUsageValue (which is first-wins) and update directly
+        if (data.usageMetadata && typeof data.usageMetadata === "object") {
+          const extracted = extractUsageMetrics(data.usageMetadata);
+          if (extracted) {
+            // Keep updating; the last valid value overwrites earlier ones
+            lastGeminiUsage = extracted;
+            lastGeminiUsageRecord = data.usageMetadata as Record<string, unknown>;
+          }
+        }

        // Handle response wrapping in SSE
        if (!usageMetrics && data.response && typeof data.response === "object") {
          const responseObj = data.response as Record<string, unknown>;
          applyUsageValue(responseObj.usage, `sse.${event.event}.response.usage`);
-          applyUsageValue(responseObj.usageMetadata, `sse.${event.event}.response.usageMetadata`);
+
+          // response.usageMetadata also uses the last-wins strategy
+          if (responseObj.usageMetadata && typeof responseObj.usageMetadata === "object") {
+            const extracted = extractUsageMetrics(responseObj.usageMetadata);
+            if (extracted) {
+              lastGeminiUsage = extracted;
+              lastGeminiUsageRecord = responseObj.usageMetadata as Record<string, unknown>;
+            }
+          }
        }


The logic for extracting and updating lastGeminiUsage from usageMetadata is duplicated for data.usageMetadata and responseObj.usageMetadata. This can be refactored to improve code clarity and maintainability.

Specifically, the following pattern is repeated:

if (metadataSource && typeof metadataSource === 'object') {
  const extracted = extractUsageMetrics(metadataSource);
  if (extracted) {
    lastGeminiUsage = extracted;
    lastGeminiUsageRecord = metadataSource as Record<string, unknown>;
  }
}

Consider extracting this into a helper function within parseUsageFromResponseText to avoid repetition. For example:

const applyLastGeminiUsage = (value: unknown) => {
  if (value && typeof value === 'object') {
    const extracted = extractUsageMetrics(value);
    if (extracted) {
      lastGeminiUsage = extracted;
      lastGeminiUsageRecord = value as Record<string, unknown>;
    }
  }
};

This would make the code in the loop more concise and easier to maintain.
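
As a rough illustration of how such a helper might serve both call sites, here is a self-contained sketch. `extractUsageMetricsStub` is a stand-in written for this example (the behavior of the project's real `extractUsageMetrics` is not shown in this thread), and `collectLastGeminiUsage` is a hypothetical wrapper; the `!usageMetrics` guard around the `response` branch in the diff is left out for brevity.

```typescript
type UsageMetrics = { input_tokens: number; output_tokens: number };
type UsageRecord = Record<string, unknown>;

// Stand-in for the project's extractUsageMetrics (assumed behavior, for illustration only).
function extractUsageMetricsStub(value: UsageRecord): UsageMetrics | null {
  const num = (key: string): number => {
    const v = value[key];
    return typeof v === "number" ? v : 0;
  };
  if (typeof value["promptTokenCount"] !== "number") return null;
  return {
    input_tokens: num("promptTokenCount"),
    output_tokens: num("candidatesTokenCount") + num("thoughtsTokenCount"),
  };
}

// Hypothetical wrapper showing one helper used for both usageMetadata locations.
function collectLastGeminiUsage(
  events: UsageRecord[],
): { lastGeminiUsage: UsageMetrics | null; lastGeminiUsageRecord: UsageRecord | null } {
  let lastGeminiUsage: UsageMetrics | null = null;
  let lastGeminiUsageRecord: UsageRecord | null = null;

  const applyLastGeminiUsage = (value: unknown): void => {
    if (value && typeof value === "object") {
      const record = value as UsageRecord;
      const extracted = extractUsageMetricsStub(record);
      if (extracted) {
        lastGeminiUsage = extracted; // last valid value overwrites earlier ones
        lastGeminiUsageRecord = record;
      }
    }
  };

  for (const data of events) {
    applyLastGeminiUsage(data["usageMetadata"]); // top-level usageMetadata
    const response = data["response"];
    if (response && typeof response === "object") {
      applyLastGeminiUsage((response as UsageRecord)["usageMetadata"]); // wrapped form
    }
  }
  return { lastGeminiUsage, lastGeminiUsageRecord };
}

const collected = collectLastGeminiUsage([
  { usageMetadata: { promptTokenCount: 100 } },
  { response: { usageMetadata: { promptTokenCount: 100, candidatesTokenCount: 200 } } },
]);
console.log(collected.lastGeminiUsage);
```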

greptile-apps

1 file reviewed, 1 comment

greptile-apps · 2026-01-31T14:32:08Z

src/app/v1/_lib/proxy/response-handler.ts

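+      // Note: in a Gemini SSE stream, usageMetadata may be present in every event,
+      // but only the last event contains the complete token counts (candidatesTokenCount, thoughtsTokenCount, etc.),
+      // so we keep updating and use the last valid value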
      if (!messageStartUsage && !messageDeltaUsage) {


The condition !messageStartUsage && !messageDeltaUsage may prevent the Gemini last-wins logic from running if a provider returns both Claude-style events and Gemini-style usageMetadata in the same response.

If a malformed response contains both message_start events and usageMetadata fields, the Gemini SSE handling (lines 1647-1671) will be skipped, causing Gemini usage data to be ignored.
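
The concern can be illustrated with a deliberately reduced model of the guard. Everything here is hypothetical: `resolveWithGuard` is not a function in the project, and the way Claude-style usage is combined (`messageDeltaUsage ?? messageStartUsage`) is a simplification chosen for the example. Only the guard condition itself comes from the diff.

```typescript
type Usage = { input_tokens: number; output_tokens: number };

// Hypothetical model of the final resolution step, reduced to the guard under discussion.
function resolveWithGuard(
  messageStartUsage: Usage | null,
  messageDeltaUsage: Usage | null,
  lastGeminiUsage: Usage | null,
): Usage | null {
  if (!messageStartUsage && !messageDeltaUsage) {
    // Gemini-style handling only runs when no Claude-style usage was seen.
    return lastGeminiUsage;
  }
  // Claude-style usage present: the stored Gemini usage is never consulted.
  return messageDeltaUsage ?? messageStartUsage;
}

const claudeUsage: Usage = { input_tokens: 10, output_tokens: 0 };
const geminiUsage: Usage = { input_tokens: 349, output_tokens: 2615 };

const geminiOnly = resolveWithGuard(null, null, geminiUsage);
const mixed = resolveWithGuard(claudeUsage, null, geminiUsage);
console.log(geminiOnly, mixed);
```

In this model a mixed stream resolves to the Claude-style usage and the Gemini counts are dropped, which is the scenario the comment describes.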

github-actions

Code Review Summary

This PR introduces a targeted fix for a billing-critical bug where Gemini SSE streaming responses were incorrectly showing 0 output tokens. The implementation correctly switches from a first-wins to a last-wins strategy for Gemini's usageMetadata extraction, as Gemini streams return incomplete token counts in early events and only provide complete counts in the final event.

PR Size: XS

Lines changed: 42 (38 additions + 4 deletions)
Files changed: 1

Issues Found

Category	Critical	High	Medium	Low
Logic/Bugs	0	0	0	0
Security	0	0	0	0
Error Handling	0	0	0	0
Types	0	0	0	0
Comments/Docs	0	0	0	0
Tests	0	0	1	0
Simplification	0	0	0	0

Medium Priority Issues (Should Fix)

1. [TEST-MISSING-CRITICAL] Missing test coverage for Gemini SSE streaming usage extraction

Location: src/app/v1/_lib/proxy/response-handler.ts:1563-1704

Why this is a problem: This PR fixes a billing-critical bug where Gemini SSE streams were showing 0 output tokens. However, there are no unit tests covering the new "last-wins" strategy for Gemini SSE usageMetadata extraction. The existing tests in tests/unit/proxy/extract-usage-metrics.test.ts cover:

Gemini format (lines 311-508) - but only for single JSON responses
SSE streaming (lines 545-605) - but only for Claude format

Per CLAUDE.md: "All new features must have unit test coverage of at least 80%"

Without tests, there's a risk of regression if future changes inadvertently break this fix.

Suggested fix:

// Add to tests/unit/proxy/extract-usage-metrics.test.ts
describe("Gemini SSE streaming usage extraction", () => {
  it("should use last event's usageMetadata (last-wins strategy)", () => {
    // Simulate Gemini SSE stream where early events have incomplete usage
    const sseResponse = [
      'data: {"usageMetadata":{"promptTokenCount":349,"totalTokenCount":349}}',
      '',
      'data: {"usageMetadata":{"promptTokenCount":349,"candidatesTokenCount":500,"totalTokenCount":849}}',
      '',
      'data: {"usageMetadata":{"promptTokenCount":349,"candidatesTokenCount":2269,"thoughtsTokenCount":346,"totalTokenCount":2964}}',
      '',
    ].join("\n");

    const result = parseUsageFromResponseText(sseResponse, "gemini");

    expect(result.usageMetrics).not.toBeNull();
    expect(result.usageMetrics?.input_tokens).toBe(349);
    // output_tokens should be from last event: 2269 + 346 = 2615
    expect(result.usageMetrics?.output_tokens).toBe(2615);
  });

  it("should handle response.usageMetadata wrapping in SSE", () => {
    const sseResponse = [
      'data: {"response":{"usageMetadata":{"promptTokenCount":100,"totalTokenCount":100}}}',
      '',
      'data: {"response":{"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":200,"totalTokenCount":300}}}',
      '',
    ].join("\n");

    const result = parseUsageFromResponseText(sseResponse, "gemini");

    expect(result.usageMetrics).not.toBeNull();
    expect(result.usageMetrics?.input_tokens).toBe(100);
    expect(result.usageMetrics?.output_tokens).toBe(200);
  });
});

Review Coverage

Logic and correctness - Clean
Security (OWASP Top 10) - Clean
Error handling - Clean
Type safety - Clean
Documentation accuracy - Clean
Test coverage - Missing Gemini SSE streaming tests
Code clarity - Good

Additional Notes

The implementation is sound:

Correctly preserves first-wins for Claude/Codex formats
Properly implements last-wins for Gemini SSE by tracking lastGeminiUsage separately
Includes appropriate debug logging
Comments accurately explain the behavior

The only concern is the lack of test coverage for this critical billing path.

Automated review by Claude AI

github-project-automation bot added this to Claude Code Hub Roadmap Jan 31, 2026

github-project-automation bot moved this to Backlog in Claude Code Hub Roadmap Jan 31, 2026

gemini-code-assist bot reviewed Jan 31, 2026

github-actions bot added bug Something isn't working area:Google Gemini area:provider labels Jan 31, 2026

greptile-apps bot reviewed Jan 31, 2026

coderabbitai bot approved these changes Jan 31, 2026

github-actions bot added the size/XS Extra Small PR (< 50 lines) label Jan 31, 2026

github-actions bot reviewed Jan 31, 2026

ding113 merged commit fee5e97 into ding113:dev Feb 2, 2026
13 of 15 checks passed

github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 2, 2026

github-actions bot mentioned this pull request Feb 3, 2026

release v0.5.3 #712

Merged

This was referenced Feb 10, 2026

fix(proxy): 修复请求卡死（AgentPool 驱逐阻塞） #759

Merged

fix(proxy): 加固 AgentPool 清理与透传 stats #762

Merged

ding113 merged 1 commit into ding113:dev from sususu98:fix/gemini-sse-usage-token-count
