
fix(billing): use last-wins for Gemini SSE usageMetadata extraction #691

Merged
ding113 merged 1 commit into ding113:dev from sususu98:fix/gemini-sse-usage-token-count on Feb 2, 2026

Conversation


@sususu98 sususu98 commented Jan 31, 2026

Summary

Fix Gemini SSE streaming responses showing 0 output tokens in billing by implementing a last-wins strategy for usageMetadata extraction. Gemini SSE streams return usageMetadata in every event, but only the final event contains complete token counts (candidatesTokenCount, thoughtsTokenCount). The existing first-wins strategy caused output tokens to be missed since early events only contain promptTokenCount.

Related Issues:

Problem

Gemini SSE streaming responses include usageMetadata in every event chunk, but the token counts are incomplete until the final event:

First event (incomplete):

{
  "promptTokenCount": 349,
  "totalTokenCount": 349
  // candidatesTokenCount: missing
  // thoughtsTokenCount: missing
}

Final event (complete):

{
  "promptTokenCount": 349,
  "candidatesTokenCount": 2269,
  "thoughtsTokenCount": 346,
  "totalTokenCount": 2964
}

The existing first-wins extraction strategy (appropriate for Claude/Codex where usage appears once) captured only the first incomplete event, resulting in:

  • Input: 349 tokens ✅ (correct)
  • Output: 0 tokens ❌ (should be 2269 + 346 = 2615)
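
The arithmetic behind those two numbers can be sketched as follows. This is a minimal illustration, assuming output tokens are billed as candidatesTokenCount + thoughtsTokenCount as stated above; the `TokenCounts` type and `toTokenCounts` function are hypothetical names, not the project's own.

```typescript
// Field names follow the usageMetadata payloads quoted above.
interface GeminiUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
  totalTokenCount?: number;
}

// Hypothetical shape for illustration; the project's real metrics type may differ.
interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
}

// Missing fields count as 0, which is why an early event yields output = 0.
function toTokenCounts(meta: GeminiUsageMetadata): TokenCounts {
  return {
    inputTokens: meta.promptTokenCount ?? 0,
    outputTokens: (meta.candidatesTokenCount ?? 0) + (meta.thoughtsTokenCount ?? 0),
  };
}

const firstEvent: GeminiUsageMetadata = { promptTokenCount: 349, totalTokenCount: 349 };
const finalEvent: GeminiUsageMetadata = {
  promptTokenCount: 349,
  candidatesTokenCount: 2269,
  thoughtsTokenCount: 346,
  totalTokenCount: 2964,
};

console.log(toTokenCounts(firstEvent)); // { inputTokens: 349, outputTokens: 0 }
console.log(toTokenCounts(finalEvent)); // { inputTokens: 349, outputTokens: 2615 }
```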

Solution

Introduce a last-wins strategy specifically for Gemini SSE usageMetadata while preserving first-wins for other formats:

  1. Track Gemini usage separately: Store lastGeminiUsage and lastGeminiUsageRecord instead of immediately applying values
  2. Continuous updates: Each SSE event with usageMetadata overwrites the previous value
  3. Final application: After SSE stream ends, use the last recorded value (which contains complete counts)
  4. Backward compatibility: Only apply Gemini last-wins when Claude SSE usage and standard applyUsageValue both failed to find usage
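
The four steps above can be sketched roughly as below. This is a simplified model, not the code in response-handler.ts: `extractGemini`, `extractStandard`, and `parseSseUsage` are hypothetical stand-ins, and the real handler deals with more event shapes.

```typescript
type Json = Record<string, unknown>;

interface UsageMetrics {
  input_tokens: number;
  output_tokens: number;
}

const num = (v: unknown): number => (typeof v === "number" ? v : 0);

// Stand-in for Gemini extraction (assumed behaviour, Gemini fields only).
function extractGemini(meta: unknown): UsageMetrics | null {
  if (typeof meta !== "object" || meta === null) return null;
  const m = meta as Json;
  if (typeof m["promptTokenCount"] !== "number") return null;
  return {
    input_tokens: num(m["promptTokenCount"]),
    output_tokens: num(m["candidatesTokenCount"]) + num(m["thoughtsTokenCount"]),
  };
}

// Stand-in for the first-wins path (a Claude/Codex style `usage` object).
function extractStandard(usage: unknown): UsageMetrics | null {
  if (typeof usage !== "object" || usage === null) return null;
  const u = usage as Json;
  if (typeof u["input_tokens"] !== "number") return null;
  return { input_tokens: num(u["input_tokens"]), output_tokens: num(u["output_tokens"]) };
}

function parseSseUsage(sseText: string): UsageMetrics | null {
  let usageMetrics: UsageMetrics | null = null; // first-wins result
  let lastGeminiUsage: UsageMetrics | null = null; // last-wins candidate (step 1)

  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data:")) continue;
    let data: unknown;
    try {
      data = JSON.parse(line.slice("data:".length));
    } catch {
      continue; // skip payloads that are not JSON
    }
    if (typeof data !== "object" || data === null) continue;
    const event = data as Json;

    // Existing behaviour: the first `usage` object found is kept.
    if (!usageMetrics) usageMetrics = extractStandard(event["usage"]);

    // Step 2: every valid usageMetadata overwrites the previous one.
    const gemini = extractGemini(event["usageMetadata"]);
    if (gemini) lastGeminiUsage = gemini;
  }

  // Steps 3 and 4: after the stream ends, fall back to the last Gemini value
  // only when the first-wins path found nothing.
  return usageMetrics ?? lastGeminiUsage;
}
```

Keeping the Gemini value in a separate variable until the loop finishes is what lets the first-wins path stay untouched for other providers.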

Changes

Core Changes:

  • src/app/v1/_lib/proxy/response-handler.ts (+38/-4):
    • Added lastGeminiUsage and lastGeminiUsageRecord tracking variables
    • Modified SSE event loop to update Gemini usage on each event instead of first-only
    • Applied last Gemini usage after stream completion if no other usage found
    • Preserved first-wins behavior for data.usage (Claude/Codex compatibility)

Example

Before fix (user report):

Event 1:  promptTokenCount: 349, totalTokenCount: 349 (used by first-wins)
Event 2:  promptTokenCount: 349, candidatesTokenCount: 200, thoughtsTokenCount: 50
...
Event N:  promptTokenCount: 349, candidatesTokenCount: 2269, thoughtsTokenCount: 346

→ Result: input=349, output=0 ❌

After fix:

Event 1:  promptTokenCount: 349 (stored, not applied)
Event 2:  promptTokenCount: 349, candidatesTokenCount: 200, thoughtsTokenCount: 50 (stored, overwrites)
...
Event N:  promptTokenCount: 349, candidatesTokenCount: 2269, thoughtsTokenCount: 346 (stored, overwrites)

→ Stream ends → Use last event data
→ Result: input=349, output=2615 ✅

Testing

Manual Testing

  1. Configure a Gemini provider (gemini-2.5-flash, gemini-2.5-pro)
  2. Send a streaming request (stream: true)
  3. Check usage logs/billing records
  4. Verify output tokens correctly reflect candidatesTokenCount + thoughtsTokenCount from final SSE event

Expected Behavior

  • ✅ Gemini SSE requests show correct output token counts
  • ✅ Claude SSE requests unchanged (use first-wins for message_start.usage)
  • ✅ Non-SSE requests unchanged (use data.usage directly)
  • ✅ Backward compatible with existing usage extraction logic

Checklist

  • Code follows project conventions
  • Self-review completed
  • Logic validated against real Gemini SSE response structure
  • Preserves existing behavior for non-Gemini providers

Description enhanced by Claude AI

Gemini SSE streams return usageMetadata in every event, but only the
final event contains complete token counts (candidatesTokenCount,
thoughtsTokenCount). The existing first-wins strategy in applyUsageValue
caused output tokens to be missed since early events only have
promptTokenCount.

This fix introduces last-wins strategy specifically for Gemini SSE
usageMetadata while preserving first-wins for other formats (Claude,
Codex) where usage is returned complete in a single event.

Fixes: Gemini streaming responses showing 0 output tokens in billing
@gemini-code-assist

Summary of Changes

Hello @sususu98, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in billing for Gemini SSE streaming responses, where output tokens were incorrectly reported as zero. The core problem stemmed from Gemini providing complete token usage metadata only in the final event of a stream, while the system was using a 'first-wins' approach. The solution implements a 'last-wins' strategy specifically for Gemini SSE's usageMetadata to ensure that the full and accurate token counts from the last event are captured, thereby correcting billing discrepancies. This change is isolated to Gemini SSE processing, preserving the existing logic for other providers.

Highlights

  • Billing Accuracy Fix: Resolves an issue where Gemini Server-Sent Events (SSE) streaming responses incorrectly reported 0 output tokens for billing purposes.
  • Last-Wins Strategy for Gemini SSE: Introduces a 'last-wins' strategy specifically for extracting usageMetadata from Gemini SSE streams, as complete token counts are only available in the final event.
  • Preservation of Existing Logic: Maintains the 'first-wins' strategy for other providers like Claude and Codex, where usage information is complete in a single event.
  • New State Variables: Adds lastGeminiUsage and lastGeminiUsageRecord to store the most recent usageMetadata encountered during Gemini SSE processing.


coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

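Overview

This change modifies the response handler logic to implement last-wins semantics for Gemini usageMetadata in SSE events. The value is held in lastGeminiUsage and applied later rather than immediately, so that the final usage metrics come from the most recent valid Gemini usage data.

Changes

Cohort / File(s) | Summary
Gemini SSE event handling
src/app/v1/_lib/proxy/response-handler.ts | Introduces the lastGeminiUsage and lastGeminiUsageRecord variables to implement last-wins semantics. Adjusts the message_start and message_delta event parsing flow to defer applying Gemini usageMetadata. Updates the final usage resolution path to prefer the latest Gemini metrics. Applies the same change in both the streaming and non-streaming paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes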

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
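Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately summarizes the main change: adopting a last-wins strategy for Gemini SSE usageMetadata extraction, fully consistent with the core modification in the changeset.
Description check | ✅ Passed | The PR description details the problem, the solution, and concrete examples, is tied to the code changes, and describes the last-wins implementation for usageMetadata in Gemini SSE streaming responses.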



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses a billing issue with Gemini SSE streaming by implementing a "last-wins" strategy for usageMetadata extraction. This ensures that the final, complete token counts are used, fixing the problem of zero output tokens being reported. The changes are logical and well-contained. I have one suggestion to improve code maintainability by refactoring a small piece of duplicated logic.

Comment on lines +1647 to 1671
// Gemini usageMetadata - switched to a last-wins strategy
// Skip applyUsageValue (which is first-wins) and update directly
if (data.usageMetadata && typeof data.usageMetadata === "object") {
  const extracted = extractUsageMetrics(data.usageMetadata);
  if (extracted) {
    // Keep updating; the last valid value overwrites earlier ones
    lastGeminiUsage = extracted;
    lastGeminiUsageRecord = data.usageMetadata as Record<string, unknown>;
  }
}

// Handle response wrapping in SSE
if (!usageMetrics && data.response && typeof data.response === "object") {
  const responseObj = data.response as Record<string, unknown>;
  applyUsageValue(responseObj.usage, `sse.${event.event}.response.usage`);
  applyUsageValue(responseObj.usageMetadata, `sse.${event.event}.response.usageMetadata`);

  // response.usageMetadata also uses the last-wins strategy
  if (responseObj.usageMetadata && typeof responseObj.usageMetadata === "object") {
    const extracted = extractUsageMetrics(responseObj.usageMetadata);
    if (extracted) {
      lastGeminiUsage = extracted;
      lastGeminiUsageRecord = responseObj.usageMetadata as Record<string, unknown>;
    }
  }
}

medium

The logic for extracting and updating lastGeminiUsage from usageMetadata is duplicated for data.usageMetadata and responseObj.usageMetadata. This can be refactored to improve code clarity and maintainability.

Specifically, the following pattern is repeated:

if (metadataSource && typeof metadataSource === 'object') {
  const extracted = extractUsageMetrics(metadataSource);
  if (extracted) {
    lastGeminiUsage = extracted;
    lastGeminiUsageRecord = metadataSource as Record<string, unknown>;
  }
}

Consider extracting this into a helper function within parseUsageFromResponseText to avoid repetition. For example:

const applyLastGeminiUsage = (value: unknown) => {
  if (value && typeof value === 'object') {
    const extracted = extractUsageMetrics(value);
    if (extracted) {
      lastGeminiUsage = extracted;
      lastGeminiUsageRecord = value as Record<string, unknown>;
    }
  }
};

This would make the code in the loop more concise and easier to maintain.
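
As a rough, self-contained illustration of that suggestion, the helper could be called from both sites like this. The `extractUsageMetrics` here is a simplified stand-in for the project's function, and `collectLastGeminiUsage` is a hypothetical wrapper added only to make the sketch runnable.

```typescript
type UsageRecord = Record<string, unknown>;

interface UsageMetrics {
  input_tokens: number;
  output_tokens: number;
}

// Simplified stand-in for the project's extractUsageMetrics (assumption).
function extractUsageMetrics(value: UsageRecord): UsageMetrics | null {
  const prompt = value["promptTokenCount"];
  if (typeof prompt !== "number") return null;
  const count = (v: unknown): number => (typeof v === "number" ? v : 0);
  return {
    input_tokens: prompt,
    output_tokens: count(value["candidatesTokenCount"]) + count(value["thoughtsTokenCount"]),
  };
}

function collectLastGeminiUsage(events: UsageRecord[]) {
  // State lives on an object so that TypeScript's control-flow analysis does
  // not treat variables assigned only inside the closure as always null.
  const state = {
    lastGeminiUsage: null as UsageMetrics | null,
    lastGeminiUsageRecord: null as UsageRecord | null,
  };

  // Single place that implements the last-wins update.
  const applyLastGeminiUsage = (value: unknown): void => {
    if (value && typeof value === "object") {
      const extracted = extractUsageMetrics(value as UsageRecord);
      if (extracted) {
        state.lastGeminiUsage = extracted;
        state.lastGeminiUsageRecord = value as UsageRecord;
      }
    }
  };

  for (const data of events) {
    applyLastGeminiUsage(data["usageMetadata"]); // top-level usageMetadata
    const response = data["response"];
    if (response && typeof response === "object") {
      applyLastGeminiUsage((response as UsageRecord)["usageMetadata"]); // wrapped form
    }
  }

  return state;
}
```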


@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment


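// Note: in a Gemini SSE stream, usageMetadata may be present in every event,
// but only the last event contains the complete token counts (candidatesTokenCount, thoughtsTokenCount, etc.),
// so keep updating and use the last valid value
if (!messageStartUsage && !messageDeltaUsage) {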

The condition !messageStartUsage && !messageDeltaUsage may prevent the Gemini last-wins logic from running if a provider returns both Claude-style events and Gemini-style usageMetadata in the same response.

If a malformed response contains both message_start events and usageMetadata fields, the Gemini SSE handling (lines 1647-1671) will be skipped, so the Gemini usage data is ignored.

Path: src/app/v1/_lib/proxy/response-handler.ts
Line: 1643
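
The concern can be reproduced in a small model of that guard. This sketch assumes the guard is evaluated once per SSE event and that Claude usage arrives as message.usage on message_start and as usage on message_delta; `recordsGeminiUsage` and the event shapes are hypothetical, not taken from the project.

```typescript
type Json = Record<string, unknown>;

interface SseEvent {
  event: string;
  data: Json;
}

// Reports whether any Gemini usageMetadata was recorded for the stream.
function recordsGeminiUsage(events: SseEvent[]): boolean {
  let messageStartUsage: unknown = null;
  let messageDeltaUsage: unknown = null;
  let lastGeminiUsageRecord: Json | null = null;

  for (const { event, data } of events) {
    if (event === "message_start") {
      const message = data["message"];
      if (message && typeof message === "object") {
        messageStartUsage = (message as Json)["usage"] ?? null;
      }
    } else if (event === "message_delta") {
      messageDeltaUsage = data["usage"] ?? null;
    }

    // Same condition as the guard under review: once Claude-style usage has
    // been seen, the Gemini branch is no longer reached.
    if (!messageStartUsage && !messageDeltaUsage) {
      const meta = data["usageMetadata"];
      if (meta && typeof meta === "object") {
        lastGeminiUsageRecord = meta as Json;
      }
    }
  }
  return lastGeminiUsageRecord !== null;
}

const geminiOnly: SseEvent[] = [
  { event: "message", data: { usageMetadata: { promptTokenCount: 349 } } },
];
const mixed: SseEvent[] = [
  { event: "message_start", data: { message: { usage: { input_tokens: 10 } } } },
  { event: "message", data: { usageMetadata: { promptTokenCount: 349 } } },
];

console.log(recordsGeminiUsage(geminiOnly)); // true
console.log(recordsGeminiUsage(mixed)); // false
```

Whether the second case matters in practice depends on whether any provider actually emits such a mixed stream, which this sketch does not establish.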

@github-actions github-actions bot added the size/XS Extra Small PR (< 50 lines) label Jan 31, 2026

@github-actions github-actions bot left a comment


Code Review Summary

This PR introduces a targeted fix for a billing-critical bug where Gemini SSE streaming responses were incorrectly showing 0 output tokens. The implementation correctly switches from a first-wins to a last-wins strategy for Gemini's usageMetadata extraction, as Gemini streams return incomplete token counts in early events and only provide complete counts in the final event.

PR Size: XS

  • Lines changed: 42 (38 additions + 4 deletions)
  • Files changed: 1

Issues Found

Category | Critical | High | Medium | Low
Logic/Bugs | 0 | 0 | 0 | 0
Security | 0 | 0 | 0 | 0
Error Handling | 0 | 0 | 0 | 0
Types | 0 | 0 | 0 | 0
Comments/Docs | 0 | 0 | 0 | 0
Tests | 0 | 0 | 1 | 0
Simplification | 0 | 0 | 0 | 0

Medium Priority Issues (Should Fix)

1. [TEST-MISSING-CRITICAL] Missing test coverage for Gemini SSE streaming usage extraction

Location: src/app/v1/_lib/proxy/response-handler.ts:1563-1704

Why this is a problem: This PR fixes a billing-critical bug where Gemini SSE streams were showing 0 output tokens. However, there are no unit tests covering the new "last-wins" strategy for Gemini SSE usageMetadata extraction. The existing tests in tests/unit/proxy/extract-usage-metrics.test.ts cover:

  • Gemini format (lines 311-508) - but only for single JSON responses
  • SSE streaming (lines 545-605) - but only for Claude format

Per CLAUDE.md: "All new features must have unit test coverage of at least 80%"

Without tests, there's a risk of regression if future changes inadvertently break this fix.

Suggested fix:

// Add to tests/unit/proxy/extract-usage-metrics.test.ts
describe("Gemini SSE streaming usage extraction", () => {
  it("should use last event's usageMetadata (last-wins strategy)", () => {
    // Simulate Gemini SSE stream where early events have incomplete usage
    const sseResponse = [
      'data: {"usageMetadata":{"promptTokenCount":349,"totalTokenCount":349}}',
      '',
      'data: {"usageMetadata":{"promptTokenCount":349,"candidatesTokenCount":500,"totalTokenCount":849}}',
      '',
      'data: {"usageMetadata":{"promptTokenCount":349,"candidatesTokenCount":2269,"thoughtsTokenCount":346,"totalTokenCount":2964}}',
      '',
    ].join("\n");

    const result = parseUsageFromResponseText(sseResponse, "gemini");

    expect(result.usageMetrics).not.toBeNull();
    expect(result.usageMetrics?.input_tokens).toBe(349);
    // output_tokens should be from last event: 2269 + 346 = 2615
    expect(result.usageMetrics?.output_tokens).toBe(2615);
  });

  it("should handle response.usageMetadata wrapping in SSE", () => {
    const sseResponse = [
      'data: {"response":{"usageMetadata":{"promptTokenCount":100,"totalTokenCount":100}}}',
      '',
      'data: {"response":{"usageMetadata":{"promptTokenCount":100,"candidatesTokenCount":200,"totalTokenCount":300}}}',
      '',
    ].join("\n");

    const result = parseUsageFromResponseText(sseResponse, "gemini");

    expect(result.usageMetrics).not.toBeNull();
    expect(result.usageMetrics?.input_tokens).toBe(100);
    expect(result.usageMetrics?.output_tokens).toBe(200);
  });
});

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Missing Gemini SSE streaming tests
  • Code clarity - Good

Additional Notes

The implementation is sound:

  • Correctly preserves first-wins for Claude/Codex formats
  • Properly implements last-wins for Gemini SSE by tracking lastGeminiUsage separately
  • Includes appropriate debug logging
  • Comments accurately explain the behavior

The only concern is the lack of test coverage for this critical billing path.


Automated review by Claude AI

@ding113 ding113 merged commit fee5e97 into ding113:dev Feb 2, 2026
13 of 15 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 2, 2026
github-actions bot pushed a commit that referenced this pull request Feb 2, 2026
…691)

@github-actions github-actions bot mentioned this pull request Feb 3, 2026

Labels

area:Google Gemini, area:provider, bug (Something isn't working), size/XS (Extra Small PR, < 50 lines)

Projects

Status: Done


2 participants