Fix #312: Extract output_tokens from message_delta for Anthropic SSE streams (#313)

Merged
ding113 merged 1 commit into dev from fix/issue-312-anthropic-output-tokens on Dec 10, 2025

Conversation

ding113 (Owner) commented Dec 10, 2025

Summary

  • Fixed incorrect output token counting for Anthropic-type providers in SSE streaming responses
  • Now properly extracts output_tokens from message_delta event (at stream end) instead of message_start (at stream beginning)
  • Maintains extraction of input tokens and cache fields (including 5m/1h differentiated billing) from message_start

Problem

Fixes #312

The v0.3.27 update introduced differentiated cache billing (5m/1h) and changed the usage extraction logic to read from message_start. However, message_start appears at the beginning of the stream, before the actual response content has been generated, so its output_tokens is typically 1 (or very low).

Example message_start (incorrect source for output_tokens):

```json
{"type":"message_start","message":{"usage":{"input_tokens":8,"output_tokens":1,...}}}
```

Example message_delta (correct source for final output_tokens):

```json
{"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"output_tokens":356}}
```
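To illustrate the distinction, here is a minimal self-contained sketch (a hypothetical helper, not the project's actual parseSSEData) that scans raw SSE text and keeps only the output_tokens value from the last message_delta event:

```typescript
// Hypothetical helper for illustration: returns the final output_tokens
// reported by a message_delta event, or null if none was seen.
function finalOutputTokens(sseText: string): number | null {
  let result: number | null = null;
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data:")) continue;
    try {
      const data = JSON.parse(line.slice(5));
      if (data.type === "message_delta" && typeof data.usage?.output_tokens === "number") {
        result = data.usage.output_tokens; // last one wins (emitted at stream end)
      }
    } catch {
      // ignore non-JSON data lines such as "[DONE]"
    }
  }
  return result;
}
```

Run against the two example events above, this would ignore the misleading `output_tokens: 1` in message_start and report 356.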

Solution

Modified parseUsageFromResponseText in src/app/v1/_lib/proxy/response-handler.ts to:

  1. Extract input tokens and cache fields from message_start event (required for 5m/1h cache billing)
  2. Extract final output_tokens from message_delta event (appears at stream end with accurate count)
  3. Merge both metrics to provide complete and accurate usage data

This approach preserves the 5m/1h cache differentiation feature while fixing the output token counting bug.
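The merge described above can be sketched with simplified types (Usage here is a stand-in for the project's UsageMetrics, and mergeUsage is a hypothetical helper, not the actual implementation):

```typescript
// Simplified stand-in for the project's UsageMetrics shape.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
}

// Keep input tokens and cache fields from message_start; override
// output_tokens with the message_delta value when one was observed.
function mergeUsage(start: Usage, deltaOutputTokens: number | null): Usage {
  return deltaOutputTokens !== null
    ? { ...start, output_tokens: deltaOutputTokens }
    : start;
}
```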

Changes

  • src/app/v1/_lib/proxy/response-handler.ts: Rewrote SSE parsing logic for Claude format to handle message_start and message_delta separately, then merge the results

Testing

  • TypeScript compilation passes
  • Biome lint passes
  • Manual testing with real Anthropic SSE streams (recommended)

Created by Claude AI in response to @claude mention

The previous implementation extracted output_tokens from the message_start event, which appears at the beginning of the SSE stream. This resulted in incorrect (usually 1) output token counts, since the main content had not yet been generated.

This fix:
- Extracts input tokens and cache fields (5m/1h) from message_start
- Extracts final output_tokens from message_delta (at stream end)
- Merges both to get accurate usage metrics

Closes #312

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
gemini-code-assist bot (Contributor) commented:
Summary of Changes

Hello @ding113, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request addresses an issue where Anthropic SSE streaming responses reported incorrect output token counts because usage was extracted prematurely from the message_start event. The change separates the extraction of input tokens and cache information (from message_start) from the final, accurate output token count (from message_delta), then merges the two so that all usage metrics are captured correctly without disrupting the existing cache billing functionality.

Highlights

  • Accurate Anthropic Output Token Counting: Corrected the logic for counting output tokens in Anthropic SSE streaming responses by extracting the final output_tokens value from the message_delta event, which occurs at the end of the stream and provides the accurate count.
  • Preservation of Input Token and Cache Billing Logic: Ensured that the extraction of input tokens and cache-related fields (including the 5m/1h differentiated billing) continues to correctly occur from the message_start event at the beginning of the stream.
  • Unified Usage Metrics: Implemented a merging mechanism to combine the input token and cache data from message_start with the accurate output token count from message_delta, providing complete and correct usage metrics for Anthropic SSE streams.

ding113 merged commit ae29020 into dev Dec 10, 2025 (6 checks passed)
github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Dec 10, 2025
github-actions bot added labels bug (Something isn't working) and size/XS (Extra Small PR (< 50 lines)) Dec 10, 2025
gemini-code-assist bot (Contributor) left a comment:
Code Review

This pull request correctly fixes an issue with output token counting for Anthropic SSE streams by extracting usage data from message_delta events. The implementation is sound. I've added one suggestion to refactor the logic to make it more robust and maintainable. By explicitly checking the providerType instead of implicitly detecting the provider based on SSE event names, the code becomes easier to understand and less prone to future bugs if other providers adopt similar event names.

Comment on lines 1357 to 1437
```typescript
const events = parseSSEData(responseText);

// Claude SSE special handling:
// - message_start carries input tokens and cache-creation fields (5m/1h differentiated billing)
// - message_delta carries the final output_tokens
// Extract both separately, then merge.
let messageStartUsage: UsageMetrics | null = null;
let messageDeltaOutputTokens: number | null = null;

for (const event of events) {
  if (typeof event.data !== "object" || !event.data) {
    continue;
  }

  const data = event.data as Record<string, unknown>;

  // Claude message_start format: data.message.usage
  // Extract input tokens and cache fields
  if (event.event === "message_start" && data.message && typeof data.message === "object") {
    const messageObj = data.message as Record<string, unknown>;
    if (messageObj.usage && typeof messageObj.usage === "object") {
      const extracted = extractUsageMetrics(messageObj.usage);
      if (extracted) {
        messageStartUsage = extracted;
        logger.debug("[ResponseHandler] Extracted usage from message_start", {
          source: "sse.message_start.message.usage",
          usage: extracted,
        });
      }
    }
  }

  // Claude message_delta format: data.usage.output_tokens
  // Extract the final output_tokens (emitted at stream end)
  if (event.event === "message_delta" && data.usage && typeof data.usage === "object") {
    const deltaUsage = data.usage as Record<string, unknown>;
    if (typeof deltaUsage.output_tokens === "number") {
      messageDeltaOutputTokens = deltaUsage.output_tokens;
      logger.debug("[ResponseHandler] Extracted output_tokens from message_delta", {
        source: "sse.message_delta.usage.output_tokens",
        outputTokens: messageDeltaOutputTokens,
      });
    }
  }

  // Non-Claude SSE handling (Gemini, etc.)
  if (!messageStartUsage && !messageDeltaOutputTokens) {
    // Standard usage fields (data.usage)
    applyUsageValue(data.usage, `sse.${event.event}.usage`);

    // Gemini usageMetadata
    applyUsageValue(data.usageMetadata, `sse.${event.event}.usageMetadata`);

    // Handle response wrapping in SSE
    if (!usageMetrics && data.response && typeof data.response === "object") {
      const responseObj = data.response as Record<string, unknown>;
      applyUsageValue(responseObj.usage, `sse.${event.event}.response.usage`);
      applyUsageValue(responseObj.usageMetadata, `sse.${event.event}.response.usageMetadata`);
    }
  }
}

// Merge the Claude SSE data from message_start and message_delta
if (messageStartUsage) {
  // Override the message_start value with the output_tokens from message_delta
  if (messageDeltaOutputTokens !== null) {
    messageStartUsage.output_tokens = messageDeltaOutputTokens;
    logger.debug(
      "[ResponseHandler] Merged output_tokens from message_delta into message_start usage",
      {
        finalOutputTokens: messageDeltaOutputTokens,
      }
    );
  }
  usageMetrics = adjustUsageForProviderType(messageStartUsage, providerType);
  usageRecord = messageStartUsage as unknown as Record<string, unknown>;
  logger.debug("[ResponseHandler] Final merged usage from Claude SSE", {
    providerType,
    usage: usageMetrics,
  });
}
```
gemini-code-assist bot (Contributor) commented:

Severity: medium

The current implementation correctly handles the Anthropic SSE stream parsing. However, it implicitly detects a Claude stream by checking for message_start or message_delta events. This could be brittle if other providers adopt similar event names in the future.

A more robust and maintainable approach would be to explicitly check the providerType to distinguish between Claude/Anthropic streams and other providers like Gemini. This makes the separation of parsing logic clearer and safer.

I suggest refactoring this block to have a clear if/else based on providerType. This also allows us to restore the more efficient break statement for non-Claude providers.

```typescript
const events = parseSSEData(responseText);

if (providerType === "claude" || providerType === "claude-auth") {
  // Claude SSE special handling:
  // - message_start carries input tokens and cache-creation fields (5m/1h differentiated billing)
  // - message_delta carries the final output_tokens
  // Extract both separately, then merge.
  let messageStartUsage: UsageMetrics | null = null;
  let messageDeltaOutputTokens: number | null = null;

  for (const event of events) {
    if (typeof event.data !== "object" || !event.data) {
      continue;
    }

    const data = event.data as Record<string, unknown>;

    // Claude message_start format: data.message.usage
    // Extract input tokens and cache fields
    if (event.event === "message_start" && data.message && typeof data.message === "object") {
      const messageObj = data.message as Record<string, unknown>;
      if (messageObj.usage && typeof messageObj.usage === "object") {
        const extracted = extractUsageMetrics(messageObj.usage);
        if (extracted) {
          messageStartUsage = extracted;
          logger.debug("[ResponseHandler] Extracted usage from message_start", {
            source: "sse.message_start.message.usage",
            usage: extracted,
          });
        }
      }
    }

    // Claude message_delta format: data.usage.output_tokens
    // Extract the final output_tokens (emitted at stream end)
    if (event.event === "message_delta" && data.usage && typeof data.usage === "object") {
      const deltaUsage = data.usage as Record<string, unknown>;
      if (typeof deltaUsage.output_tokens === "number") {
        messageDeltaOutputTokens = deltaUsage.output_tokens;
        logger.debug("[ResponseHandler] Extracted output_tokens from message_delta", {
          source: "sse.message_delta.usage.output_tokens",
          outputTokens: messageDeltaOutputTokens,
        });
      }
    }
  }

  // Merge the Claude SSE data from message_start and message_delta
  if (messageStartUsage) {
    // Override the message_start value with the output_tokens from message_delta
    if (messageDeltaOutputTokens !== null) {
      messageStartUsage.output_tokens = messageDeltaOutputTokens;
      logger.debug(
        "[ResponseHandler] Merged output_tokens from message_delta into message_start usage",
        {
          finalOutputTokens: messageDeltaOutputTokens,
        }
      );
    }
    usageMetrics = adjustUsageForProviderType(messageStartUsage, providerType);
    usageRecord = messageStartUsage as unknown as Record<string, unknown>;
    logger.debug("[ResponseHandler] Final merged usage from Claude SSE", {
      providerType,
      usage: usageMetrics,
    });
  }
} else {
  // Non-Claude SSE handling (Gemini, etc.)
  for (const event of events) {
    if (usageMetrics) {
      break;
    }

    if (typeof event.data !== "object" || !event.data) {
      continue;
    }

    const data = event.data as Record<string, unknown>;

    // Standard usage fields (data.usage)
    applyUsageValue(data.usage, `sse.${event.event}.usage`);

    // Gemini usageMetadata
    applyUsageValue(data.usageMetadata, `sse.${event.event}.usageMetadata`);

    // Handle response wrapping in SSE
    if (!usageMetrics && data.response && typeof data.response === "object") {
      const responseObj = data.response as Record<string, unknown>;
      applyUsageValue(responseObj.usage, `sse.${event.event}.response.usage`);
      applyUsageValue(responseObj.usageMetadata, `sse.${event.event}.response.usageMetadata`);
    }
  }
}
```

github-actions bot (Contributor) left a comment:

Code Review Summary

No significant issues identified in this PR. The implementation correctly addresses the bug where output_tokens was incorrectly extracted from message_start (which contains the initial/low value) instead of message_delta (which contains the final accurate count).

PR Size: XS

  • Lines changed: 82 (66 additions, 16 deletions)
  • Files changed: 1

Analysis Notes

The refactored parseUsageFromResponseText function now:

  1. ✅ Correctly separates Claude SSE handling from other formats (Gemini, etc.)
  2. ✅ Extracts input_tokens and cache fields from message_start event
  3. ✅ Extracts final output_tokens from message_delta event
  4. ✅ Merges both metrics appropriately
  5. ✅ Maintains backward compatibility with non-Claude SSE formats

The logic gate if (!messageStartUsage && !messageDeltaOutputTokens) correctly ensures non-Claude SSE processing only occurs when no Claude-specific events are detected, preventing format conflicts.
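The gate's effect can be illustrated with a simplified stand-in for the real event loop (the Event shape and the extractUsageSource helper are hypothetical, for illustration only):

```typescript
// Simplified event shape for illustration.
type Event = { event: string; data: Record<string, any> };

// Generic (Gemini-style) extraction runs only while no Claude-specific
// event has been seen, mirroring the PR's logic gate.
function extractUsageSource(events: Event[]): string {
  let sawClaude = false;
  let source = "none";
  for (const ev of events) {
    if (ev.event === "message_start" || ev.event === "message_delta") {
      sawClaude = true;
      source = "claude";
    } else if (!sawClaude && ev.data.usageMetadata) {
      source = "gemini"; // generic path, gated off once Claude events appear
    }
  }
  return source;
}
```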

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean (appropriate debug logging added)
  • Type safety - Clean
  • Documentation accuracy - Clean (comments match implementation)
  • Test coverage - No automated tests for this function (pre-existing gap, not introduced by this PR)
  • Code clarity - Good (well-named variables, clear separation of concerns)

Recommendation

Approve - The implementation is sound. Manual testing with real Anthropic SSE streams is recommended before deployment as noted in the PR description.


Automated review by Claude AI

github-actions bot mentioned this pull request Dec 10, 2025
ding113 deleted the fix/issue-312-anthropic-output-tokens branch December 11, 2025 10:35
github-actions bot mentioned this pull request Dec 12, 2025

Labels

bug (Something isn't working), size/XS (Extra Small PR (< 50 lines))

Projects

Status: Done
