fix(billing): Fix double billing of Gemini cached tokens #338

Merged
ding113 merged 1 commit into ding113:dev from sususu98:fix/gemini-cached-token-billing
Dec 13, 2025

@sususu98 sususu98 commented Dec 13, 2025

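Problem

The Gemini API's promptTokenCount already includes cachedContentTokenCount, but the original code used promptTokenCount directly as input_tokens, so tokens served from the cache were billed twice:

  • Input cost: all prompt tokens were billed at the input price (including the cached portion)
  • Cache cost: the cached tokens were billed again at the cache price

Related Issues:

Fix

When parsing Gemini usage, subtract cachedContentTokenCount from promptTokenCount directly.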

// Before the fix
result.input_tokens = usage.promptTokenCount;

// After the fix
const cachedTokens = typeof usage.cachedContentTokenCount === "number"
  ? usage.cachedContentTokenCount : 0;
result.input_tokens = Math.max(usage.promptTokenCount - cachedTokens, 0);

Data verification

Based on a sample response from the official Gemini API:

promptTokenCount = 696219
cachedContentTokenCount = 696190
candidatesTokenCount = 214
totalTokenCount = 696433

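Arithmetic check: 696219 + 214 = 696433

The total equals promptTokenCount + candidatesTokenCount with no separate cached term, which shows that promptTokenCount does include cachedContentTokenCount. After the fix: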

  • input_tokens = 696219 - 696190 = 29
  • cache_read_input_tokens = 696190
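
The check above can be reproduced as a small standalone sketch. The interface and variable names here are illustrative and are not taken from response-handler.ts; only the four token counts come from the sample quoted in this description.

```typescript
// Token counts quoted in the PR description. The interface name is hypothetical.
interface GeminiUsageSample {
  promptTokenCount: number;
  cachedContentTokenCount: number;
  candidatesTokenCount: number;
  totalTokenCount: number;
}

const sample: GeminiUsageSample = {
  promptTokenCount: 696219,
  cachedContentTokenCount: 696190,
  candidatesTokenCount: 214,
  totalTokenCount: 696433,
};

// Hypothesis A: cached tokens are already part of promptTokenCount.
const totalIfCacheIncluded = sample.promptTokenCount + sample.candidatesTokenCount;

// Hypothesis B: cached tokens are reported on top of promptTokenCount.
const totalIfCacheSeparate = totalIfCacheIncluded + sample.cachedContentTokenCount;

// Only hypothesis A reproduces the reported totalTokenCount.
const promptIncludesCache = totalIfCacheIncluded === sample.totalTokenCount;

// Non-cached input tokens, computed the same way as the fix.
const nonCachedInput = Math.max(
  sample.promptTokenCount - sample.cachedContentTokenCount,
  0,
);

console.log({ promptIncludesCache, totalIfCacheSeparate, nonCachedInput });
```

With these numbers only 29 tokens remain billable at the input price, and the other 696190 are billed at the cache price alone.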

Changes

Core Changes

  • src/app/v1/_lib/proxy/response-handler.ts: subtract cachedContentTokenCount when parsing Gemini usage to avoid double billing

Testing

  • TypeScript type check passes
  • Logic check: Math.max(..., 0) guarantees a non-negative result

Checklist

  • Code follows project conventions
  • Self-review completed
  • Tests pass locally

Description enhanced by Claude AI

The Gemini API's promptTokenCount includes cachedContentTokenCount,
so the cached portion must be subtracted when computing input cost to avoid double billing.

Before: input = promptTokenCount × input_price (including cached tokens)
        cache = cachedContentTokenCount × cache_price
        Result: cached tokens are billed twice

After:  input = (promptTokenCount - cachedContentTokenCount) × input_price
        cache = cachedContentTokenCount × cache_price
        Result: correct billing
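
To make the two formulas concrete, here is a minimal sketch that evaluates both on the sample counts from the PR description. The prices are made-up integers in arbitrary cost units per token, not real Gemini pricing, and the function and type names are hypothetical rather than taken from the project's calculateRequestCost.

```typescript
// Hypothetical per-token prices in arbitrary integer cost units (not real pricing).
interface Prices {
  inputPerToken: number;
  cachePerToken: number;
}

interface CostBreakdown {
  input: number;
  cache: number;
  total: number;
}

// Before the fix: the full prompt count, cached part included, is charged as input.
function costBeforeFix(promptTokens: number, cachedTokens: number, p: Prices): CostBreakdown {
  const input = promptTokens * p.inputPerToken;
  const cache = cachedTokens * p.cachePerToken;
  return { input, cache, total: input + cache };
}

// After the fix: only the non-cached part of the prompt is charged as input.
function costAfterFix(promptTokens: number, cachedTokens: number, p: Prices): CostBreakdown {
  const input = Math.max(promptTokens - cachedTokens, 0) * p.inputPerToken;
  const cache = cachedTokens * p.cachePerToken;
  return { input, cache, total: input + cache };
}

const prices: Prices = { inputPerToken: 4, cachePerToken: 1 };
const before = costBeforeFix(696219, 696190, prices);
const after = costAfterFix(696219, 696190, prices);

// The overcharge equals the cached tokens billed once more at the input price.
const overcharge = before.total - after.total;
console.log({ before, after, overcharge });
```

The cache line is identical in both cases. The whole difference comes from the input line, which is why the fix only needs to touch input_tokens.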
@gemini-code-assist

Summary of Changes

Hello @sususu98, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

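This pull request corrects a billing error in Gemini API usage accounting. Previously, the system incorrectly counted cached tokens in input_tokens, so users paid twice for the same content. By changing how input_tokens is computed so that only the uncached part of the prompt counts as input, it avoids double billing and improves billing accuracy.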

Highlights

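  • Fix Gemini double billing: Resolves the issue where, because promptTokenCount includes cachedContentTokenCount, the cached portion was billed twice when computing input_tokens for the Gemini API.
  • Adjust token calculation logic: In the extractUsageMetrics function, cachedContentTokenCount is subtracted from promptTokenCount to obtain an accurate input_tokens.
  • Add billing comment: Adds a detailed code comment explaining Gemini's billing formula and how the input and cache parts are calculated.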

@github-actions github-actions bot added the bug (Something isn't working) and size/XS (Extra Small PR, < 50 lines) labels Dec 13, 2025

@github-actions github-actions bot left a comment

Code Review Summary

No significant issues identified in this PR.

PR Size: XS

  • Lines changed: 7
  • Files changed: 1

Analysis

The fix correctly addresses the double-billing issue for Gemini cached tokens. The PR description accurately explains the problem:

  1. Problem: Gemini's promptTokenCount includes cachedContentTokenCount, causing cached tokens to be billed twice:

    • Once as input tokens (at full input price)
    • Again as cache read tokens (at cache price)
  2. Solution: The fix subtracts cachedContentTokenCount from promptTokenCount before assigning to input_tokens:

    result.input_tokens = Math.max(usage.promptTokenCount - cachedTokens, 0);
  3. Mathematical verification (from PR description):

    • promptTokenCount = 696219
    • cachedContentTokenCount = 696190
    • Fixed input_tokens = 696219 - 696190 = 29 (only non-cached tokens)
    • cache_read_input_tokens = 696190 (cached tokens, billed separately)

Verified

  • ✅ The fix is in the correct location (extractUsageMetrics function which feeds into calculateRequestCost)
  • ✅ The Math.max(..., 0) guard prevents negative values if data is inconsistent
  • ✅ Type checking with typeof usage.cachedContentTokenCount === "number" handles undefined/null gracefully
  • ✅ The comment explains the billing formula clearly
  • ✅ The GeminiAdapter transformation (used for client responses, not billing) is separate and unaffected
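
The two guards noted above can be exercised in isolation. This is a minimal sketch under stated assumptions: the helper name nonCachedInputTokens and the RawUsage type are hypothetical, and only the typeof check and the Math.max clamp mirror the merged change.

```typescript
// Hypothetical shape for a raw usage object; the cached count may be missing or malformed.
type RawUsage = {
  promptTokenCount: number;
  cachedContentTokenCount?: unknown;
};

// Mirrors the guards in the fix: anything that is not a number counts as 0 cached
// tokens, and the result is clamped so it never goes negative.
function nonCachedInputTokens(usage: RawUsage): number {
  const cachedTokens =
    typeof usage.cachedContentTokenCount === "number"
      ? usage.cachedContentTokenCount
      : 0;
  return Math.max(usage.promptTokenCount - cachedTokens, 0);
}

const missing = nonCachedInputTokens({ promptTokenCount: 100 });
const nullCached = nonCachedInputTokens({ promptTokenCount: 100, cachedContentTokenCount: null });
const partial = nonCachedInputTokens({ promptTokenCount: 100, cachedContentTokenCount: 40 });
const inconsistent = nonCachedInputTokens({ promptTokenCount: 100, cachedContentTokenCount: 150 });

// typeof NaN is "number", so a NaN cached count passes the check and yields NaN.
const nanCached = nonCachedInputTokens({ promptTokenCount: 100, cachedContentTokenCount: NaN });

console.log({ missing, nullCached, partial, inconsistent, nanCached });
```

Cases like these could seed a unit test for the usage parser. The NaN case is an edge that the typeof check alone does not filter out; whether it can occur in real Gemini responses is not established in this PR.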

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - No unit tests for this function (pre-existing gap)
  • Code clarity - Good

Automated review by Claude AI


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

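This pull request effectively resolves a key issue in Gemini API token billing. The problem is clearly described and the fix is correctly implemented: subtracting cachedContentTokenCount from promptTokenCount avoids double billing. The new code comment also explains the billing logic well, improving readability and maintainability. This is a necessary and correct fix.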

@ding113 ding113 merged commit 7cd9d76 into ding113:dev Dec 13, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Dec 13, 2025