fix(billing): Fix double-billing of Gemini cached tokens #338
Conversation
Gemini API's promptTokenCount includes cachedContentTokenCount,
so the cached portion must be subtracted when computing the input cost to avoid double billing.

Before the fix:
input = promptTokenCount × input_price (cached tokens included)
cache = cachedContentTokenCount × cache_price
Result: cached tokens are billed twice

After the fix:
input = (promptTokenCount - cachedContentTokenCount) × input_price
cache = cachedContentTokenCount × cache_price
Result: billed correctly
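As a worked example of the two formulas, here is a small TypeScript sketch using the token counts cited later in this PR and placeholder prices (the prices are invented for illustration and are not real Gemini rates):

```typescript
// Token counts are taken from the PR description below; prices are placeholders.
const promptTokenCount = 696219;
const cachedContentTokenCount = 696190;
const inputPrice = 1.0e-6;  // $ per input token (placeholder)
const cachePrice = 0.25e-6; // $ per cached token (placeholder)

// Before the fix: the full promptTokenCount is charged at the input price,
// so cached tokens are paid for twice (once here, once at the cache price).
const inputCostBefore = promptTokenCount * inputPrice;

// After the fix: only the non-cached portion is charged at the input price.
const inputCostAfter = (promptTokenCount - cachedContentTokenCount) * inputPrice;

// The cache charge is the same in both cases.
const cacheCost = cachedContentTokenCount * cachePrice;

console.log({ inputCostBefore, inputCostAfter, cacheCost });
```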
Summary of Changes

Hello @sususu98, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request corrects a billing error in Gemini API usage accounting: previously, the system incorrectly counted cached tokens toward the billed input.
Code Review Summary
No significant issues identified in this PR.
PR Size: XS
- Lines changed: 7
- Files changed: 1
Analysis
The fix correctly addresses the double-billing issue for Gemini cached tokens. The PR description accurately explains the problem:
- Problem: Gemini's `promptTokenCount` includes `cachedContentTokenCount`, causing cached tokens to be billed twice:
  - Once as input tokens (at full input price)
  - Again as cache read tokens (at cache price)
- Solution: The fix subtracts `cachedContentTokenCount` from `promptTokenCount` before assigning to `input_tokens`:
  `result.input_tokens = Math.max(usage.promptTokenCount - cachedTokens, 0);`
- Mathematical verification (from PR description):
  - `promptTokenCount = 696219`
  - `cachedContentTokenCount = 696190`
  - Fixed `input_tokens = 696219 - 696190 = 29` (only non-cached tokens)
  - `cache_read_input_tokens = 696190` (cached tokens, billed separately)
Verified

- ✅ The fix is in the correct location (the `extractUsageMetrics` function, which feeds into `calculateRequestCost`); see the sketch below
- ✅ The `Math.max(..., 0)` guard prevents negative values if data is inconsistent
- ✅ Type checking with `typeof usage.cachedContentTokenCount === "number"` handles undefined/null gracefully
- ✅ The comment explains the billing formula clearly
- ✅ The GeminiAdapter transformation (used for client responses, not billing) is separate and unaffected
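Taken together, these points suggest parsing logic roughly like the following sketch (function and type names are illustrative assumptions based on the review, not the repository's actual `extractUsageMetrics`):

```typescript
// Illustrative shapes; the real response-handler.ts types may differ.
interface GeminiUsageMetadata {
  promptTokenCount: number;
  cachedContentTokenCount?: number;
  candidatesTokenCount?: number;
}

interface UsageMetrics {
  input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

function extractGeminiUsage(usage: GeminiUsageMetadata): UsageMetrics {
  // Gemini reports cached tokens as part of promptTokenCount, so they must be
  // split out before billing; the typeof check tolerates a missing field.
  const cachedTokens =
    typeof usage.cachedContentTokenCount === "number"
      ? usage.cachedContentTokenCount
      : 0;

  return {
    // Only the non-cached portion is billed at the input price; Math.max
    // guards against negative values if the reported counts are inconsistent.
    input_tokens: Math.max(usage.promptTokenCount - cachedTokens, 0),
    cache_read_input_tokens: cachedTokens,
    output_tokens: usage.candidatesTokenCount ?? 0,
  };
}
```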
Review Coverage
- Logic and correctness - Clean
- Security (OWASP Top 10) - Clean
- Error handling - Clean
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - No unit tests for this function (pre-existing gap)
- Code clarity - Good
Automated review by Claude AI
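The coverage notes flag the missing unit tests for this function as a pre-existing gap. A minimal sketch of such a test, assuming a Vitest-style runner and the illustrative `extractGeminiUsage` helper from the sketch above (the import path is hypothetical):

```typescript
import { describe, expect, it } from "vitest";
// Hypothetical module path for the helper sketched above.
import { extractGeminiUsage } from "./gemini-usage";

describe("Gemini cached-token billing", () => {
  it("subtracts cached tokens from promptTokenCount", () => {
    const metrics = extractGeminiUsage({
      promptTokenCount: 696219,
      cachedContentTokenCount: 696190,
      candidatesTokenCount: 214,
    });
    expect(metrics.input_tokens).toBe(29);
    expect(metrics.cache_read_input_tokens).toBe(696190);
  });

  it("never produces negative input_tokens on inconsistent counts", () => {
    const metrics = extractGeminiUsage({
      promptTokenCount: 10,
      cachedContentTokenCount: 50,
    });
    expect(metrics.input_tokens).toBe(0);
  });
});
```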
Problem Description

Gemini API's `promptTokenCount` includes `cachedContentTokenCount`, but the original code used `promptTokenCount` directly as `input_tokens`, so cache-hit tokens were billed twice.

Related Issues:
Fix

When parsing Gemini usage, subtract `cachedContentTokenCount` directly from `promptTokenCount`, as in the line quoted in the review above: `result.input_tokens = Math.max(usage.promptTokenCount - cachedTokens, 0);`
Data Verification

According to the Gemini official API response example:
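An illustrative `usageMetadata` fragment consistent with the numbers used in the arithmetic check below (field names follow the Gemini API; the exact example response is assumed):

```typescript
// Assumed example values; they match the arithmetic check that follows.
const usageMetadata = {
  promptTokenCount: 696219,        // includes the cached tokens
  cachedContentTokenCount: 696190, // tokens served from the context cache
  candidatesTokenCount: 214,       // output tokens
  totalTokenCount: 696433,         // 696219 + 214
};
```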
Arithmetic check:

- `696219 + 214 = 696433` ✅ (promptTokenCount + candidatesTokenCount = totalTokenCount, so cached tokens are not counted separately)

This proves that `promptTokenCount` indeed includes `cachedContentTokenCount`. After the fix:

- `input_tokens = 696219 - 696190 = 29`
- `cache_read_input_tokens = 696190`

Changes
Core Changes
- `src/app/v1/_lib/proxy/response-handler.ts`: subtract `cachedContentTokenCount` in the Gemini usage parsing to avoid double billing

Tests

- `Math.max(..., 0)` ensures the result is non-negative

Checklist
Description enhanced by Claude AI