fix(billing): Fix IMAGE modality token billing for Gemini image generation models #664
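Background:
- Image generation models such as gemini-3-pro-image-preview return candidatesTokensDetails in their usage data
- This field breaks tokens down by modality (IMAGE/TEXT)
- IMAGE modality tokens cost $0.00012/token, 10x the price of regular TEXT tokens
- The previous system did not parse this field, so IMAGE tokens were billed at the TEXT price, undercharging by roughly 7.6x

Type extensions (src/types/model-price.ts):
- Add output_cost_per_image_token: unit price for output image tokens (billed per token)
- Add input_cost_per_image_token: unit price for input image tokens (billed per token)
- Keep input_cost_per_image: fixed price for input images (billed per image, $0.0011/image)
- Keep output_cost_per_image: fixed price for output images (billed per image)

Usage extraction (src/app/v1/_lib/proxy/response-handler.ts):
- Parse candidatesTokensDetails to extract output_image_tokens and output_tokens (TEXT)
- Parse promptTokensDetails to extract input_image_tokens and input_tokens (TEXT)
- Use toUpperCase() for case-insensitive modality matching (IMAGE/image/Image)
- Add a hasValidToken guard so the original values are overwritten only when valid tokens were parsed
- Fix incomplete promptTokensDetails parsing that caused input IMAGE tokens to be billed twice
- Treat the difference between candidatesTokenCount and the sum of details as unclassified TEXT tokens (internal overhead of image generation, billed at the TEXT price)

Cost calculation (src/lib/utils/cost-calculation.ts):
- output_image_tokens are billed at output_cost_per_image_token when available
- input_image_tokens are billed at input_cost_per_image_token when available
- If no image token price is configured, fall back to the regular token price (backward compatible)
- The multiplier also applies to image token costs

Test coverage:
- Add cost-calculation-image-tokens.test.ts (10 tests)
- Extend extract-usage-metrics.test.ts (12 Gemini image tests)
- Scenarios covered: IMAGE only, mixed IMAGE+TEXT, invalid data, case variants, backward compatibility, mixed input and output, candidatesTokenCount difference calculation

Billing example (complete image generation request):
- promptTokenCount=326, candidatesTokenCount=2340, thoughtsTokenCount=337
- candidatesTokensDetails: IMAGE=2000 (the difference of 340 is unclassified TEXT)
- Input TEXT: 326 × $0.000002 = $0.000652
- Output TEXT: (340+337) × $0.000012 = $0.008124
- Output IMAGE: 2000 × $0.00012 = $0.240000
- Total: $0.248776 (before the fix $0.244696, undercharged by $0.00408)

Fixes ding113#663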
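The billing example above can be reproduced with a small sketch. This is a simplified illustration, not the code in `src/lib/utils/cost-calculation.ts`: the names `UsageSketch`, `PriceSketch`, and `estimateCost` are hypothetical, and plain numbers are used even though the real `calculateRequestCost` appears to return a decimal-like object (the tests call `cost.toNumber()`).

```typescript
// Hypothetical, simplified shapes; the real UsageMetrics and ModelPriceData types have more fields.
interface UsageSketch {
  input_tokens?: number;
  output_tokens?: number;
  input_image_tokens?: number;
  output_image_tokens?: number;
}

interface PriceSketch {
  input_cost_per_token?: number;
  output_cost_per_token?: number;
  input_cost_per_image_token?: number;
  output_cost_per_image_token?: number;
}

function estimateCost(usage: UsageSketch, price: PriceSketch, multiplier = 1): number {
  const inputText = price.input_cost_per_token ?? 0;
  const outputText = price.output_cost_per_token ?? 0;
  // Fall back to the regular token price when no image token price is configured.
  const inputImage = price.input_cost_per_image_token ?? inputText;
  const outputImage = price.output_cost_per_image_token ?? outputText;

  const subtotal =
    (usage.input_tokens ?? 0) * inputText +
    (usage.output_tokens ?? 0) * outputText +
    (usage.input_image_tokens ?? 0) * inputImage +
    (usage.output_image_tokens ?? 0) * outputImage;

  // The multiplier applies to every segment, image tokens included.
  return subtotal * multiplier;
}

// Figures from the billing example: 340 unclassified + 337 thoughts tokens at the TEXT price.
const exampleCost = estimateCost(
  { input_tokens: 326, output_tokens: 340 + 337, output_image_tokens: 2000 },
  {
    input_cost_per_token: 0.000002,
    output_cost_per_token: 0.000012,
    output_cost_per_image_token: 0.00012,
  },
);
console.log(exampleCost.toFixed(6)); // prints 0.248776
```

Using `??` for the fallback keeps price tables without image token fields working unchanged, which is the backward-compatibility behavior the commit message describes.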
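Walkthrough
This PR extends multimodal token tracking by adding image token fields to UsageMetrics, extracting modality-specific token counts when parsing Gemini/OpenAI responses, updating the pricing model to support image-specific costs, and adding comprehensive test coverage.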
Estimated code review effort: 3 (Moderate), ~25 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
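Summary of Changes
Hello @sususu98, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a billing problem where the IMAGE modality token price was not recognized or applied for Gemini image generation models. By introducing dedicated image token pricing fields and updating the parsing and calculation logic, it ensures that image input and output tokens are billed accurately, correcting the cost errors previously caused by the price difference.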
      });

      it("should extract IMAGE modality tokens from candidatesTokensDetails", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 326,
            candidatesTokenCount: 2340,
            candidatesTokensDetails: [
              { modality: "IMAGE", tokenCount: 2000 },
              { modality: "TEXT", tokenCount: 340 },
            ],
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        expect(result.usageMetrics?.output_image_tokens).toBe(2000);
        expect(result.usageMetrics?.output_tokens).toBe(340);
      });

      it("should extract IMAGE modality tokens from promptTokensDetails", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 886,
            candidatesTokenCount: 500,
            promptTokensDetails: [
              { modality: "TEXT", tokenCount: 326 },
              { modality: "IMAGE", tokenCount: 560 },
            ],
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        expect(result.usageMetrics?.input_image_tokens).toBe(560);
        expect(result.usageMetrics?.input_tokens).toBe(326);
      });

      it("should parse a complete usage with mixed input and output", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 357,
            candidatesTokenCount: 2100,
            totalTokenCount: 2580,
            promptTokensDetails: [
              { modality: "TEXT", tokenCount: 99 },
              { modality: "IMAGE", tokenCount: 258 },
            ],
            candidatesTokensDetails: [{ modality: "IMAGE", tokenCount: 2000 }],
            thoughtsTokenCount: 123,
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        expect(result.usageMetrics?.input_tokens).toBe(99);
        expect(result.usageMetrics?.input_image_tokens).toBe(258);
        // output_tokens = (candidatesTokenCount - IMAGE details) + thoughtsTokenCount
        // = (2100 - 2000) + 123 = 223
        expect(result.usageMetrics?.output_tokens).toBe(223);
        expect(result.usageMetrics?.output_image_tokens).toBe(2000);
      });

      it("should handle candidatesTokensDetails with only the IMAGE modality", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 100,
            candidatesTokenCount: 2000,
            candidatesTokensDetails: [{ modality: "IMAGE", tokenCount: 2000 }],
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        expect(result.usageMetrics?.output_image_tokens).toBe(2000);
        // candidatesTokenCount = 2000, IMAGE = 2000, unclassified = 0
        expect(result.usageMetrics?.output_tokens).toBe(0);
      });

      it("should treat the difference between candidatesTokenCount and details as unclassified TEXT", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 326,
            candidatesTokenCount: 2340,
            candidatesTokensDetails: [{ modality: "IMAGE", tokenCount: 2000 }],
            thoughtsTokenCount: 337,
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        // unclassified = 2340 - 2000 = 340
        // output_tokens = 340 + 337 (thoughts) = 677
        expect(result.usageMetrics?.output_tokens).toBe(677);
        expect(result.usageMetrics?.output_image_tokens).toBe(2000);
      });

      it("should handle missing candidatesTokensDetails (backward compatibility)", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 1000,
            candidatesTokenCount: 500,
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        expect(result.usageMetrics?.output_tokens).toBe(500);
        expect(result.usageMetrics?.output_image_tokens).toBeUndefined();
        expect(result.usageMetrics?.input_image_tokens).toBeUndefined();
      });

      it("should handle an empty candidatesTokensDetails array", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 1000,
            candidatesTokenCount: 500,
            candidatesTokensDetails: [],
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        expect(result.usageMetrics?.output_tokens).toBe(500);
        expect(result.usageMetrics?.output_image_tokens).toBeUndefined();
      });

      it("should handle invalid tokenCount values in candidatesTokensDetails", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 1000,
            candidatesTokenCount: 500,
            candidatesTokensDetails: [
              { modality: "TEXT" },
              { modality: "IMAGE", tokenCount: null },
              { modality: "TEXT", tokenCount: -1 },
            ],
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        // Invalid data must not overwrite the original candidatesTokenCount
        expect(result.usageMetrics?.output_tokens).toBe(500);
        expect(result.usageMetrics?.output_image_tokens).toBeUndefined();
      });

      it("should handle modality case variants", () => {
        const response = JSON.stringify({
          usageMetadata: {
            promptTokenCount: 100,
            candidatesTokenCount: 2340,
            candidatesTokensDetails: [
              { modality: "image", tokenCount: 2000 },
              { modality: "Image", tokenCount: 100 },
              { modality: "TEXT", tokenCount: 240 },
            ],
          },
        });

        const result = parseUsageFromResponseText(response, "gemini");

        expect(result.usageMetrics?.output_image_tokens).toBe(2100);
        expect(result.usageMetrics?.output_tokens).toBe(240);
      });
    });

Missing test: add coverage for the `cachedContentTokenCount` + `promptTokensDetails` combination to verify that cached tokens are properly deducted from text tokens.
Path: tests/unit/proxy/extract-usage-metrics.test.ts
Line: 343:509
Additional Comments (1)
Path: src/app/v1/_lib/proxy/response-handler.ts
Line: 1278:1357
`input_tokens` overwriting issue when both `promptTokenCount` and `promptTokensDetails` exist
When `promptTokensDetails` is present (line 1354), it overwrites the `input_tokens` calculated from `promptTokenCount - cachedContentTokenCount` (line 1281). This breaks the cache deduction logic.
For responses with both cached tokens and image modality details, `input_tokens` ends up equal to the raw `textTokens` from details, with no cached tokens deducted.
Consider adjusting line 1354 to handle cached tokens:
```suggestion
      // Deduct cached tokens from text tokens if needed
      const cachedTokens =
        typeof usage.cachedContentTokenCount === "number" ? usage.cachedContentTokenCount : 0;
      result.input_tokens = Math.max(textTokens - cachedTokens, 0);
```
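To make the cache interaction concrete, here is a minimal, self-contained sketch of the behavior the suggestion aims for. The names `PromptUsageSketch` and `splitPromptTokens` are hypothetical, and the sketch assumes cached tokens are deducted from TEXT tokens only, as in the suggestion above. Unlike the real extractor, it returns 0 rather than leaving `input_image_tokens` undefined.

```typescript
interface ModalityDetail {
  modality?: string;
  tokenCount?: number;
}

// Hypothetical subset of Gemini's usageMetadata relevant to input tokens.
interface PromptUsageSketch {
  promptTokenCount?: number;
  cachedContentTokenCount?: number;
  promptTokensDetails?: ModalityDetail[];
}

function splitPromptTokens(usage: PromptUsageSketch): {
  input_tokens: number;
  input_image_tokens: number;
} {
  const cached =
    typeof usage.cachedContentTokenCount === "number" ? usage.cachedContentTokenCount : 0;

  let text = 0;
  let image = 0;
  let hasValidToken = false;
  for (const detail of usage.promptTokensDetails ?? []) {
    if (typeof detail.tokenCount === "number" && detail.tokenCount > 0) {
      hasValidToken = true;
      if (detail.modality?.toUpperCase() === "IMAGE") {
        image += detail.tokenCount;
      } else {
        text += detail.tokenCount;
      }
    }
  }

  if (!hasValidToken) {
    // No usable details: keep the original promptTokenCount - cachedContentTokenCount path.
    const total = typeof usage.promptTokenCount === "number" ? usage.promptTokenCount : 0;
    return { input_tokens: Math.max(total - cached, 0), input_image_tokens: 0 };
  }

  // Assumption: cached tokens are deducted from TEXT tokens only.
  return { input_tokens: Math.max(text - cached, 0), input_image_tokens: image };
}

const withCache = splitPromptTokens({
  promptTokenCount: 886,
  cachedContentTokenCount: 100,
  promptTokensDetails: [
    { modality: "TEXT", tokenCount: 326 },
    { modality: "IMAGE", tokenCount: 560 },
  ],
});
// withCache.input_tokens is 226 and withCache.input_image_tokens is 560
```

A unit test for the missing scenario could assert exactly these two values once the real extractor handles the combination.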
Code Review Summary
This PR correctly implements billing support for Gemini image generation models by extracting IMAGE modality tokens from candidatesTokensDetails and promptTokensDetails. The implementation is well-structured with proper fallback mechanisms and comprehensive test coverage.
PR Size: M
- Lines changed: 408 additions, 0 deletions
- Files changed: 5
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
No significant issues identified in this PR.
Review Coverage
- Logic and correctness - Clean: Extraction logic correctly parses `candidatesTokensDetails` and `promptTokensDetails`, handles case-insensitive modality matching, and properly calculates unaccounted tokens
- Security (OWASP Top 10) - Clean: No user input handling, type coercion is safe
- Error handling - Clean: Invalid token counts (null, negative, missing) are properly handled with `typeof ... === "number" && ... > 0` guards
- Type safety - Clean: Both `UsageMetrics` type definitions updated consistently, new `ModelPriceData` fields added for image token pricing
- Documentation accuracy - Clean: Comments accurately describe the billing calculation and fallback behavior
- Test coverage - Excellent: 10 new cost calculation tests + 12 new usage extraction tests covering all major scenarios including edge cases (empty arrays, invalid tokenCount, case variations)
- Code clarity - Good: Logic flow is clear, fallback behavior is well-documented
Key Observations
- Backward Compatibility: When `output_cost_per_image_token` or `input_cost_per_image_token` is not configured, the code correctly falls back to regular token prices
- Ordering Logic: The new extraction code is correctly placed before the existing `output_tokens` check, allowing Gemini-specific handling while preserving backward compatibility for other providers
- Test Quality: Tests cover both happy paths and edge cases including invalid data handling and case-insensitive modality matching
Automated review by Claude AI
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/v1/_lib/proxy/response-handler.ts (1)
1294-1362: Avoid double billing caused by output_tokens overwriting the details-based split
When `candidatesTokensDetails` has already been split and `usage.output_tokens` is still present, the overwrite at line 1359 replaces the TEXT-only `output_tokens` with the total, which is then billed on top of `output_image_tokens`. Consider using `usage.output_tokens` only when `output_tokens` was not derived from details.
Suggested change:

    @@
    -    const candidatesDetails = usage.candidatesTokensDetails as
    +    const candidatesDetails = usage.candidatesTokensDetails as
           | Array<{ modality?: string; tokenCount?: number }>
           | undefined;
    +    let hasCandidatesDetails = false;
         if (Array.isArray(candidatesDetails) && candidatesDetails.length > 0) {
    @@
           if (hasValidToken) {
    +        hasCandidatesDetails = true;
             // Compute unclassified TEXT tokens: candidatesTokenCount - sum of details
             // These may be internal overhead of image generation, billed at the TEXT price
             const detailsSum = imageTokens + textTokens;
    @@
           }
    @@
    -    if (typeof usage.output_tokens === "number") {
    +    if (typeof usage.output_tokens === "number" && !hasCandidatesDetails) {
           result.output_tokens = usage.output_tokens;
           hasAny = true;
         }

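The effect of the proposed guard can be shown in isolation. The sketch below is a simplified stand-in for the relevant part of the extractor, not the actual function: `OutputUsageSketch` and `extractOutputTokens` are hypothetical names, and it assumes `candidatesTokenCount` seeds `output_tokens` before the details are examined.

```typescript
// Hypothetical subset of the usage payload relevant to output tokens.
interface OutputUsageSketch {
  candidatesTokenCount?: number;
  candidatesTokensDetails?: Array<{ modality?: string; tokenCount?: number }>;
  output_tokens?: number;
}

function extractOutputTokens(usage: OutputUsageSketch): {
  output_tokens?: number;
  output_image_tokens?: number;
} {
  const result: { output_tokens?: number; output_image_tokens?: number } = {};
  let hasCandidatesDetails = false;

  // Assumed baseline: the total candidates count seeds output_tokens.
  if (typeof usage.candidatesTokenCount === "number") {
    result.output_tokens = usage.candidatesTokenCount;
  }

  const details = usage.candidatesTokensDetails;
  if (Array.isArray(details) && details.length > 0) {
    let image = 0;
    let text = 0;
    let hasValidToken = false;
    for (const detail of details) {
      if (typeof detail.tokenCount === "number" && detail.tokenCount > 0) {
        hasValidToken = true;
        if (detail.modality?.toUpperCase() === "IMAGE") image += detail.tokenCount;
        else text += detail.tokenCount;
      }
    }
    if (image > 0) result.output_image_tokens = image;
    if (hasValidToken) {
      hasCandidatesDetails = true;
      const total = typeof usage.candidatesTokenCount === "number" ? usage.candidatesTokenCount : 0;
      // TEXT tokens plus whatever the details did not account for.
      result.output_tokens = text + Math.max(total - (image + text), 0);
    }
  }

  // The guard: a generic output_tokens field must not undo the modality split.
  if (typeof usage.output_tokens === "number" && !hasCandidatesDetails) {
    result.output_tokens = usage.output_tokens;
  }
  return result;
}

const guarded = extractOutputTokens({
  candidatesTokenCount: 2340,
  candidatesTokensDetails: [{ modality: "IMAGE", tokenCount: 2000 }],
  output_tokens: 2340,
});
// guarded.output_tokens stays 340 instead of being reset to 2340
```

Without the `!hasCandidatesDetails` condition, the last branch would restore 2340 and the 2000 image tokens would be charged twice, once at each rate.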
Code Review
This pull request introduces support for handling image tokens in usage metrics and cost calculation, primarily for Gemini models. The changes involve adding input_image_tokens and output_image_tokens to the UsageMetrics and ModelPriceData types. The extractUsageMetrics function was updated to parse image and text tokens from candidatesTokensDetails and promptTokensDetails in Gemini responses, including logic to account for unclassified tokens. The calculateRequestCost function was modified to incorporate these new image token types, prioritizing specific image token prices (output_cost_per_image_token, input_cost_per_image_token) and falling back to general token prices if not specified. New unit tests were added to verify the correct extraction and cost calculation of image tokens. Review comments identified a bug in extractUsageMetrics where calculated output_tokens could be overwritten, suggested refactoring duplicated logic for processing candidatesTokensDetails and promptTokensDetails into a helper function, and pointed out a redundant test case in cost-calculation-image-tokens.test.ts that should be removed.
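One possible shape for the helper suggested in this review is sketched below. The name `sumByModality` and the `ModalityTotals` type are illustrative assumptions, not names from the PR. The helper only sums the details; the caller still decides how to combine the totals with `candidatesTokenCount` or `promptTokenCount`.

```typescript
interface ModalityDetail {
  modality?: string;
  tokenCount?: number;
}

interface ModalityTotals {
  imageTokens: number;
  textTokens: number;
  hasValidToken: boolean;
}

// Sums token counts per modality; anything that is not IMAGE counts as TEXT,
// mirroring the else branch in both duplicated loops.
function sumByModality(details: unknown): ModalityTotals {
  const totals: ModalityTotals = { imageTokens: 0, textTokens: 0, hasValidToken: false };
  if (!Array.isArray(details)) return totals;

  for (const detail of details as Array<ModalityDetail | null>) {
    const count = detail?.tokenCount;
    // Skip missing, null, non-numeric, zero, and negative counts.
    if (typeof count === "number" && count > 0) {
      totals.hasValidToken = true;
      if (detail?.modality?.toUpperCase() === "IMAGE") {
        totals.imageTokens += count;
      } else {
        totals.textTokens += count;
      }
    }
  }
  return totals;
}

const candidates = sumByModality([
  { modality: "image", tokenCount: 2000 },
  { modality: "Image", tokenCount: 100 },
  { modality: "TEXT", tokenCount: 240 },
]);
// candidates.imageTokens is 2100 and candidates.textTokens is 240
```

Both the `candidatesTokensDetails` and `promptTokensDetails` branches could then call the same function and differ only in which result fields they assign.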

    if (Array.isArray(candidatesDetails) && candidatesDetails.length > 0) {
      let imageTokens = 0;
      let textTokens = 0;
      let hasValidToken = false;
      for (const detail of candidatesDetails) {
        if (typeof detail.tokenCount === "number" && detail.tokenCount > 0) {
          hasValidToken = true;
          const modalityUpper = detail.modality?.toUpperCase();
          if (modalityUpper === "IMAGE") {
            imageTokens += detail.tokenCount;
          } else {
            textTokens += detail.tokenCount;
          }
        }
      }
      if (imageTokens > 0) {
        result.output_image_tokens = imageTokens;
        hasAny = true;
      }
      if (hasValidToken) {
        // Compute unclassified TEXT tokens: candidatesTokenCount - sum of details
        // These may be internal overhead of image generation, billed at the TEXT price
        const detailsSum = imageTokens + textTokens;
        const candidatesTotal =
          typeof usage.candidatesTokenCount === "number" ? usage.candidatesTokenCount : 0;
        const unaccountedTokens = Math.max(candidatesTotal - detailsSum, 0);
        result.output_tokens = textTokens + unaccountedTokens;
        hasAny = true;
      }
    }

    // promptTokensDetails: input tokens classified by modality
    const promptDetails = usage.promptTokensDetails as
      | Array<{ modality?: string; tokenCount?: number }>
      | undefined;
    if (Array.isArray(promptDetails) && promptDetails.length > 0) {
      let imageTokens = 0;
      let textTokens = 0;
      let hasValidToken = false;
      for (const detail of promptDetails) {
        if (typeof detail.tokenCount === "number" && detail.tokenCount > 0) {
          hasValidToken = true;
          const modalityUpper = detail.modality?.toUpperCase();
          if (modalityUpper === "IMAGE") {
            imageTokens += detail.tokenCount;
          } else {
            textTokens += detail.tokenCount;
          }
        }
      }
      if (imageTokens > 0) {
        result.input_image_tokens = imageTokens;
        hasAny = true;
      }
      if (hasValidToken) {
        result.input_tokens = textTokens;
        hasAny = true;
      }
    }


    test("billing example for a complete Gemini image response", () => {
      const cost = calculateRequestCost(
        {
          input_tokens: 326,
          output_tokens: 340,
          output_image_tokens: 2000,
        },
        {
          input_cost_per_token: 0.000002,
          output_cost_per_token: 0.000012,
          output_cost_per_image_token: 0.00012,
        }
      );

      // Verified against Google's official prices
      // input: 326 * $0.000002 = $0.000652
      // output text: 340 * $0.000012 = $0.00408
      // output image: 2000 * $0.00012 = $0.24 (4K image = 2000 tokens)
      // total: $0.244732
      expect(cost.toNumber()).toBeCloseTo(0.244732, 6);
    });

Summary
Fix billing calculation for Gemini image generation models (e.g., `gemini-3-pro-image-preview`). IMAGE modality tokens cost $0.00012/token, 10x the $0.000012/token price of TEXT tokens. The previous implementation did not parse modality-specific token details, resulting in approximately 7.6x undercharging.
Problem
The system was treating all output tokens uniformly without distinguishing between IMAGE and TEXT modalities in Gemini's `candidatesTokensDetails` and `promptTokensDetails` response fields.
Related Issues:
Solution
Extended type definitions (`src/types/model-price.ts`)
- Added `output_cost_per_image_token` and `input_cost_per_image_token` fields
Updated usage extraction (`src/app/v1/_lib/proxy/response-handler.ts`)
- Parse `candidatesTokensDetails` to extract `output_image_tokens` and `output_tokens` (TEXT)
- Parse `promptTokensDetails` to extract `input_image_tokens` and `input_tokens` (TEXT)
- Case-insensitive modality matching using `toUpperCase()`
- Treat the `candidatesTokenCount` difference as TEXT
Updated cost calculation (`src/lib/utils/cost-calculation.ts`)
- `output_image_tokens` uses `output_cost_per_image_token` when available
- Falls back to `output_cost_per_token` for backward compatibility
Changes
Core Changes
- `src/types/model-price.ts`
- `src/app/v1/_lib/proxy/response-handler.ts`
- `src/lib/utils/cost-calculation.ts`
Test Coverage
- `tests/unit/lib/cost-calculation-image-tokens.test.ts`
- `tests/unit/proxy/extract-usage-metrics.test.ts`
Billing Example
Before fix: $0.244696 (undercharged by $0.00408)
Testing
Automated Tests
Manual Testing
- `gemini-3-pro-image-preview` model
- `output_cost_per_image_token: 0.00012` in model pricing
Checklist
- Tests (`bun run test`)
Description enhanced by Claude AI
Greptile Overview
Greptile Summary
This PR fixes a critical billing issue for Gemini image generation models by properly extracting and billing IMAGE modality tokens at 10x the rate of TEXT tokens ($0.00012 vs $0.000012).
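The 10x price ratio and the roughly 7.6x undercharging figure quoted in the PR can be related with a short calculation. This sketch uses the token counts from the PR's billing example and assumes the pre-fix path billed every output token, thoughts included, at the TEXT rate. That assumption is one reading that reproduces the figure, not something the PR states explicitly.

```typescript
const TEXT_INPUT_PRICE = 0.000002;
const TEXT_OUTPUT_PRICE = 0.000012;
const IMAGE_OUTPUT_PRICE = 0.00012;

const promptTokens = 326;
const candidatesTokens = 2340; // 2000 IMAGE + 340 unclassified TEXT
const thoughtsTokens = 337;
const imageTokens = 2000;

// Assumed pre-fix behavior: all output tokens billed at the TEXT price.
const allTextCost =
  promptTokens * TEXT_INPUT_PRICE + (candidatesTokens + thoughtsTokens) * TEXT_OUTPUT_PRICE;

// Post-fix behavior: IMAGE tokens billed at the image token price.
const splitCost =
  promptTokens * TEXT_INPUT_PRICE +
  (candidatesTokens - imageTokens + thoughtsTokens) * TEXT_OUTPUT_PRICE +
  imageTokens * IMAGE_OUTPUT_PRICE;

const priceRatio = IMAGE_OUTPUT_PRICE / TEXT_OUTPUT_PRICE; // about 10
const underchargeFactor = splitCost / allTextCost; // about 7.59

console.log(priceRatio.toFixed(1), underchargeFactor.toFixed(2));
```

The overall factor is lower than 10 because the input tokens and the 677 TEXT output tokens cost the same under both schemes.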
Key Changes:
- Added `output_cost_per_image_token` and `input_cost_per_image_token` fields
- Parses `candidatesTokensDetails` and `promptTokensDetails` from Gemini responses
Potential Issue:
- When `promptTokensDetails` is present along with `cachedContentTokenCount`, the cache deduction logic may not work correctly. The code at line 1354 overwrites `input_tokens` with raw `textTokens` from details without deducting cached tokens that were previously subtracted at line 1281. This scenario lacks test coverage.
Confidence Score: 3/5
Important Files Changed
Sequence Diagram

    sequenceDiagram
        participant Client
        participant ResponseHandler
        participant UsageExtractor
        participant CostCalculator
        participant PriceData

        Client->>ResponseHandler: Gemini API Response
        ResponseHandler->>UsageExtractor: extractUsageMetrics(response)
        UsageExtractor->>UsageExtractor: Parse usageMetadata

        alt Has candidatesTokensDetails
            UsageExtractor->>UsageExtractor: Iterate through candidatesTokensDetails
            UsageExtractor->>UsageExtractor: Filter IMAGE modality (case-insensitive)
            UsageExtractor->>UsageExtractor: Sum output_image_tokens
            UsageExtractor->>UsageExtractor: Calculate unaccounted TEXT tokens
            Note over UsageExtractor: output_tokens = textTokens + (candidatesTotal - detailsSum)
        end

        alt Has promptTokensDetails
            UsageExtractor->>UsageExtractor: Iterate through promptTokensDetails
            UsageExtractor->>UsageExtractor: Filter IMAGE modality (case-insensitive)
            UsageExtractor->>UsageExtractor: Sum input_image_tokens
            UsageExtractor->>UsageExtractor: Extract input_tokens (TEXT only)
        end

        UsageExtractor->>UsageExtractor: Add thoughtsTokenCount to output_tokens
        UsageExtractor-->>ResponseHandler: Return UsageMetrics
        ResponseHandler->>CostCalculator: calculateRequestCost(metrics, priceData)
        CostCalculator->>PriceData: Get output_cost_per_image_token

        alt Has output_cost_per_image_token
            CostCalculator->>CostCalculator: Calculate image token cost at $0.00012/token
        else Fallback
            CostCalculator->>PriceData: Use output_cost_per_token
            CostCalculator->>CostCalculator: Calculate at TEXT token rate
        end

        CostCalculator->>PriceData: Get input_cost_per_image_token

        alt Has input_cost_per_image_token
            CostCalculator->>CostCalculator: Calculate input image cost
        else Fallback
            CostCalculator->>PriceData: Use input_cost_per_token
        end

        CostCalculator->>CostCalculator: Sum all segments + apply multiplier
        CostCalculator-->>ResponseHandler: Return total cost
        ResponseHandler-->>Client: Billing Record