fix(billing): Fix IMAGE modality token billing for Gemini image generation models #664

Merged
ding113 merged 1 commit into ding113:dev from sususu98:fix/gemini-3-pro-image-preview-billing
Jan 28, 2026
Conversation

@sususu98
Contributor

@sususu98 sususu98 commented Jan 28, 2026

Summary

Fix billing calculation for Gemini image generation models (e.g., gemini-3-pro-image-preview). IMAGE modality tokens cost $0.00012/token, 10x the TEXT rate of $0.000012/token. The previous implementation did not parse modality-specific token details, so these requests were undercharged by a factor of roughly 7.6.

Problem

The system treated all tokens uniformly, without distinguishing between the IMAGE and TEXT modalities reported in Gemini's candidatesTokensDetails (output) and promptTokensDetails (input) response fields.

Related Issues:

Solution

  1. Extended type definitions (src/types/model-price.ts)

    • Added output_cost_per_image_token and input_cost_per_image_token fields
  2. Updated usage extraction (src/app/v1/_lib/proxy/response-handler.ts)

    • Parse candidatesTokensDetails to extract output_image_tokens and output_tokens (TEXT)
    • Parse promptTokensDetails to extract input_image_tokens and input_tokens (TEXT)
    • Case-insensitive modality matching via toUpperCase()
    • Calculate unaccounted tokens from candidatesTokenCount difference as TEXT
  3. Updated cost calculation (src/lib/utils/cost-calculation.ts)

    • output_image_tokens uses output_cost_per_image_token when available
    • Falls back to output_cost_per_token for backward compatibility
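
The fallback in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from `cost-calculation.ts`: `PriceSketch`, `UsageSketch`, and `costSketch` are hypothetical names, plain numbers stand in for whatever decimal type the real `calculateRequestCost` uses, and the multiplier is left out. Only the field names are taken from this PR.

```typescript
// Hypothetical, simplified shapes; only the field names come from the PR.
interface PriceSketch {
  input_cost_per_token: number;
  output_cost_per_token: number;
  input_cost_per_image_token?: number;
  output_cost_per_image_token?: number;
}

interface UsageSketch {
  input_tokens?: number;
  output_tokens?: number;
  input_image_tokens?: number;
  output_image_tokens?: number;
}

// Image tokens use the image-specific price when it is configured;
// otherwise they fall back to the regular per-token price.
function costSketch(usage: UsageSketch, price: PriceSketch): number {
  const inImagePrice = price.input_cost_per_image_token ?? price.input_cost_per_token;
  const outImagePrice = price.output_cost_per_image_token ?? price.output_cost_per_token;
  return (
    (usage.input_tokens ?? 0) * price.input_cost_per_token +
    (usage.output_tokens ?? 0) * price.output_cost_per_token +
    (usage.input_image_tokens ?? 0) * inImagePrice +
    (usage.output_image_tokens ?? 0) * outImagePrice
  );
}

// Same usage, with and without an image-specific output price.
const withImagePrice = costSketch(
  { output_image_tokens: 2000 },
  { input_cost_per_token: 0.000002, output_cost_per_token: 0.000012, output_cost_per_image_token: 0.00012 }
);
const withoutImagePrice = costSketch(
  { output_image_tokens: 2000 },
  { input_cost_per_token: 0.000002, output_cost_per_token: 0.000012 }
);
console.log(withImagePrice.toFixed(6), withoutImagePrice.toFixed(6));
```

With the image price configured, the 2000 image tokens come to $0.24; without it they are billed at the TEXT rate, which is the backward-compatible behavior the checklist describes.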

Changes

Core Changes

| File | Changes |
| --- | --- |
| src/types/model-price.ts | Added 4 new image token price fields |
| src/app/v1/_lib/proxy/response-handler.ts | Added modality parsing logic (+68 lines) |
| src/lib/utils/cost-calculation.ts | Added image token cost calculation (+18 lines) |

Test Coverage

| File | Tests |
| --- | --- |
| tests/unit/lib/cost-calculation-image-tokens.test.ts | 10 new tests |
| tests/unit/proxy/extract-usage-metrics.test.ts | 12 new Gemini image tests |

Billing Example

| Item | Tokens | Unit Price | Cost |
| --- | --- | --- | --- |
| Input TEXT | 326 | $0.000002 | $0.000652 |
| Output TEXT | 340+337 | $0.000012 | $0.008124 |
| Output IMAGE | 2000 | $0.00012 | $0.240000 |
| Total | - | - | $0.248776 |

Before fix: $0.244696 (undercharged by $0.00408, which equals the 340 unaccounted TEXT tokens × $0.000012)
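
The figures in this example can be rechecked with integer arithmetic in micro-dollars, which avoids floating-point rounding. The sketch below is illustrative only. The `allTextTotal` line rests on an assumption: that without any modality parsing, every output token (candidates plus thoughts) is billed at the TEXT rate. Under that assumption the ratio comes out near the 7.6x quoted in the summary.

```typescript
// Prices in micro-dollars per token (1 micro-dollar = $0.000001).
const TEXT_IN = 2; // $0.000002
const TEXT_OUT = 12; // $0.000012
const IMAGE_OUT = 120; // $0.00012

// Usage from the example request.
const promptTokens = 326;
const candidatesTokens = 2340;
const imageTokens = 2000;
const thoughtsTokens = 337;
const unaccountedText = candidatesTokens - imageTokens; // 340

// Total with IMAGE tokens and unaccounted TEXT tokens both billed.
const fixedTotal =
  promptTokens * TEXT_IN +
  (unaccountedText + thoughtsTokens) * TEXT_OUT +
  imageTokens * IMAGE_OUT;

// Same request with the 340 unaccounted TEXT tokens left out.
const withoutUnaccounted = fixedTotal - unaccountedText * TEXT_OUT;

// Assumption: no modality parsing, all output tokens billed at the TEXT rate.
const allTextTotal = promptTokens * TEXT_IN + (candidatesTokens + thoughtsTokens) * TEXT_OUT;

const toDollars = (micro: number): string => (micro / 1e6).toFixed(6);
console.log(toDollars(fixedTotal)); // 0.248776
console.log(toDollars(withoutUnaccounted)); // 0.244696
console.log(toDollars(allTextTotal)); // 0.032776
console.log((fixedTotal / allTextTotal).toFixed(1)); // 7.6
```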

Testing

Automated Tests

  • Unit tests added for image token cost calculation (10 tests)
  • Unit tests added for usage metrics extraction (12 tests)
  • All tests pass locally

Manual Testing

  1. Configure a provider with gemini-3-pro-image-preview model
  2. Set output_cost_per_image_token: 0.00012 in model pricing
  3. Send an image generation request
  4. Verify that IMAGE modality tokens are billed at the higher rate

Checklist

  • Code follows project conventions
  • Self-review completed
  • Tests pass locally (bun run test)
  • Backward compatible (falls back to standard token pricing if image pricing not configured)

Description enhanced by Claude AI

Greptile Overview

Greptile Summary

This PR fixes a critical billing issue for Gemini image generation models by properly extracting and billing IMAGE modality tokens at 10x the rate of TEXT tokens ($0.00012 vs $0.000012).

Key Changes:

  • Extended type system with output_cost_per_image_token and input_cost_per_image_token fields
  • Enhanced usage extraction to parse candidatesTokensDetails and promptTokensDetails from Gemini responses
  • Implemented case-insensitive modality matching and proper handling of unaccounted tokens
  • Updated cost calculation with graceful fallback to regular token pricing for backward compatibility
  • Added comprehensive test coverage (22 new tests across billing and extraction)

Potential Issue:

  • When promptTokensDetails is present along with cachedContentTokenCount, the cache deduction logic may not work correctly. The code at line 1354 overwrites input_tokens with raw textTokens from details without deducting cached tokens that were previously subtracted at line 1281. This scenario lacks test coverage.

Confidence Score: 3/5

  • Safe to merge with one logical issue that needs verification
  • The implementation correctly extracts modality tokens and calculates costs for the primary use case. However, there's a potential cache + image token interaction bug that could cause incorrect billing when both features are used together. The issue is uncommon but should be addressed.
  • src/app/v1/_lib/proxy/response-handler.ts requires attention for the cache + image token interaction at lines 1278-1357
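
The cache interaction flagged above can be reproduced in isolation. The sketch below is not the handler's code: `PromptUsageSketch` and `inputTextTokens` are hypothetical names, the `cachedContentTokenCount` of 200 is an invented sample value, and it assumes cached tokens are TEXT tokens that are also counted inside `promptTokensDetails`. Whether that holds for real Gemini responses is exactly what the reviewer asks to verify.

```typescript
// Hypothetical subset of Gemini usageMetadata, for illustration only.
interface PromptUsageSketch {
  promptTokenCount: number;
  cachedContentTokenCount?: number;
  promptTokensDetails?: Array<{ modality?: string; tokenCount?: number }>;
}

function inputTextTokens(usage: PromptUsageSketch, deductCache: boolean): number {
  const cached = usage.cachedContentTokenCount ?? 0;
  // Step 1: initial value with cached tokens removed.
  let inputTokens = Math.max(usage.promptTokenCount - cached, 0);

  // Step 2: when valid details exist, replace it with the TEXT share.
  let textTokens = 0;
  let hasValidToken = false;
  for (const d of usage.promptTokensDetails ?? []) {
    if (typeof d.tokenCount === "number" && d.tokenCount > 0) {
      hasValidToken = true;
      if (d.modality?.toUpperCase() !== "IMAGE") textTokens += d.tokenCount;
    }
  }
  if (hasValidToken) {
    // deductCache = false mirrors the overwrite described in the review;
    // deductCache = true mirrors the reviewer's proposed adjustment.
    inputTokens = deductCache ? Math.max(textTokens - cached, 0) : textTokens;
  }
  return inputTokens;
}

const sample: PromptUsageSketch = {
  promptTokenCount: 886,
  cachedContentTokenCount: 200, // invented value for this sketch
  promptTokensDetails: [
    { modality: "TEXT", tokenCount: 326 },
    { modality: "IMAGE", tokenCount: 560 },
  ],
};
console.log(inputTextTokens(sample, false)); // 326: cache deduction lost
console.log(inputTextTokens(sample, true)); // 126: cache deducted from TEXT
```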

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/types/model-price.ts | Added optional fields for image token pricing with clear comments |
| src/lib/utils/cost-calculation.ts | Properly calculates image token costs with fallback to regular token pricing |
| src/app/v1/_lib/proxy/response-handler.ts | Extracts modality-specific tokens but may have cache interaction issue |

Sequence Diagram

sequenceDiagram
    participant Client
    participant ResponseHandler
    participant UsageExtractor
    participant CostCalculator
    participant PriceData

    Client->>ResponseHandler: Gemini API Response
    ResponseHandler->>UsageExtractor: extractUsageMetrics(response)
    
    UsageExtractor->>UsageExtractor: Parse usageMetadata
    
    alt Has candidatesTokensDetails
        UsageExtractor->>UsageExtractor: Iterate through candidatesTokensDetails
        UsageExtractor->>UsageExtractor: Filter IMAGE modality (case-insensitive)
        UsageExtractor->>UsageExtractor: Sum output_image_tokens
        UsageExtractor->>UsageExtractor: Calculate unaccounted TEXT tokens
        Note over UsageExtractor: output_tokens = textTokens + (candidatesTotal - detailsSum)
    end
    
    alt Has promptTokensDetails
        UsageExtractor->>UsageExtractor: Iterate through promptTokensDetails
        UsageExtractor->>UsageExtractor: Filter IMAGE modality (case-insensitive)
        UsageExtractor->>UsageExtractor: Sum input_image_tokens
        UsageExtractor->>UsageExtractor: Extract input_tokens (TEXT only)
    end
    
    UsageExtractor->>UsageExtractor: Add thoughtsTokenCount to output_tokens
    UsageExtractor-->>ResponseHandler: Return UsageMetrics
    
    ResponseHandler->>CostCalculator: calculateRequestCost(metrics, priceData)
    
    CostCalculator->>PriceData: Get output_cost_per_image_token
    alt Has output_cost_per_image_token
        CostCalculator->>CostCalculator: Calculate image token cost at $0.00012/token
    else Fallback
        CostCalculator->>PriceData: Use output_cost_per_token
        CostCalculator->>CostCalculator: Calculate at TEXT token rate
    end
    
    CostCalculator->>PriceData: Get input_cost_per_image_token
    alt Has input_cost_per_image_token
        CostCalculator->>CostCalculator: Calculate input image cost
    else Fallback
        CostCalculator->>PriceData: Use input_cost_per_token
    end
    
    CostCalculator->>CostCalculator: Sum all segments + apply multiplier
    CostCalculator-->>ResponseHandler: Return total cost
    ResponseHandler-->>Client: Billing Record

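Background:
- Image generation models such as gemini-3-pro-image-preview return candidatesTokensDetails in their usage data
- This field breaks tokens down by modality (IMAGE/TEXT)
- IMAGE modality tokens cost $0.00012/token, 10x the regular TEXT token price
- The previous system did not parse this field, so IMAGE tokens were billed at the TEXT price and charges were about 7.6x too low

Type extensions (src/types/model-price.ts):
- Added output_cost_per_image_token: unit price for output image tokens (billed per token)
- Added input_cost_per_image_token: unit price for input image tokens (billed per token)
- Kept input_cost_per_image: flat price for input images (billed per image, $0.0011/image)
- Kept output_cost_per_image: flat price for output images (billed per image)

Usage extraction (src/app/v1/_lib/proxy/response-handler.ts):
- Parse candidatesTokensDetails to extract output_image_tokens and output_tokens (TEXT)
- Parse promptTokensDetails to extract input_image_tokens and input_tokens (TEXT)
- Use toUpperCase() for case-insensitive matching (IMAGE/image/Image)
- Add a hasValidToken guard so the original values are overwritten only when valid tokens were parsed
- Fix double billing of input IMAGE tokens caused by incomplete promptTokensDetails parsing
- Treat the difference between candidatesTokenCount and the details sum as unclassified TEXT tokens
  (these are internal overhead of image generation, billed at the TEXT price)

Billing logic (src/lib/utils/cost-calculation.ts):
- output_image_tokens are billed at output_cost_per_image_token when it is set
- input_image_tokens are billed at input_cost_per_image_token when it is set
- If no image token price is configured, fall back to the regular token price (backward compatible)
- The multiplier also applies to image token costs

Test coverage:
- Added cost-calculation-image-tokens.test.ts (10 tests)
- Extended extract-usage-metrics.test.ts (12 Gemini image tests)
- Scenarios covered: IMAGE only, mixed IMAGE+TEXT, invalid data, case variants, backward compatibility,
  mixed input and output, candidatesTokenCount difference calculation

Billing example (complete image generation request):
- promptTokenCount=326, candidatesTokenCount=2340, thoughtsTokenCount=337
- candidatesTokensDetails: IMAGE=2000 (the difference of 340 is unclassified TEXT)
- Input TEXT: 326 × $0.000002 = $0.000652
- Output TEXT: (340+337) × $0.000012 = $0.008124
- Output IMAGE: 2000 × $0.00012 = $0.240000
- Total: $0.248776 ($0.244696 before the fix, undercharged by $0.00408)

Fixes ding113#663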
@coderabbitai

coderabbitai bot commented Jan 28, 2026

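📝 Walkthrough

This PR extends multimodal token tracking by adding image token fields to UsageMetrics, extracting modality-specific token counts when parsing Gemini/OpenAI responses, updating the pricing model to support image-specific costs, and adding comprehensive test coverage.

Changes

| Content | Files | Change summary |
| --- | --- | --- |
| Type system updates | src/types/model-price.ts, src/app/v1/_lib/proxy/response-handler.ts, src/lib/utils/cost-calculation.ts | Adds the output_cost_per_image, output_cost_per_image_token, input_cost_per_image, and input_cost_per_image_token fields to ModelPriceData; adds the optional input_image_tokens and output_image_tokens fields to UsageMetrics |
| Response parsing: modality token extraction | src/app/v1/_lib/proxy/response-handler.ts | Implements modality-aware token extraction that distinguishes the IMAGE and TEXT modalities in candidatesTokensDetails and promptTokensDetails, then computes and sets the corresponding input/output image token counts |
| Cost calculation: image token handling | src/lib/utils/cost-calculation.ts | Adds image token cost calculation to calculateRequestCost using the output_cost_per_image_token and input_cost_per_image_token fields, with fallback to the traditional unit price |
| Test coverage: image token cost | tests/unit/lib/cost-calculation-image-tokens.test.ts | Adds 152 lines of unit tests that verify image token pricing, cost fallback, multiplier application, and mixed scenarios |
| Test coverage: usage metrics extraction | tests/unit/proxy/extract-usage-metrics.test.ts | Adds 164 lines of Gemini-specific test cases that verify the full control flow for extracting IMAGE modality tokens from candidatesTokensDetails and promptTokensDetails |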

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

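🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change (fixing IMAGE modality token billing for Gemini image generation models) and reflects the core purpose of the PR. |
| Description check | ✅ Passed | The PR description details the fix, the files involved, the billing logic changes, and the test coverage, and is closely tied to the code changes. |
| Linked Issues check | ✅ Passed | The PR fully implements all requirements of #663: it extends the UsageMetrics type with input/output_image_tokens [#663], parses candidatesTokensDetails and promptTokensDetails in response-handler.ts to extract IMAGE tokens [#663], applies the image token unit price in cost-calculation.ts [#663], and includes comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All code changes are closely tied to the billing fix required by #663; no out-of-scope modifications were found. |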


@gemini-code-assist
Contributor

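Summary of Changes

Hello @sususu98, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses Gemini image generation models failing to recognize and apply image modality token prices during billing. By introducing dedicated image token pricing fields and updating the parsing and calculation logic, it ensures that image input and output tokens are billed accurately, correcting the cost miscalculation previously caused by the price difference and improving billing accuracy and fairness.

Highlights

  • Billing fix: Fixes IMAGE modality token billing for Gemini image generation models (such as gemini-3-pro-image-preview), which were previously undercharged by about 7.6x because the tokens were not parsed correctly.
  • Pricing model extension: Adds the output_cost_per_image_token and input_cost_per_image_token fields in src/types/model-price.ts to define the unit price of image tokens.
  • Usage extraction update: Modifies the extractUsageMetrics function in src/app/v1/_lib/proxy/response-handler.ts to parse candidatesTokensDetails and promptTokensDetails, extracting output_image_tokens and input_image_tokens, with case-insensitive modality matching and calculation of unclassified TEXT tokens.
  • Billing logic adjustment: Updates the billing logic in src/lib/utils/cost-calculation.ts so that image tokens are billed at their dedicated unit price first, falling back to the regular token unit price when it is not configured.
  • Test coverage: Adds tests/unit/lib/cost-calculation-image-tokens.test.ts with 10 test cases and extends tests/unit/proxy/extract-usage-metrics.test.ts with 12 Gemini image tests.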

@github-actions github-actions bot added bug Something isn't working area:Google Gemini area:core size/M Medium PR (< 500 lines) labels Jan 28, 2026

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 2 comments

Comment on lines 343 to 509
  });

  it("should extract IMAGE modality tokens from candidatesTokensDetails", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 326,
        candidatesTokenCount: 2340,
        candidatesTokensDetails: [
          { modality: "IMAGE", tokenCount: 2000 },
          { modality: "TEXT", tokenCount: 340 },
        ],
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    expect(result.usageMetrics?.output_image_tokens).toBe(2000);
    expect(result.usageMetrics?.output_tokens).toBe(340);
  });

  it("should extract IMAGE modality tokens from promptTokensDetails", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 886,
        candidatesTokenCount: 500,
        promptTokensDetails: [
          { modality: "TEXT", tokenCount: 326 },
          { modality: "IMAGE", tokenCount: 560 },
        ],
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    expect(result.usageMetrics?.input_image_tokens).toBe(560);
    expect(result.usageMetrics?.input_tokens).toBe(326);
  });

  it("should correctly parse a full usage with mixed input and output", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 357,
        candidatesTokenCount: 2100,
        totalTokenCount: 2580,
        promptTokensDetails: [
          { modality: "TEXT", tokenCount: 99 },
          { modality: "IMAGE", tokenCount: 258 },
        ],
        candidatesTokensDetails: [{ modality: "IMAGE", tokenCount: 2000 }],
        thoughtsTokenCount: 123,
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    expect(result.usageMetrics?.input_tokens).toBe(99);
    expect(result.usageMetrics?.input_image_tokens).toBe(258);
    // output_tokens = (candidatesTokenCount - IMAGE details) + thoughtsTokenCount
    // = (2100 - 2000) + 123 = 223
    expect(result.usageMetrics?.output_tokens).toBe(223);
    expect(result.usageMetrics?.output_image_tokens).toBe(2000);
  });

  it("should handle candidatesTokensDetails with only the IMAGE modality", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 100,
        candidatesTokenCount: 2000,
        candidatesTokensDetails: [{ modality: "IMAGE", tokenCount: 2000 }],
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    expect(result.usageMetrics?.output_image_tokens).toBe(2000);
    // candidatesTokenCount = 2000, IMAGE = 2000, unclassified = 0
    expect(result.usageMetrics?.output_tokens).toBe(0);
  });

  it("should treat the difference between candidatesTokenCount and details as unclassified TEXT", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 326,
        candidatesTokenCount: 2340,
        candidatesTokensDetails: [{ modality: "IMAGE", tokenCount: 2000 }],
        thoughtsTokenCount: 337,
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    // unclassified = 2340 - 2000 = 340
    // output_tokens = 340 + 337 (thoughts) = 677
    expect(result.usageMetrics?.output_tokens).toBe(677);
    expect(result.usageMetrics?.output_image_tokens).toBe(2000);
  });

  it("should handle missing candidatesTokensDetails (backward compatible)", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 1000,
        candidatesTokenCount: 500,
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    expect(result.usageMetrics?.output_tokens).toBe(500);
    expect(result.usageMetrics?.output_image_tokens).toBeUndefined();
    expect(result.usageMetrics?.input_image_tokens).toBeUndefined();
  });

  it("should handle an empty candidatesTokensDetails array", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 1000,
        candidatesTokenCount: 500,
        candidatesTokensDetails: [],
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    expect(result.usageMetrics?.output_tokens).toBe(500);
    expect(result.usageMetrics?.output_image_tokens).toBeUndefined();
  });

  it("should handle invalid tokenCount values in candidatesTokensDetails", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 1000,
        candidatesTokenCount: 500,
        candidatesTokensDetails: [
          { modality: "TEXT" },
          { modality: "IMAGE", tokenCount: null },
          { modality: "TEXT", tokenCount: -1 },
        ],
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    // Invalid data must not overwrite the original candidatesTokenCount
    expect(result.usageMetrics?.output_tokens).toBe(500);
    expect(result.usageMetrics?.output_image_tokens).toBeUndefined();
  });

  it("should handle modality case variants", () => {
    const response = JSON.stringify({
      usageMetadata: {
        promptTokenCount: 100,
        candidatesTokenCount: 2340,
        candidatesTokensDetails: [
          { modality: "image", tokenCount: 2000 },
          { modality: "Image", tokenCount: 100 },
          { modality: "TEXT", tokenCount: 240 },
        ],
      },
    });

    const result = parseUsageFromResponseText(response, "gemini");

    expect(result.usageMetrics?.output_image_tokens).toBe(2100);
    expect(result.usageMetrics?.output_tokens).toBe(240);
  });
});

Missing test: add coverage for cachedContentTokenCount + promptTokensDetails combination to verify cached tokens are properly deducted from text tokens

@greptile-apps

greptile-apps bot commented Jan 28, 2026

Additional Comments (1)

src/app/v1/_lib/proxy/response-handler.ts
input_tokens overwriting issue when both promptTokenCount and promptTokensDetails exist

When promptTokensDetails is present (line 1354), it overwrites the input_tokens calculated from promptTokenCount - cachedContentTokenCount (line 1281). This breaks the cache deduction logic.

For responses with both cached tokens and image modality details, input_tokens should equal textTokens from details, but cached tokens aren't deducted from the image tokens.

Consider adjusting line 1354 to handle cached tokens:

      // Deduct cached tokens from text tokens if needed
      const cachedTokens =
        typeof usage.cachedContentTokenCount === "number" ? usage.cachedContentTokenCount : 0;
      result.input_tokens = Math.max(textTokens - cachedTokens, 0);
Contributor

@github-actions github-actions bot left a comment

Code Review Summary

This PR correctly implements billing support for Gemini image generation models by extracting IMAGE modality tokens from candidatesTokensDetails and promptTokensDetails. The implementation is well-structured with proper fallback mechanisms and comprehensive test coverage.

PR Size: M

  • Lines changed: 408 additions, 0 deletions
  • Files changed: 5

Issues Found

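| Category | Critical | High | Medium | Low |
| --- | --- | --- | --- | --- |
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |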

No significant issues identified in this PR.

Review Coverage

  • Logic and correctness - Clean: Extraction logic correctly parses candidatesTokensDetails and promptTokensDetails, handles case-insensitive modality matching, and properly calculates unaccounted tokens
  • Security (OWASP Top 10) - Clean: No user input handling, type coercion is safe
  • Error handling - Clean: Invalid token counts (null, negative, missing) are properly handled with typeof ... === "number" && ... > 0 guards
  • Type safety - Clean: Both UsageMetrics type definitions updated consistently, new ModelPriceData fields added for image token pricing
  • Documentation accuracy - Clean: Comments accurately describe the billing calculation and fallback behavior
  • Test coverage - Excellent: 10 new cost calculation tests + 12 new usage extraction tests covering all major scenarios including edge cases (empty arrays, invalid tokenCount, case variations)
  • Code clarity - Good: Logic flow is clear, fallback behavior is well-documented

Key Observations

  1. Backward Compatibility: When output_cost_per_image_token or input_cost_per_image_token is not configured, the code correctly falls back to regular token prices
  2. Ordering Logic: The new extraction code is correctly placed before the existing output_tokens check, allowing Gemini-specific handling while preserving backward compatibility for other providers
  3. Test Quality: Tests cover both happy paths and edge cases including invalid data handling and case-insensitive modality matching

Automated review by Claude AI


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/v1/_lib/proxy/response-handler.ts (1)

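1294-1362: Prevent output_tokens from overwriting the details split and causing double billing

When candidatesTokensDetails has already been split and usage.output_tokens is still present, the overwrite at Line 1359 replaces the TEXT-only output_tokens with the total, which is then billed on top of output_image_tokens. Consider using usage.output_tokens only when output_tokens was not derived from details.

Suggested change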
@@
-  const candidatesDetails = usage.candidatesTokensDetails as
+  const candidatesDetails = usage.candidatesTokensDetails as
     | Array<{ modality?: string; tokenCount?: number }>
     | undefined;
+  let hasCandidatesDetails = false;
   if (Array.isArray(candidatesDetails) && candidatesDetails.length > 0) {
@@
     if (hasValidToken) {
+      hasCandidatesDetails = true;
       // Compute unclassified TEXT tokens: candidatesTokenCount - sum of details
       // These may be internal overhead of image generation; billed at the TEXT price
       const detailsSum = imageTokens + textTokens;
@@
   }
@@
-  if (typeof usage.output_tokens === "number") {
+  if (typeof usage.output_tokens === "number" && !hasCandidatesDetails) {
     result.output_tokens = usage.output_tokens;
     hasAny = true;
   }

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for handling image tokens in usage metrics and cost calculation, primarily for Gemini models. The changes involve adding input_image_tokens and output_image_tokens to the UsageMetrics and ModelPriceData types. The extractUsageMetrics function was updated to parse image and text tokens from candidatesTokensDetails and promptTokensDetails in Gemini responses, including logic to account for unclassified tokens. The calculateRequestCost function was modified to incorporate these new image token types, prioritizing specific image token prices (output_cost_per_image_token, input_cost_per_image_token) and falling back to general token prices if not specified. New unit tests were added to verify the correct extraction and cost calculation of image tokens. Review comments identified a bug in extractUsageMetrics where calculated output_tokens could be overwritten, suggested refactoring duplicated logic for processing candidatesTokensDetails and promptTokensDetails into a helper function, and pointed out a redundant test case in cost-calculation-image-tokens.test.ts that should be removed.

Comment on lines +1299 to +1328
  if (Array.isArray(candidatesDetails) && candidatesDetails.length > 0) {
    let imageTokens = 0;
    let textTokens = 0;
    let hasValidToken = false;
    for (const detail of candidatesDetails) {
      if (typeof detail.tokenCount === "number" && detail.tokenCount > 0) {
        hasValidToken = true;
        const modalityUpper = detail.modality?.toUpperCase();
        if (modalityUpper === "IMAGE") {
          imageTokens += detail.tokenCount;
        } else {
          textTokens += detail.tokenCount;
        }
      }
    }
    if (imageTokens > 0) {
      result.output_image_tokens = imageTokens;
      hasAny = true;
    }
    if (hasValidToken) {
      // Compute unclassified TEXT tokens: candidatesTokenCount - sum of details
      // These may be internal overhead of image generation; billed at the TEXT price
      const detailsSum = imageTokens + textTokens;
      const candidatesTotal =
        typeof usage.candidatesTokenCount === "number" ? usage.candidatesTokenCount : 0;
      const unaccountedTokens = Math.max(candidatesTotal - detailsSum, 0);
      result.output_tokens = textTokens + unaccountedTokens;
      hasAny = true;
    }
  }
critical

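The output_tokens value computed here from candidatesTokensDetails (line 1325) is correct. However, if the usage object also contains an output_tokens field, the existing logic at line 1359 unconditionally overwrites the value computed here. This is a bug that leads to incorrect billing. The logic at line 1359 should run only when candidatesTokensDetails is absent or invalid. Please adjust the code to fix this precedence issue.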
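
The precedence problem described in this comment can be shown with a small standalone sketch. It is not the handler's code: `OutputUsageSketch` and `extractOutput` are hypothetical names, and the sample payload that carries both `output_tokens` and `candidatesTokensDetails` is invented to trigger the overwrite. The `guardOverwrite` flag switches between the behavior under review and a guarded variant in the spirit of the reviewers' suggestions.

```typescript
// Hypothetical subset of a usage object, for illustration only.
interface OutputUsageSketch {
  candidatesTokenCount?: number;
  output_tokens?: number;
  candidatesTokensDetails?: Array<{ modality?: string; tokenCount?: number }>;
}

function extractOutput(usage: OutputUsageSketch, guardOverwrite: boolean) {
  let outputTokens: number | undefined;
  let outputImageTokens: number | undefined;
  let fromDetails = false;

  let image = 0;
  let text = 0;
  for (const d of usage.candidatesTokensDetails ?? []) {
    if (typeof d.tokenCount === "number" && d.tokenCount > 0) {
      fromDetails = true;
      if (d.modality?.toUpperCase() === "IMAGE") image += d.tokenCount;
      else text += d.tokenCount;
    }
  }
  if (fromDetails) {
    // TEXT share plus tokens not covered by any details entry.
    const unaccounted = Math.max((usage.candidatesTokenCount ?? 0) - (image + text), 0);
    outputTokens = text + unaccounted;
    if (image > 0) outputImageTokens = image;
  } else if (typeof usage.candidatesTokenCount === "number") {
    outputTokens = usage.candidatesTokenCount;
  }

  // Generic branch that runs later: unguarded, it replaces the TEXT-only value.
  if (typeof usage.output_tokens === "number" && !(guardOverwrite && fromDetails)) {
    outputTokens = usage.output_tokens;
  }
  return { outputTokens, outputImageTokens };
}

const mixed: OutputUsageSketch = {
  candidatesTokenCount: 2340,
  output_tokens: 2340, // invented: a total reported alongside the details
  candidatesTokensDetails: [{ modality: "IMAGE", tokenCount: 2000 }],
};
console.log(extractOutput(mixed, false)); // outputTokens 2340 plus 2000 image tokens
console.log(extractOutput(mixed, true)); // outputTokens 340 plus 2000 image tokens
```

In the unguarded case the 2000 image tokens are counted twice, once inside `outputTokens` and once as image tokens, which is the double billing the reviewers warn about.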

Comment on lines +1299 to +1357
  if (Array.isArray(candidatesDetails) && candidatesDetails.length > 0) {
    let imageTokens = 0;
    let textTokens = 0;
    let hasValidToken = false;
    for (const detail of candidatesDetails) {
      if (typeof detail.tokenCount === "number" && detail.tokenCount > 0) {
        hasValidToken = true;
        const modalityUpper = detail.modality?.toUpperCase();
        if (modalityUpper === "IMAGE") {
          imageTokens += detail.tokenCount;
        } else {
          textTokens += detail.tokenCount;
        }
      }
    }
    if (imageTokens > 0) {
      result.output_image_tokens = imageTokens;
      hasAny = true;
    }
    if (hasValidToken) {
      // Compute unclassified TEXT tokens: candidatesTokenCount - sum of details
      // These may be internal overhead of image generation; billed at the TEXT price
      const detailsSum = imageTokens + textTokens;
      const candidatesTotal =
        typeof usage.candidatesTokenCount === "number" ? usage.candidatesTokenCount : 0;
      const unaccountedTokens = Math.max(candidatesTotal - detailsSum, 0);
      result.output_tokens = textTokens + unaccountedTokens;
      hasAny = true;
    }
  }

  // promptTokensDetails: input tokens classified by modality
  const promptDetails = usage.promptTokensDetails as
    | Array<{ modality?: string; tokenCount?: number }>
    | undefined;
  if (Array.isArray(promptDetails) && promptDetails.length > 0) {
    let imageTokens = 0;
    let textTokens = 0;
    let hasValidToken = false;
    for (const detail of promptDetails) {
      if (typeof detail.tokenCount === "number" && detail.tokenCount > 0) {
        hasValidToken = true;
        const modalityUpper = detail.modality?.toUpperCase();
        if (modalityUpper === "IMAGE") {
          imageTokens += detail.tokenCount;
        } else {
          textTokens += detail.tokenCount;
        }
      }
    }
    if (imageTokens > 0) {
      result.input_image_tokens = imageTokens;
      hasAny = true;
    }
    if (hasValidToken) {
      result.input_tokens = textTokens;
      hasAny = true;
    }
  }
medium

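The logic that handles candidatesTokensDetails (lines 1299-1328) and promptTokensDetails (lines 1334-1357) is very similar, so the code is duplicated, which raises future maintenance cost. Consider extracting the shared logic that iterates over the details array and accumulates imageTokens and textTokens into a standalone helper function.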
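
One possible shape for such a helper is sketched below. `splitByModality`, `ModalityDetail`, and `ModalitySplit` are hypothetical names, not part of the PR, and the extra `typeof detail.modality === "string"` check is an added precaution rather than something the reviewed code does. The counting rules otherwise follow the loops quoted above: only positive numeric `tokenCount` values count, and anything that is not IMAGE is treated as TEXT.

```typescript
type ModalityDetail = { modality?: string; tokenCount?: number | null };

interface ModalitySplit {
  imageTokens: number;
  textTokens: number;
  hasValidToken: boolean;
}

// Hypothetical helper shared by the candidates and prompt branches.
function splitByModality(details: unknown): ModalitySplit {
  const split: ModalitySplit = { imageTokens: 0, textTokens: 0, hasValidToken: false };
  if (!Array.isArray(details)) return split;

  for (const detail of details as ModalityDetail[]) {
    if (typeof detail.tokenCount === "number" && detail.tokenCount > 0) {
      split.hasValidToken = true;
      const isImage =
        typeof detail.modality === "string" && detail.modality.toUpperCase() === "IMAGE";
      if (isImage) split.imageTokens += detail.tokenCount;
      else split.textTokens += detail.tokenCount;
    }
  }
  return split;
}

// Inputs taken from the case-variant and invalid-data tests in this PR.
const caseVariants = splitByModality([
  { modality: "image", tokenCount: 2000 },
  { modality: "Image", tokenCount: 100 },
  { modality: "TEXT", tokenCount: 240 },
]);
const invalidOnly = splitByModality([
  { modality: "TEXT" },
  { modality: "IMAGE", tokenCount: null },
  { modality: "TEXT", tokenCount: -1 },
]);
console.log(caseVariants, invalidOnly);
```

Each call site would then keep only its own assignment logic, for example adding the unaccounted difference on the output side.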

Comment on lines +76 to +96
test("Full Gemini image response billing example", () => {
  const cost = calculateRequestCost(
    {
      input_tokens: 326,
      output_tokens: 340,
      output_image_tokens: 2000,
    },
    {
      input_cost_per_token: 0.000002,
      output_cost_per_token: 0.000012,
      output_cost_per_image_token: 0.00012,
    }
  );

  // Verified against Google's official pricing
  // input: 326 * $0.000002 = $0.000652
  // output text: 340 * $0.000012 = $0.00408
  // output image: 2000 * $0.00012 = $0.24 (4K image = 2000 tokens)
  // total: $0.244732
  expect(cost.toNumber()).toBeCloseTo(0.244732, 6);
});
medium

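This test case, "Full Gemini image response billing example" (完整 Gemini image 响应计费示例), is identical to the earlier "Mixed response: text + image tokens should be billed separately" (混合响应:text + image tokens 应分别计费, lines 55-74). This is unnecessary duplication; please remove the redundant test case.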

@ding113 ding113 merged commit 704d00a into ding113:dev Jan 28, 2026
17 of 19 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Jan 28, 2026