Skip to content

Comments

fix: 修复 leaderboard 输出速率计算除以过小值的问题#497

Merged
ding113 merged 1 commit intoding113:devfrom
NieiR:fix/leaderboard-output-rate-calc-v2
Jan 1, 2026
Merged

fix: 修复 leaderboard 输出速率计算除以过小值的问题#497
ding113 merged 1 commit intoding113:devfrom
NieiR:fix/leaderboard-output-rate-calc-v2

Conversation

@NieiR
Copy link
Contributor

@NieiR NieiR commented Jan 1, 2026

问题描述 / Problem Description

供应商排行榜的「平均输出速率」计算在某些情况下会出现异常高的值(如 63570.4 tok/s),原因是当 durationMs - ttfbMs 值非常小时,除法运算会产生极大的结果。

The provider leaderboard's "Average Output Rate" displays abnormally high values (e.g., 63570.4 tok/s) when durationMs - ttfbMs is very small, causing division by near-zero values.

Related to: #442 (leaderboard functionality enhancement)
Supersedes: #496 (closed - mixed unrelated features)

修复方案 / Solution

  1. 添加条件确保 (durationMs - ttfbMs) >= 100ms,过滤掉响应时间过短的异常数据

  2. 移除 NULLIF 包装,因为有了最小值检查后不再需要

  3. Add minimum threshold: (durationMs - ttfbMs) >= 100ms to filter out anomalous short-duration responses

  4. Remove redundant NULLIF wrapper since minimum value check prevents division by zero

Changes

Core Change:

  • src/repository/leaderboard.ts (+2/-1): Fixed output rate calculation with 100ms minimum threshold

修复前后对比 / Before/After Comparison

修复前(#4 供应商显示 63570.4 tok/s 异常值)

修复前

修复后(#4 供应商显示 58.6 tok/s 正常值)

修复后

Testing / 测试

  • 本地验证排行榜数据显示正常
  • Local verification of leaderboard data display

Checklist

  • Code follows project conventions
  • Self-review completed
  • Tests pass locally

Description enhanced by Claude AI

- 添加条件确保 (durationMs - ttfbMs) >= 100ms
- 移除 NULLIF 因为有了最小值检查后不再需要
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @NieiR, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求旨在解决排行榜中平均输出速率计算不准确的问题。当响应持续时间过短时,计算结果会异常偏高。通过引入最小持续时间阈值并简化除法逻辑,确保了排行榜数据的准确性和合理性。

Highlights

  • 异常值过滤: 在计算平均输出速率时,增加了条件 (durationMs - ttfbMs) >= 100ms,以过滤掉响应时间过短导致计算结果异常高的数据。
  • 移除 NULLIF: 由于引入了最小持续时间检查,不再需要 NULLIF 函数来避免除以零的问题,因此将其移除。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

这个 PR 很好地解决了在计算供应商排行榜的平均输出速率时,因除以一个过小的值(durationMs - ttfbMs)而导致结果异常高的问题。

通过增加一个 (durationMs - ttfbMs) >= 100 的条件,可以有效过滤掉这些会产生异常值的短时请求数据。同时,由于这个新条件保证了除数不会为零或过小,移除 NULLIF 也是正确的。

代码改动是有效的。我只提了一个关于代码可维护性的建议,即将硬编码的数字 100 提取为常量,以提高代码的可读性和未来的可维护性。

AND ${messageRequest.durationMs} IS NOT NULL
AND ${messageRequest.ttfbMs} IS NOT NULL
AND ${messageRequest.ttfbMs} < ${messageRequest.durationMs}
AND (${messageRequest.durationMs} - ${messageRequest.ttfbMs}) >= 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

为了提高代码的可读性和可维护性,建议将 100 这个魔法数字(magic number)提取为一个有意义的常量,例如 MIN_GENERATION_TIME_MS_FOR_RATE_CALC

这样做的好处是:

  1. 可读性:常量名可以清晰地表达这个数字的业务含义(用于速率计算的最小生成时间)。
  2. 可维护性:如果将来需要调整这个阈值,只需修改一处常量定义即可,避免在代码中搜索硬编码的数字。

你可以考虑在函数开头或文件顶部定义这个常量。

@ding113 ding113 changed the base branch from main to dev January 1, 2026 09:40
@github-actions github-actions bot added bug Something isn't working area:statistics size/XS Extra Small PR (< 50 lines) labels Jan 1, 2026
@ding113 ding113 merged commit 96deb3f into ding113:dev Jan 1, 2026
13 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Jan 1, 2026
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR fixes an edge case where extremely short response times (durationMs - ttfbMs < 100ms) were causing abnormally high token/second calculations in the leaderboard. The fix adds a 100ms minimum threshold filter.

PR Size: XS

  • Lines changed: 3
  • Files changed: 1

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 1 0
Tests 0 0 0 0
Simplification 0 0 0 0

Medium Priority Issues (Consider Fixing)

1. Inconsistent calculation logic between SQL and TypeScript (Medium)

The SQL query now filters out requests with generation time < 100ms, but the TypeScript function calculateOutputRate() in src/lib/utils/performance-formatter.ts:23-34 has no such threshold. This creates inconsistent behavior:

  • Individual request rate calculations (TypeScript): No 100ms minimum
  • Aggregated leaderboard statistics (SQL): 100ms minimum filter

Recommendation: Either add the same 100ms threshold to the TypeScript function, or add a comment explaining why the aggregated statistics need stricter filtering than individual calculations.

2. Magic number without documentation (Medium)

The 100ms threshold at line 341 is a magic number with no comment explaining:

  • Why 100ms was chosen as the threshold
  • What problem it solves (preventing division by near-zero values)
  • Whether this threshold might need adjustment in the future

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean (removed redundant NULLIF)
  • Type safety - Clean
  • Documentation accuracy - See issues above
  • Test coverage - No tests required for threshold change
  • Code clarity - Good

Notes

The core fix is sound: filtering out extremely short response times prevents statistical outliers. The removal of NULLIF is correct since the new >= 100 condition already prevents zero division. The before/after screenshots demonstrate the fix effectively resolved the 63570.4 tok/s anomaly.


Automated review by Claude AI

AND ${messageRequest.durationMs} IS NOT NULL
AND ${messageRequest.ttfbMs} IS NOT NULL
AND ${messageRequest.ttfbMs} < ${messageRequest.durationMs}
AND (${messageRequest.durationMs} - ${messageRequest.ttfbMs}) >= 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] [COMMENT-INCOMPLETE] Magic number without documentation

Why this is a problem: The 100ms threshold is a critical business logic decision but lacks explanation. Future maintainers will not understand:

  • Why 100ms specifically (vs 50ms or 200ms)
  • What problem this solves (preventing statistical outliers from near-zero division)
  • Whether this value should be configurable or adjusted

Suggested fix: Add a SQL comment before line 341:

-- Filter out responses with generation time < 100ms to prevent
-- statistical outliers from extremely fast/cached responses causing
-- abnormally high tok/s calculations (e.g., 63570.4 tok/s)
AND (durationMs - ttfbMs) >= 100

AND ${messageRequest.durationMs} IS NOT NULL
AND ${messageRequest.ttfbMs} IS NOT NULL
AND ${messageRequest.ttfbMs} < ${messageRequest.durationMs}
AND (${messageRequest.durationMs} - ${messageRequest.ttfbMs}) >= 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] [LOGIC-INCONSISTENCY] Divergent calculation logic between SQL and TypeScript

Why this is a problem: The SQL query now filters out requests where generation time < 100ms, but the equivalent TypeScript function calculateOutputRate() in src/lib/utils/performance-formatter.ts:23-34 has no such threshold (it only checks <= 0). This creates inconsistent behavior:

SQL (aggregated leaderboard): Excludes requests with (durationMs - ttfbMs) < 100ms
TypeScript (individual calculations): Includes all requests where generationTimeMs > 0

This means:

  1. A request with 50ms generation time will show a rate in individual views
  2. The same request will be excluded from leaderboard averages
  3. Users may see different rates depending on which view they check

Suggested fix: Apply the same 100ms threshold in the TypeScript function:

export function calculateOutputRate(
  outputTokens: number | null,
  durationMs: number | null,
  ttfbMs: number | null
): number | null {
  if (outputTokens == null || outputTokens <= 0 || durationMs == null || durationMs <= 0) {
    return null;
  }
  const generationTimeMs = ttfbMs != null ? durationMs - ttfbMs : durationMs;
  // Filter out extremely short generation times to prevent statistical outliers
  if (generationTimeMs < 100) return null;
  return outputTokens / (generationTimeMs / 1000);
}

Alternative: If the inconsistency is intentional (e.g., aggregates need stricter filtering than individual calculations), add comments in both locations explaining the different thresholds and why.

@github-actions github-actions bot mentioned this pull request Jan 1, 2026
10 tasks
@NieiR NieiR deleted the fix/leaderboard-output-rate-calc-v2 branch January 6, 2026 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:statistics bug Something isn't working size/XS Extra Small PR (< 50 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants