Skip to content

Comments

fix: bound SessionTracker active_sessions zsets by env TTL#718

Merged
ding113 merged 3 commits intodevfrom
fix/issue-714-sessiontracker-zset-leak
Feb 6, 2026
Merged

fix: bound SessionTracker active_sessions zsets by env TTL#718
ding113 merged 3 commits intodevfrom
fix/issue-714-sessiontracker-zset-leak

Conversation

@ding113
Copy link
Owner

@ding113 ding113 commented Feb 3, 2026

Summary

  • Align SessionTracker ZSET cleanup + session binding TTL refresh with env SESSION_TTL (seconds; default 300)
  • Ensure provider:*:active_sessions gets a fallback EXPIRE 3600 on the hot write path (SessionTracker.refreshSession) to avoid TTL=-1 keys
  • Add low-overhead write-path stale member cleanup (1% probabilistic ZREMRANGEBYSCORE in refreshSession)
  • Align provider concurrency Lua cleanup window with env SESSION_TTL by passing ttlMs into CHECK_AND_TRACK_SESSION (backward compatible: defaults to 300000ms when omitted)

Testing

  • bun run lint
  • bun run typecheck
  • bun run test
  • bun run build

Closes #714

Greptile Overview

Greptile Summary

  • Plumbs env-derived SESSION_TTL into both SessionTracker and the provider concurrency Lua script so cleanup windows and binding TTL refreshes match configuration.
  • Updates CHECK_AND_TRACK_SESSION to accept optional ttlMs, guard invalid TTL, and set provider ZSET expiry based on max(3600, ttl_seconds).
  • Adds a low-overhead probabilistic write-path cleanup for provider active session ZSETs.
  • Introduces unit tests covering TTL parsing/validation, expiry behavior, and passing ttlMs into Lua.

Confidence Score: 3/5

  • Moderately safe to merge, but verify active_sessions ZSETs cannot be left without TTL on the refresh hot path.
  • Core TTL parsing/propagation looks consistent and covered by tests, and the Lua guard prevents catastrophic cleanup on invalid TTL. Main remaining concern is that refreshSession only expires the provider active-sessions ZSET, which can contradict the stated goal of preventing TTL=-1 leaks for other active-sessions keys depending on call paths.
  • src/lib/session-tracker.ts

Important Files Changed

Filename Overview
src/lib/rate-limit/service.ts Passes env-derived SESSION_TTL_MS into CHECK_AND_TRACK_SESSION Lua script as ARGV[4] to align provider concurrency cleanup window with session TTL.
src/lib/redis/lua-scripts.ts CHECK_AND_TRACK_SESSION now accepts optional ttlMs (ARGV[4]), guards invalid TTL, and sets EXPIRE based on max(3600, ttl_seconds) instead of a fixed 1h.
src/lib/session-tracker.ts Uses env SESSION_TTL for binding TTL refresh and adds probabilistic provider ZSET cleanup + provider ZSET EXPIRE; verify other active_sessions ZSETs cannot end up TTL=-1 via refresh path.
tests/unit/lib/rate-limit/service-extra.test.ts Adds unit test asserting RateLimitService.checkAndTrackProviderSession passes ttlMs as ARGV[4] to Redis eval.
tests/unit/lib/session-tracker-cleanup.test.ts New tests for env-driven SESSION_TTL cutoff calculations, provider ZSET expiry behavior, probabilistic write-path cleanup, and fail-open behavior.
tests/unit/lib/session-ttl-validation.test.ts New tests verifying invalid SESSION_TTL values fall back to default 300s and valid values are honored.

Sequence Diagram

sequenceDiagram
  autonumber
  participant App as Request handler
  participant ST as SessionTracker
  participant R as Redis
  participant RLS as RateLimitService
  participant Lua as CHECK_AND_TRACK_SESSION (Lua)

  App->>RLS: checkAndTrackProviderSession(providerId, sessionId, limit)
  RLS->>R: EVAL(script, key, sessionId, limit, nowMs, ttlMs)
  R->>Lua: run script
  Lua->>R: ZREMRANGEBYSCORE(providerZset, -inf, now-ttlMs)
  Lua->>R: ZSCORE / ZCARD
  Lua->>R: ZADD(providerZset, now, sessionId)
  Lua->>R: EXPIRE(providerZset, max(3600, ttlMs/1000))
  Lua-->>RLS: {allowed, count, tracked}
  RLS-->>App: allow/deny

  App->>ST: refreshSession(sessionId, keyId, providerId, userId?)
  ST->>R: PIPELINE zadd global/key/provider/user ZSETs
  ST->>R: PIPELINE expire providerZset
  ST->>R: PIPELINE expire session bindings + setex last_seen
  alt probabilistic cleanup (1%)
    ST->>R: PIPELINE zremrangebyscore(providerZset, -inf, now-ttlMs)
  end
  R-->>ST: pipeline exec results
Loading

ding113 and others added 2 commits February 4, 2026 06:56
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @ding113, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the session management within the SessionTracker by introducing environment-driven Time-To-Live (TTL) configurations for Redis keys and optimizing the session cleanup process. The changes ensure that session expiration is consistently applied across various Redis data structures and that cleanup operations are performed efficiently, reducing overhead on critical write paths. This leads to more robust and configurable session handling.

Highlights

  • Configurable Session TTL: The active_sessions ZSETs in SessionTracker now utilize a configurable SESSION_TTL from the environment variable (defaulting to 300 seconds) instead of a hardcoded 3600 seconds, ensuring more flexible session expiration.
  • Optimized Write-Path Cleanup: Stale member cleanup has been removed from the hot write path (trackRequest) to improve performance. This cleanup now occurs exclusively within the cleanupExpiredSessions method, and probabilistically during refreshSession.
  • Lua Script TTL Parameter: The rateLimitWithSessionLua script now accepts sessionTtlMs as an argument, allowing the Lua script to properly manage the expiration of ZSET members based on the configured session TTL.
Changelog
  • src/lib/rate-limit/service.ts
    • Introduced SESSION_TTL_SECONDS and SESSION_TTL_MS constants, deriving their values from process.env.SESSION_TTL (defaulting to 300 seconds).
    • Modified the RateLimitService.redis.eval call for CHECK_AND_TRACK_SESSION to pass SESSION_TTL_MS as a new argument (ARGV[4]).
  • src/lib/redis/lua-scripts.ts
    • Updated comments within the CHECK_AND_TRACK_SESSION Lua script from Chinese to English for better clarity.
    • Modified the CHECK_AND_TRACK_SESSION Lua script to accept ttl as ARGV[4], with a default value of 300000ms if not provided.
    • Adjusted the cutoff calculation within the Lua script to use the new ttl argument for cleaning up expired sessions.
  • src/lib/session-tracker.ts
    • Replaced the hardcoded SESSION_TTL with SESSION_TTL_SECONDS and SESSION_TTL_MS, both derived from process.env.SESSION_TTL.
    • Added a CLEANUP_PROBABILITY constant to control the likelihood of probabilistic cleanup during session refreshes.
    • In refreshSession, added an EXPIRE command for the providerZSetKey with a fallback TTL of 3600 seconds.
    • Updated expire and setex calls for session binding keys (session:${sessionId}:provider, session:${sessionId}:key, session:${sessionId}:last_seen) to use the environment-driven ttlSeconds.
    • Implemented a probabilistic zremrangebyscore cleanup for providerZSetKey within refreshSession, triggered based on CLEANUP_PROBABILITY.
    • Modified cleanupExpiredSessions, countFromZSet, and getActiveSessions methods to use the new SessionTracker.SESSION_TTL_MS for accurate cutoff calculations.
  • tests/unit/lib/rate-limit/service-extra.test.ts
    • Added a new test case to verify that checkAndTrackProviderSession correctly passes SESSION_TTL_MS as ARGV[4] to the Lua script.
  • tests/unit/lib/session-tracker-cleanup.test.ts
    • Introduced a new test file dedicated to validating the SessionTracker's TTL and cleanup logic.
    • Added tests to confirm environment-driven TTLs are correctly applied for cutoff calculations.
    • Included tests for refreshSession's EXPIRE command on provider ZSETs and the use of environment-driven TTLs for session binding keys.
    • Verified the functionality of probabilistic cleanup on the write path, ensuring it triggers based on the defined probability.
    • Ensured fail-open behavior when Redis is not ready or is null, preventing application crashes.
Activity
  • The author, ding113, has implemented changes to align SessionTracker active session ZSETs with environment-defined TTLs.
  • The pull request includes a cleanup of the write-path by moving stale member cleanup to cleanupExpiredSessions and introducing probabilistic cleanup.
  • The Lua script rateLimitWithSessionLua has been updated to receive sessionTtlMs for proper expiration.
  • Local testing has been performed, with bun run test, bun run build, bun run typecheck, and bun run lint all passing.
  • This pull request addresses and closes issue Bug: SessionTracker ZSet 成员无限增长导致 Redis 内存泄漏 #714.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Walkthrough

引入基于环境变量的会话TTL(秒/毫秒),将该TTL传入限流Lua脚本以支持可配置清理窗口;扩展CostLimit接口添加name、resetTime、resetMode字段;在会话写入处加入概率性清理并补充针对TTL的单元测试覆盖。

Changes

Cohort / File(s) Summary
限流服务与接口变更
src/lib/rate-limit/service.ts
新增从环境读取的SESSION_TTL_*常量;扩展CostLimit接口(新增nameresetTime?resetMode?);将SESSION_TTL_MS作为第4个ARGV传递给CHECK_AND_TRACK_SESSION Lua脚本。
Lua脚本:TTL 支持
src/lib/redis/lua-scripts.ts
CHECK_AND_TRACK_SESSION新增对ARGV[4](ttlMs)的支持并以ttl窗口替换固定5分钟清理;增加ttl验证与cutoff计算;调整过期/EXPIRE逻辑与注释。
会话跟踪与清理逻辑
src/lib/session-tracker.ts
SESSION_TTL_SECONDS/SESSION_TTL_MS替代硬编码5分钟,新增CLEANUP_PROBABILITY;在trackSession/refreshSession等处使用provider ZSET键名、设置动态EXPIRE并在写入路径中按概率执行zremrangebyscore清理。
单元测试
tests/unit/lib/rate-limit/service-extra.test.ts, tests/unit/lib/session-tracker-cleanup.test.ts, tests/unit/lib/session-ttl-validation.test.ts
新增测试验证Lua脚本收到的ttlMs参数、SessionTracker的TTL解析/验证、概率性清理触发/不触发路径、以及在Redis不可用时的fail-open行为。

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确地反映了PR的主要变更——通过环境TTL对SessionTracker active_sessions ZSETs进行约束。
Description check ✅ Passed 描述与变更集相关,详细说明了TTL对齐、ZSET过期处理和概率性清理等核心改动。
Linked Issues check ✅ Passed 代码变更全面满足#714中的关键需求:实现基于env的SESSION_TTL、添加概率性清理、传递ttlMs到Lua脚本、确保EXPIRE逻辑生效。
Out of Scope Changes check ✅ Passed 所有变更直接关联#714的目标——修复ZSet无限增长问题。Greptile指出BATCH_CHECK_SESSION_LIMITS脚本中的硬编码TTL,但该脚本目前未被使用,属于代码质量而非关键功能问题。
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/issue-714-sessiontracker-zset-leak

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added size/M Medium PR (< 500 lines) bug Something isn't working area:session labels Feb 3, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2026

🧪 测试结果

测试类型 状态
代码质量
单元测试
集成测试
API 测试

总体结果: ✅ 所有测试通过

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link

greptile-apps bot commented Feb 3, 2026

Additional Comments (1)

src/lib/redis/lua-scripts.ts
hardcoded TTL inconsistent with CHECK_AND_TRACK_SESSION script which now uses env-driven TTL

local ttl = tonumber(ARGV[#ARGV - 1]) or 300000  -- Use second-to-last arg as ttl, fallback to 5 min
local now = tonumber(ARGV[#ARGV])
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/redis/lua-scripts.ts
Line: 78:78

Comment:
hardcoded TTL inconsistent with `CHECK_AND_TRACK_SESSION` script which now uses env-driven TTL

```suggestion
local ttl = tonumber(ARGV[#ARGV - 1]) or 300000  -- Use second-to-last arg as ttl, fallback to 5 min
local now = tonumber(ARGV[#ARGV])
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces environment-driven TTL configurations for session tracking, enhancing flexibility and consistency across SessionTracker and RateLimitService. However, a critical security vulnerability has been identified: several methods in the RateLimitService lack proper validation for numeric inputs, specifically failing to handle NaN or Infinity values. This oversight can lead to Lua script crashes in Redis, resulting in rate limit bypasses and persistent Denial of Service. It is recommended to use Number.isFinite() to validate all numeric parameters before they are used in calculations or passed to Redis. Additionally, I've noted two instances where a hardcoded fallback TTL could cause premature expiration of Redis keys if the session TTL is set to a large value, and a redundant cleanup mechanism that could be removed to simplify the code.

-- 5. Track session (ZADD updates timestamp for existing members)
redis.call('ZADD', provider_key, now, session_id)
redis.call('EXPIRE', provider_key, 3600) -- 1 小时兜底 TTL
redis.call('EXPIRE', provider_key, 3600) -- 1h fallback TTL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fallback TTL is hardcoded to 3600 seconds. If the session TTL (derived from ARGV[4]) is configured to be longer than 1 hour, the zset could expire prematurely, losing active session data. This fallback TTL should always be greater than the session TTL.

A dynamic fallback TTL should be calculated based on the session TTL to prevent this.

Suggested change
redis.call('EXPIRE', provider_key, 3600) -- 1h fallback TTL
local session_ttl_seconds = ttl / 1000
local fallback_ttl = math.max(3600, session_ttl_seconds * 2)
redis.call('EXPIRE', provider_key, fallback_ttl) -- Dynamic fallback TTL

pipeline.zadd(`key:${keyId}:active_sessions`, now, sessionId);
pipeline.zadd(`provider:${providerId}:active_sessions`, now, sessionId);
pipeline.zadd(providerZSetKey, now, sessionId);
pipeline.expire(providerZSetKey, 3600);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fallback TTL for the provider's active sessions zset is hardcoded to 3600 seconds. If SESSION_TTL is configured in the environment to a value larger than 3600, this zset could expire prematurely, leading to the loss of active session data. The fallback TTL should always be greater than the session TTL.

Consider deriving it from ttlSeconds to ensure it's always a safe value.

Suggested change
pipeline.expire(providerZSetKey, 3600);
pipeline.expire(providerZSetKey, Math.max(3600, ttlSeconds * 2));

Comment on lines +194 to +197
if (Math.random() < SessionTracker.CLEANUP_PROBABILITY) {
const cutoffMs = now - SessionTracker.SESSION_TTL_MS;
pipeline.zremrangebyscore(providerZSetKey, "-inf", cutoffMs);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This probabilistic cleanup of the providerZSetKey appears to be redundant. The same zset is already cleaned in other parts of the application:

  1. Deterministically within the CHECK_AND_TRACK_SESSION Lua script, which is executed on the hot path for providers with session limits.
  2. When getProviderSessionCount is called for statistics.

This redundancy adds complexity and a small performance overhead on the write path. To simplify the logic, you could remove this probabilistic cleanup and rely on the other two mechanisms.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/rate-limit/service.ts (1)

157-166: ⚠️ Potential issue | 🟠 Major

额度名称是用户可见文案,需走 i18n

name 会拼进返回的 reason,目前是硬编码中文,无法覆盖多语言。请改为 i18n key/翻译映射。
As per coding guidelines, All user-facing strings must use i18n (5 languages supported: zh-CN, zh-TW, en, ja, ru). Never hardcode display text.

🤖 Fix all issues with AI agents
In `@src/lib/rate-limit/service.ts`:
- Around line 94-96: Validate the SESSION_TTL parsing and guard against
NaN/invalid values: when initializing SESSION_TTL_SECONDS (where
parseInt(process.env.SESSION_TTL || "300", 10) is used) check the parsed value
is a finite integer > 0 and fallback to the default 300 if not; then compute
SESSION_TTL_MS from that validated value so SESSION_TTL_MS is never NaN or <=0
before being used in Lua/Redis calls (update the constants SESSION_TTL_SECONDS
and SESSION_TTL_MS accordingly).

In `@src/lib/redis/lua-scripts.ts`:
- Around line 32-52: The EXPIRE call currently uses a hardcoded 3600s which can
cause premature key expiry when SESSION_TTL (ttl) > 1h; update the Lua script
that manages provider_key (where ttl, now, session_id, ZADD are used) to set the
key TTL based on ttl (converted to seconds) or at least use
math.max(ttl_seconds, 3600) before calling EXPIRE so the ZSET won't expire
earlier than the cleanup window; ensure the computed ttl_seconds comes from
tonumber(ARGV[4]) (ttl) and apply it in the redis.call('EXPIRE', provider_key,
...) instead of 3600.

In `@src/lib/session-tracker.ts`:
- Around line 179-186: The provider ZSET expiration is hardcoded to 3600s
causing premature deletion when SessionTracker.SESSION_TTL_SECONDS (ttlSeconds)
is larger; update the pipeline.expire call for providerZSetKey to use ttlSeconds
(or Math.max(3600, ttlSeconds) if you want a floor) so the providerZSetKey
lifetime is at least as long as the session TTL and concurrent counts won't
underreport; modify the expire on providerZSetKey in the block that references
providerZSetKey/pipeline.zadd to use SessionTracker.SESSION_TTL_SECONDS (or the
chosen max) instead of 3600.
- Around line 20-22: SESSION_TTL parsing can yield NaN if the env var is not
numeric; update the static initialization in SessionTracker to validate the
parsed value and fall back to the default 300s when invalid. Specifically,
replace the direct parseInt usage for SESSION_TTL_SECONDS with logic that reads
process.env.SESSION_TTL, attempts to parse an integer, checks
Number.isFinite(parsed) && parsed > 0 (or !Number.isNaN and positive), then
assigns that value (optionally Math.floor) or the default 300; recompute
SESSION_TTL_MS from the validated SESSION_TTL_SECONDS and leave
CLEANUP_PROBABILITY unchanged. Ensure the validation happens inside the
SessionTracker static field initialization so expire/setex never receives NaN.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR successfully aligns SessionTracker ZSET cleanup and session binding TTL refresh with the configurable SESSION_TTL environment variable. The implementation is clean, well-tested, and follows the project's coding standards.

PR Size: M

  • Lines changed: 395 (349 additions, 46 deletions)
  • Files changed: 5

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 0 0
Simplification 0 0 0 0

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean (maintains fail-open pattern)
  • Type safety - Clean
  • Documentation accuracy - Clean (improved with English translations)
  • Test coverage - Excellent (279 new lines covering env-driven TTL, probabilistic cleanup, fail-open behavior)
  • Code clarity - Good

Notes

  1. Unused Lua Script: The BATCH_CHECK_SESSION_LIMITS script still contains hardcoded ttl = 300000 (lua-scripts.ts:77). However, this script is not invoked anywhere in the codebase and does not affect functionality. Consider updating it for consistency or removing if permanently unused.

  2. Probabilistic Cleanup: The 1% probabilistic cleanup on the write path (refreshSession) is a good balance between performance and memory management.

  3. Test Quality: The new test suite is comprehensive and correctly validates the env-driven TTL behavior across multiple code paths.


Automated review by Claude AI

export class SessionTracker {
private static readonly SESSION_TTL = 300000; // 5 分钟(毫秒)
private static readonly SESSION_TTL_SECONDS = parseInt(process.env.SESSION_TTL || "300", 10);
private static readonly SESSION_TTL_MS = SessionTracker.SESSION_TTL_SECONDS * 1000;
Copy link
Contributor

@github-actions github-actions bot Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CRITICAL] [TYPE-MISSING-VALIDATION] SESSION_TTL can break cleanup/counting when <= 0 or non-numeric

Why this is a problem: In src/lib/session-tracker.ts:20-21 the new TTL is parsed without validation:

private static readonly SESSION_TTL_SECONDS = parseInt(process.env.SESSION_TTL || "300", 10);

If SESSION_TTL is 0/negative, the new cutoff math (e.g. now - SessionTracker.SESSION_TTL_MS) makes ZREMRANGEBYSCORE remove all entries (cutoff >= now), effectively zeroing counts and bypassing concurrency limits. If it's non-numeric, ttlSeconds becomes NaN and gets passed into Redis EXPIRE/SETEX, which can throw and disable the refresh path.

Suggested fix:

private static readonly SESSION_TTL_SECONDS = (() => {
  const ttl = Number(process.env.SESSION_TTL);
  if (!Number.isFinite(ttl) || ttl <= 0) return 300;
  return Math.floor(ttl);
})();
private static readonly SESSION_TTL_MS = SessionTracker.SESSION_TTL_SECONDS * 1000;

normalizeResetTime,
} from "./time-utils";

const SESSION_TTL_SECONDS = parseInt(process.env.SESSION_TTL || "300", 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CRITICAL] [TYPE-MISSING-VALIDATION] Unvalidated SESSION_TTL can disable provider concurrency tracking

Why this is a problem: In src/lib/rate-limit/service.ts:94-95 the new TTL is parsed without guarding:

const SESSION_TTL_SECONDS = parseInt(process.env.SESSION_TTL || "300", 10);

That value is converted and passed into the Lua script as ARGV[4]. If SESSION_TTL is 0/negative, ttlMs becomes 0/negative and the Lua cleanup step (cutoff = now - ttl) will remove all entries (cutoff >= now), making the provider concurrency limit ineffective.

Suggested fix:

const SESSION_TTL_SECONDS = (() => {
  const ttl = Number(process.env.SESSION_TTL);
  if (!Number.isFinite(ttl) || ttl <= 0) return 300;
  return Math.floor(ttl);
})();
const SESSION_TTL_MS = SESSION_TTL_SECONDS * 1000;

local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local ttl = 300000 -- 5 分钟(毫秒)
local ttl = tonumber(ARGV[4]) or 300000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CRITICAL] [LOGIC-BUG] Lua ttlMs needs clamping to avoid clearing the ZSET (limit bypass)

Why this is a problem: In src/lib/redis/lua-scripts.ts:32-36 the script trusts ARGV[4]:

local ttl = tonumber(ARGV[4]) or 300000

When ttl is 0/negative, cutoff = now - ttl becomes >= now, and ZREMRANGEBYSCORE ... -inf cutoff can remove all existing sessions, making the concurrency check under-count and effectively bypass limits.

Suggested fix:

local ttl = tonumber(ARGV[4]) or 300000
if ttl <= 0 then
  ttl = 300000
end

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR propagates SESSION_TTL into provider active_sessions ZSET cleanup (TS + Lua) and adds unit tests around the new behavior. The main risk is that SESSION_TTL is not validated/clamped; values <= 0 or non-numeric can clear the ZSET and effectively bypass provider concurrency limiting.

PR Size: M

  • Lines changed: 395
  • Files changed: 5

Issues Found

Category Critical High Medium Low
Logic/Bugs 1 0 0 0
Security 1 0 0 0
Error Handling 0 0 0 0
Types 1 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 0 0
Simplification 0 0 0 0

Critical Issues (Must Fix)

  • (Confidence 97) SESSION_TTL is used as a cleanup window/TTL without clamping; <= 0 or non-numeric values can clear active_sessions and bypass concurrency limits (see inline comments on src/lib/session-tracker.ts, src/lib/rate-limit/service.ts, src/lib/redis/lua-scripts.ts).

High Priority Issues (Should Fix)

  • None

Review Coverage

  • Logic and correctness
  • Security (OWASP Top 10)
  • Error handling
  • Type safety
  • Documentation accuracy
  • Test coverage
  • Code clarity

Automated review by Codex AI

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Identified PR #718 (“fix: bound SessionTracker active_sessions zsets by env TTL”) and applied label size/M.
  • Left 3 inline review comments (all Critical) pointing out that SESSION_TTL is used without clamping/validation, so <= 0 / non-numeric values can clear active_sessions and bypass provider concurrency limiting:
    • src/lib/session-tracker.ts
    • src/lib/rate-limit/service.ts
    • src/lib/redis/lua-scripts.ts
  • Submitted the required PR review summary via gh pr review --comment.

…id values

- Add input validation for SESSION_TTL (reject NaN, 0, negative; default 300)
- Guard against invalid TTL in Lua script to prevent clearing all sessions
- Use dynamic EXPIRE based on SESSION_TTL instead of hardcoded 3600s
- Add unit tests for TTL validation and dynamic expiry behavior
@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2026

🧪 测试结果

测试类型 状态
代码质量
单元测试
集成测试
API 测试

总体结果: ✅ 所有测试通过

@ding113 ding113 merged commit bfc403a into dev Feb 6, 2026
11 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 6, 2026
@github-actions github-actions bot mentioned this pull request Feb 7, 2026
10 tasks
ding113 added a commit that referenced this pull request Feb 7, 2026
* fix(proxy): add 'cannot be modified' error detection to thinking signature rectifier

Extend the thinking signature rectifier to detect and handle the
Anthropic API error when thinking/redacted_thinking blocks have been
modified from their original response. This error occurs when clients
inadvertently modify these blocks in multi-turn conversations.

The rectifier will now remove these blocks and retry the request,
similar to how it handles other thinking-related signature errors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(deps): bump jspdf in the npm_and_yarn group across 1 directory

Bumps the npm_and_yarn group with 1 update in the / directory: [jspdf](https://github.com/parallax/jsPDF).


Updates `jspdf` from 3.0.4 to 4.1.0
- [Release notes](https://github.com/parallax/jsPDF/releases)
- [Changelog](https://github.com/parallax/jsPDF/blob/master/RELEASE.md)
- [Commits](parallax/jsPDF@v3.0.4...v4.1.0)

---
updated-dependencies:
- dependency-name: jspdf
  dependency-version: 4.1.0
  dependency-type: direct:production
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: Hot-reload cache invalidation for Request Filters and Sensitive Words (#710)

* fix: hot-reload request filters via globalThis singleton pattern

EventEmitter and RequestFilterEngine now use globalThis caching to ensure
the same instance is shared across different Next.js worker contexts.
This fixes the issue where filter changes required Docker restart.

Added diagnostic logging for event subscription and propagation.

* fix(redis): wait for subscriber connection ready before subscribe

- ensureSubscriber now returns Promise<Redis>, waits for 'ready' event
- subscribeCacheInvalidation returns null on failure instead of noop
- RequestFilterEngine checks cleanup !== null before logging success
- Fixes false "Subscribed" log when Redis connection fails

* feat(sensitive-words): add hot-reload via Redis pub/sub

Enable real-time cache invalidation for sensitive words detector,
matching the pattern used by request-filter-engine and error-rule-detector.

* fix(redis): harden cache invalidation subscriptions

Ensure sensitive-words CRUD emits update events so hot-reload propagates across workers. Roll back failed pub/sub subscriptions, add retry/timeout coverage, and avoid sticky provider-cache subscription state.

* fix(codex): bump default User-Agent fallback

Update the hardcoded Codex UA used when requests lack an effective user-agent (e.g. filtered out). Keep unit tests in sync with the new default.

* fix(redis): resubscribe cache invalidation after reconnect

Clear cached subscription state on disconnect and resubscribe on ready so cross-worker cache invalidation survives transient Redis reconnects. Add unit coverage, avoid misleading publish logs, track detector cleanup handlers, and translate leftover Russian comments to English.

* fix(sensitive-words): use globalThis singleton detector

Align SensitiveWordDetector with existing __CCH_* singleton pattern to avoid duplicate instances across module reloads. Extend singleton unit tests to cover the detector.

* chore: format code (req-fix-dda97fd)

* fix: address PR review comments

- pubsub.ts: use `once` instead of `on` for ready event to prevent
  duplicate resubscription handlers on reconnect
- forwarder.ts: extract DEFAULT_CODEX_USER_AGENT constant
- provider-cache.ts: wrap subscribeCacheInvalidation in try/catch
- tests: use exported constant instead of hardcoded UA string

* fix(redis): resubscribe across repeated reconnects

Ensure pub/sub resubscribe runs on every reconnect, extend unit coverage, and keep emitRequestFiltersUpdated resilient when logger import fails.

---------

Co-authored-by: John Doe <johndoe@example.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* feat(logs): make cost column toggleable with improved type safety (#715)

close #713

* fix(proxy): add OpenAI chat completion format support in usage extraction (#705) (#716)

The `extractUsageMetrics` function was missing support for OpenAI chat
completion format fields (`prompt_tokens`/`completion_tokens`), causing
token statistics to not be recorded for OpenAI-compatible providers.

Changes:
- Add `prompt_tokens` -> `input_tokens` mapping
- Add `completion_tokens` -> `output_tokens` mapping
- Preserve priority: Claude > Gemini > OpenAI format
- Add 5 unit tests for OpenAI format handling

Closes #705

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix(currency): respect system currencyDisplay setting in UI (#717)

Fixes #678 - Currency display unit configuration was not applied.

Root cause:
- `users-page-client.tsx` hardcoded `currencyCode="USD"`
- `UserLimitBadge` and `LimitStatusIndicator` had hardcoded `unit="$"` default
- `big-screen/page.tsx` used hardcoded "$" in multiple places

Changes:
- Add `getCurrencySymbol()` helper function to currency.ts
- Fetch system settings in `users-page-client.tsx` and pass to table
- Pass `currencySymbol` from `user-key-table-row.tsx` to limit badges
- Remove hardcoded "$" defaults from badge components
- Update big-screen page to fetch settings and use dynamic symbol
- Add unit tests for `getCurrencySymbol`

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* feat(gemini): add Google Search web access preference for Gemini providers (#721)

* feat(gemini): add Google Search web access preference for Gemini providers

Add provider-level preference for Gemini API type providers to control
Google Search (web access) tool injection:

- inherit: Follow client request (default)
- enabled: Force inject googleSearch tool into request
- disabled: Force remove googleSearch tool from request

Changes:
- Add geminiGoogleSearchPreference field to provider schema
- Add GeminiGoogleSearchPreference type and validation
- Implement applyGeminiGoogleSearchOverride with audit trail
- Add UI controls in provider form (Gemini Overrides section)
- Add i18n translations for 5 languages (en, zh-CN, zh-TW, ja, ru)
- Integrate override logic in proxy forwarder for Gemini requests
- Add 22 unit tests for the override logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(gemini): address code review feedback

- Use explicit else-if for disabled preference check (gemini-code-assist)
- Use i18n for SelectValue placeholder instead of hardcoded string (coderabbitai)
- Sync overridden body back to session.request.message for log consistency (coderabbitai)
- Persist Gemini special settings immediately, matching Anthropic pattern (coderabbitai)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(gemini): use strict types for provider config and audit

- Narrow preference type to "enabled" | "disabled" (exclude unreachable "inherit")
- Use ProviderType and GeminiGoogleSearchPreference types instead of string

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* fix(api): 透传 /api/actions 认证会话以避免 getUsers 返回空数据 (#720)

* fix(api): 透传 /api/actions 认证会话以避免 getUsers 返回空数据

* fix(auth): 让 scoped session 继承 allowReadOnlyAccess 语义并支持内部降权校验

* chore: format code (dev-93585fa)

* fix: bound SessionTracker active_sessions zsets by env TTL (#718)

* fix(session): bound active_sessions zsets by env ttl

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(rate-limit): pass session ttl to lua cleanup

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(session): validate SESSION_TTL env and prevent ZSET leak on invalid values

- Add input validation for SESSION_TTL (reject NaN, 0, negative; default 300)
- Guard against invalid TTL in Lua script to prevent clearing all sessions
- Use dynamic EXPIRE based on SESSION_TTL instead of hardcoded 3600s
- Add unit tests for TTL validation and dynamic expiry behavior

---------

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(provider): stop standard-path fallback to legacy provider url

* feat(providers): expose vendor endpoint pools in settings UI (#719)

* feat(providers): add endpoint status mapping

* feat(providers): add endpoint pool hover

* feat(providers): show vendor endpoints in list rows

* feat(providers): extract vendor endpoint CRUD table

* chore(i18n): add provider endpoint UI strings

* fix(providers): integrate endpoint pool into provider form

* fix(provider): wrap endpoint sync in transactions to prevent race conditions (#730)

* fix(provider): wrap provider create/update endpoint sync in transactions

Provider create and update operations now run vendor resolution and
endpoint sync inside database transactions to prevent race conditions
that could leave orphaned or inconsistent endpoint rows.

Key changes:
- createProvider: wrap vendor + insert + endpoint seed in a single tx
- updateProvider: wrap vendor + update + endpoint sync in a single tx
- Add syncProviderEndpointOnProviderEdit for atomic URL/type/vendor
  migration with in-place update, soft-delete, and conflict handling
- Vendor cleanup failures degrade to warnings instead of propagating
- Add comprehensive unit and integration tests for sync edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(provider): defer endpoint circuit reset until transaction commit

Avoid running endpoint circuit reset side effects inside DB transactions to prevent rollback inconsistency. Run resets only after commit and add regression tests for deferred reset behavior in helper and provider update flows.

* fix(provider): distinguish noop from created-next in endpoint sync action label

When ensureNextEndpointActive() returns "noop" (concurrent transaction
already created an active next endpoint), the action was incorrectly
labelled "kept-previous-and-created-next". Add a new
"kept-previous-and-kept-next" action to ProviderEndpointSyncAction and
use a three-way branch so callers and logs reflect the true outcome.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments from PR #731

- fix(auth): prevent scoped session access widening via ?? -> && guard
- fix(i18n): standardize zh-CN provider terminology to "服务商"
- fix(i18n): use consistent Russian translations for circuit status
- fix(i18n): replace raw formatDistanceToNow with locale-aware RelativeTime
- fix(gemini): log warning for unknown google search preference values
- fix(error-rules): check subscribeCacheInvalidation return value
- fix(test): correct endpoint hover sort test to assert URLs not labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: export auth session storage and fix test mock types

- Export authSessionStorage from auth-session-storage.node.ts to prevent
  undefined on named imports; remove duplicate declare global block
- Fix mockEndpoints in provider-endpoint-hover test: remove nonexistent
  lastOk/lastLatencyMs fields, add missing lastProbe* fields, use Date
  objects for createdAt/updatedAt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: miraserver <20286838+miraserver@users.noreply.github.com>
Co-authored-by: John Doe <johndoe@example.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: 泠音 <im@ling.plus>
Co-authored-by: Longlone <41830147+WAY29@users.noreply.github.com>
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:session bug Something isn't working size/M Medium PR (< 500 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant