@lilei199908
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings January 20, 2026 04:53
Copilot AI left a comment

Pull request overview

This PR adds two new command-line arguments to configure load balancing thresholds for the SGLang router: --sglang-router-balance-abs-threshold (integer, default 10) and --sglang-router-balance-rel-threshold (float, default 1.2).

Changes:

  • Added two new router configuration arguments that control balance thresholds for the SGLang router
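
For reference, a minimal standalone sketch of how the two flags described above could be registered with argparse. The flag names, types, and defaults come from the overview; the helper name `add_router_balance_args` is hypothetical and is not taken from the PR.

```python
import argparse


def add_router_balance_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the two balance-threshold flags (hypothetical helper name)."""
    parser.add_argument(
        "--sglang-router-balance-abs-threshold",
        type=int,
        default=10,
        help="Absolute threshold for balance in the SGLang router",
    )
    parser.add_argument(
        "--sglang-router-balance-rel-threshold",
        type=float,
        default=1.2,
        help="Relative threshold for balance in the SGLang router",
    )
    return parser


# Defaults apply when the flags are omitted.
defaults = add_router_balance_args(argparse.ArgumentParser()).parse_args([])

# Both flags can be overridden on the command line.
custom = add_router_balance_args(argparse.ArgumentParser()).parse_args(
    [
        "--sglang-router-balance-abs-threshold", "32",
        "--sglang-router-balance-rel-threshold", "1.5",
    ]
)
```

argparse converts the dashes in each flag name to underscores, so the values are read as `args.sglang_router_balance_abs_threshold` and `args.sglang_router_balance_rel_threshold`.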

Comment on lines 32 to 38
help="Absolute threshold for balance in the SGLang router",
)
parser.add_argument(
"--sglang-router-balance-rel-threshold",
type=float,
default=1.2,
help="Relative threshold for balance in the SGLang router",
Copilot AI Jan 20, 2026

The help text for these arguments is vague and doesn't explain what "balance" means in the context of the SGLang router, or how the absolute and relative thresholds are used. Consider providing more descriptive help text that explains:

  • What aspect of the router is being balanced (e.g., load balancing across workers)
  • How the absolute threshold is applied and what units it uses
  • What the relative threshold represents (e.g., a ratio or percentage)

This will make it easier for users to understand how to configure these parameters appropriately.

Suggested change
-     help="Absolute threshold for balance in the SGLang router",
- )
- parser.add_argument(
-     "--sglang-router-balance-rel-threshold",
-     type=float,
-     default=1.2,
-     help="Relative threshold for balance in the SGLang router",
+     help=(
+         "Maximum allowed absolute difference in number of in-flight or queued "
+         "requests between the most and least loaded SGLang workers before the "
+         "router starts actively preferring less-loaded workers. "
+         "Expressed as a count of requests."
+     ),
+ )
+ parser.add_argument(
+     "--sglang-router-balance-rel-threshold",
+     type=float,
+     default=1.2,
+     help=(
+         "Maximum allowed relative load imbalance between SGLang workers before "
+         "the router starts actively preferring less-loaded workers. "
+         "Interpreted as a ratio of highest load to lowest load "
+         "(e.g., 1.2 means the busiest worker may have up to 20% more requests "
+         "than the least busy one)."
+     ),
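
To make the two thresholds concrete, here is a small sketch of one way they could combine. It assumes the router treats the workers as imbalanced only when both the absolute gap and the highest-to-lowest ratio exceed their thresholds; this PR does not confirm that rule, and the function name `is_imbalanced` and the per-worker `loads` list are hypothetical.

```python
from typing import Sequence


def is_imbalanced(
    loads: Sequence[int],
    abs_threshold: int = 10,
    rel_threshold: float = 1.2,
) -> bool:
    """Return True when the load spread exceeds both thresholds.

    Assumption (not confirmed by this PR): both conditions must hold.
    `loads` is a hypothetical list of per-worker request counts.
    """
    max_load, min_load = max(loads), min(loads)
    exceeds_abs = (max_load - min_load) > abs_threshold
    exceeds_rel = max_load > rel_threshold * min_load
    return exceeds_abs and exceeds_rel


# Gap of 20 requests and a ratio of 2.0: both thresholds exceeded.
busy_gap = is_imbalanced([40, 20])

# Gap of 12 requests but a ratio of 1.12: under the relative threshold.
large_but_proportional = is_imbalanced([112, 100])

# Ratio of 3.0 but a gap of only 4 requests: under the absolute threshold.
small_counts = is_imbalanced([6, 2])
```

Under this reading, the absolute threshold keeps small request counts from triggering rebalancing, while the relative threshold does the same for large but proportionally similar loads.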

@zhuzilin zhuzilin merged commit d68691e into THUDM:main Jan 20, 2026
9 checks passed