
Conversation


@tedzhouhk tedzhouhk commented Aug 22, 2025

Summary by CodeRabbit

  • Chores
    • Updated internal benchmarking tooling to accept an additional model parameter; endpoint settings unchanged.
    • No changes to user-facing features, APIs, or settings.
    • No configuration or migration required.
    • Existing workflows and results reporting remain unaffected.
    • Backward compatibility preserved for production paths; only profiling routines touched.
    • Release includes no performance, security, or UI modifications.

@github-actions github-actions bot added the fix label Aug 22, 2025
@tedzhouhk tedzhouhk enabled auto-merge (squash) August 22, 2025 22:24
auto-merge was automatically disabled August 22, 2025 22:25

Pull request was closed

@jasonqinzhou jasonqinzhou reopened this Aug 22, 2025

coderabbitai bot commented Aug 22, 2025

Walkthrough

Updated function call sites in benchmarks/profiler/profile_sla.py to pass an additional positional model_name argument to benchmark_prefill and benchmark_decode, duplicating model_name before the base_url keyword argument. No other logic, control flow, or public interfaces were changed.

Changes

Cohort / File(s): Profiler benchmarking call-site updates (benchmarks/profiler/profile_sla.py)
Summary: Added a second positional model_name argument in calls to benchmark_prefill and benchmark_decode; base_url remains a keyword argument. No other code changes. (A sketch of the updated call shape follows.)
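
For orientation, the updated call now duplicates model_name positionally ahead of the base_url keyword; a minimal sketch of the prefill call shape (the decode call is analogous):

gap_result = benchmark_prefill(
    args.isl,
    genai_perf_artifact_dir,
    model_name,
    model_name,  # second positional model_name added by this PR
    base_url=base_url,
)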

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

I tap my paws on timing logs,
Two names hop where one once jogs.
Prefill, decode—now side by side,
URLs trail the model tide.
In burrows deep the benchmarks gleam,
A doubled name, a quicker stream. 🐇⌛️

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
benchmarks/profiler/profile_sla.py (7)

160-166: Use tokenizer_name instead of duplicating model_name (derive once from config).

Passing model_name twice risks incorrect tokenization when model and tokenizer diverge. Prefer a real tokenizer_name with a safe fallback to model_name.

Apply this diff to the call:

 gap_result = benchmark_prefill(
     args.isl,
     genai_perf_artifact_dir,
     model_name,
-    model_name,
+    tokenizer_name,
     base_url=base_url,
 )

Add this near where model_name is set (Lines 83-85) to derive tokenizer_name once:

# Derive tokenizer_name with graceful fallback
tokenizer_name = (
    getattr(config_modifier, "get_tokenizer_name", lambda cfg: None)(config)
    or model_name
)
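
For illustration, one possible shape of such a helper; get_tokenizer_name and the config layout below are assumptions for the sketch, not the repository's actual API:

# Hypothetical helper on the config modifier; the config layout is assumed.
def get_tokenizer_name(config: dict) -> str | None:
    """Return an explicit tokenizer name if the config provides one, else None."""
    model_cfg = config.get("model", {})
    return model_cfg.get("tokenizer") or model_cfg.get("tokenizer_name")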

284-292: Decode path: pass tokenizer_name (not a duplicate model_name).

Mirror the prefill fix so both benchmarks use the correct tokenizer.

 gap_result = benchmark_decode(
     args.isl,
     args.osl,
     num_request,
     genai_perf_artifact_dir,
     model_name,
-    model_name,
+    tokenizer_name,
     base_url=base_url,
 )

426-434: Propagate tokenizer_name to profile_prefill/profile_decode for consistency.

These helpers already accept (model_name, tokenizer_name, ...). Use the derived tokenizer_name for correctness and consistency across flows.

Example adjustments:

 profile_prefill(
     work_dir,
     model_name,
-    model_name,
+    tokenizer_name,
     base_url,
     best_prefill_tp,
     args.max_context_length,
     args.prefill_interpolation_granularity,
 )

 profile_decode(
     work_dir,
     model_name,
-    model_name,
+    tokenizer_name,
     base_url,
     best_decode_tp,
     max_kv_tokens,
     args.max_context_length,
     args.decode_interpolation_granularity,
 )

Also applies to: 482-491


173-177: Prefer logger over print for consistency and structured logs.

Everything else uses logger; replace the remaining print calls.

-            print("Cleaning up deployment...")
+            logger.info("Cleaning up deployment...")
             await client.delete_deployment()
             deployment_clients.remove(client)
-            print("Deployment deleted")
+            logger.info("Deployment deleted")

Repeat the same replacement for the other three cleanup blocks in the decode and interpolation sections.

Also applies to: 306-310, 436-440, 493-497
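
For reference, a minimal module-level logging setup that makes logger available; the level and format here are illustrative, and profile_sla.py may already configure logging differently:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

# The print calls above then become structured log records:
logger.info("Cleaning up deployment...")
logger.info("Deployment deleted")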


102-128: Avoid mutating shared config across TP iterations; copy per-iteration.

set_config_tp_size mutates prefill_config/decode_config in-place. If any other knobs get toggled inside the loop later, side-effects will leak across iterations. Safer to keep a base config and copy inside the loop.

Example:

import copy

base_prefill_config = config_modifier.convert_config(config, "prefill")
...
for tp_size in profile_tp_size:
    prefill_config = copy.deepcopy(base_prefill_config)
    prefill_config = config_modifier.set_config_tp_size(prefill_config, tp_size)

Do the same for decode_config.

Also applies to: 238-247
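
A sketch of the decode side under the same assumptions about config_modifier's interface (convert_config and set_config_tp_size as used above):

import copy

# Build one immutable base config, then copy per TP size so in-place
# mutations cannot leak across iterations.
base_decode_config = config_modifier.convert_config(config, "decode")

for tp_size in profile_tp_size:
    decode_config = copy.deepcopy(base_decode_config)
    decode_config = config_modifier.set_config_tp_size(decode_config, tp_size)
    # ... deploy and benchmark with this per-iteration copy ...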


367-376: Clarify KV utilization formula and intent (parentheses/weights).

Currently the expression is concurrency * (args.isl + args.osl / 2) / kv. If the intent is to weight OSL by 0.5, make that explicit; if not, the utilization may be under- or over-estimated.

Consider:

avg_decode_tokens = 0.5 * args.osl  # if this is the intended heuristic
selected_decode_kv_cache_utilization = (
    decode_concurrency[selected_decode_idx]
    * (args.isl + avg_decode_tokens)
    / decode_kv_cache_size[selected_decode_idx]
)

Note: Earlier, an “existing logs” estimate used concurrency * (ISL + OSL) * 2, which is not consistent with the selection formula. Please confirm the preferred heuristic and align both places.
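
To make the discrepancy concrete, a small comparison with made-up numbers (illustrative values only):

# Illustrative values only, not measurements.
concurrency, isl, osl, kv_cache_size = 32, 3000, 150, 200_000

# Selection formula as currently written: OSL weighted by 0.5.
selection_estimate = concurrency * (isl + osl / 2) / kv_cache_size   # 0.492

# "Existing logs" estimate mentioned above: full ISL + OSL, doubled.
logs_estimate = concurrency * (isl + osl) * 2 / kv_cache_size        # 1.008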


316-365: Guard against empty decode results before selection.

If sweep_num_request ends up empty for all TP sizes (e.g., tiny KV cache), decode_itl stays empty and min(decode_itl) will raise. Add a defensive check and a helpful message.

Example:

if not decode_itl:
    logger.warning("No decode results gathered; cannot derive best TP. Check KV cache or sweep range.")
    return
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0bd4995 and 4154afc.

📒 Files selected for processing (1)
  • benchmarks/profiler/profile_sla.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
benchmarks/profiler/profile_sla.py (2)

52-70: PR title/scope mismatch — confirm intent.

PR title mentions “sla_planner.py”, but this change touches benchmarks/profiler/profile_sla.py. If that’s intentional, consider updating the title or ensure sla_planner.py fixes are included.


151-166: Please verify updated benchmark_prefill/benchmark_decode signatures across the repo

After adding the new tokenizer argument to the benchmark_prefill and benchmark_decode functions in
benchmarks/profiler/utils/genai_perf.py, please ensure that:

  • The function definitions include a default value for tokenizer (e.g., tokenizer=None) so existing call sites remain valid, OR
  • Every invocation of these functions now explicitly passes the appropriate tokenizer parameter.

Key call sites to check and update if necessary:

  • benchmarks/profiler/profile_sla.py

    • Lines 158–166: benchmark_prefill(args.isl, genai_perf_artifact_dir, model_name, model_name, base_url=base_url)
    • Lines 282–292: benchmark_decode(args.isl, args.osl, genai_perf_artifact_dir, …)
  • benchmarks/profiler/utils/profile_prefill.py (lines 38–42)

    • gap_result = benchmark_prefill(isl, genai_perf_artifact_dir, …)
  • benchmarks/profiler/utils/profile_decode.py (lines 64–68)

    • gap_result = benchmark_decode(isl, osl, genai_perf_artifact_dir, …)
  • benchmarks/profiler/profile_endpoint.py (lines 88–101)

    • profile_prefill(…) and profile_decode(…) wrappers that may invoke the updated benchmarks.

If the new parameter has a default, no further changes may be needed; otherwise, update each call site to pass the tokenizer argument.
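
One backward-compatible option, sketched here with an assumed parameter list (the actual signatures in genai_perf.py may differ), is to default the tokenizer to the model name:

from typing import Optional

def benchmark_prefill(
    isl: int,
    artifact_dir: str,
    model_name: str,
    tokenizer_name: Optional[str] = None,
    *,
    base_url: str,
):
    # Existing call sites that omit the tokenizer keep working;
    # new call sites may pass a distinct tokenizer explicitly.
    tokenizer_name = tokenizer_name or model_name
    ...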

@tedzhouhk tedzhouhk enabled auto-merge (squash) August 22, 2025 22:30
@tedzhouhk tedzhouhk merged commit 268d017 into main Aug 22, 2025
19 of 21 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/sla-planner-more-logs branch August 22, 2025 22:51
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: nnshah1 <neelays@nvidia.com>
