
Conversation


@tedzhouhk tedzhouhk commented Aug 22, 2025

Summary by CodeRabbit

  • Chores
    • Updated internal benchmarking tooling to accept an additional model parameter; endpoint settings unchanged.
    • No changes to user-facing features, APIs, or settings.
    • No configuration or migration required.
    • Existing workflows and results reporting remain unaffected.
    • Backward compatibility preserved for production paths; only profiling routines touched.
    • Release includes no performance, security, or UI modifications.

@github-actions github-actions bot added the fix label Aug 22, 2025
@tedzhouhk tedzhouhk enabled auto-merge (squash) August 22, 2025 22:24
auto-merge was automatically disabled August 22, 2025 22:25

Pull request was closed

@jasonqinzhou jasonqinzhou reopened this Aug 22, 2025

coderabbitai bot commented Aug 22, 2025

Walkthrough

Updated function call sites in benchmarks/profiler/profile_sla.py to pass an additional positional model_name argument to benchmark_prefill and benchmark_decode, duplicating model_name before the base_url keyword argument. No other logic, control flow, or public interfaces were changed.

Changes

Cohort / File(s): Profiler benchmarking call-site updates (benchmarks/profiler/profile_sla.py)
Summary: Added a second positional model_name argument in calls to benchmark_prefill and benchmark_decode; base_url remains a keyword argument. No other code changes. (A sketch of the updated call shape follows.)
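
For orientation, the updated call now duplicates model_name positionally ahead of the base_url keyword; a minimal sketch of the prefill call shape (the decode call is analogous):

gap_result = benchmark_prefill(
    args.isl,
    genai_perf_artifact_dir,
    model_name,
    model_name,  # second positional model_name added by this PR
    base_url=base_url,
)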

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

I tap my paws on timing logs,
Two names hop where one once jogs.
Prefill, decode—now side by side,
URLs trail the model tide.
In burrows deep the benchmarks gleam,
A doubled name, a quicker stream. 🐇⌛️

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
benchmarks/profiler/profile_sla.py (7)

160-166: Use tokenizer_name instead of duplicating model_name (derive once from config).

Passing model_name twice risks incorrect tokenization when model and tokenizer diverge. Prefer a real tokenizer_name with a safe fallback to model_name.

Apply this diff to the call:

 gap_result = benchmark_prefill(
     args.isl,
     genai_perf_artifact_dir,
     model_name,
-    model_name,
+    tokenizer_name,
     base_url=base_url,
 )

Add this near where model_name is set (Lines 83-85) to derive tokenizer_name once:

# Derive tokenizer_name with graceful fallback
tokenizer_name = (
    getattr(config_modifier, "get_tokenizer_name", lambda cfg: None)(config)
    or model_name
)
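
For illustration, one possible shape of such a helper; get_tokenizer_name and the config layout below are assumptions for the sketch, not the repository's actual API:

# Hypothetical helper on the config modifier; the config layout is assumed.
def get_tokenizer_name(config: dict) -> str | None:
    """Return an explicit tokenizer name if the config provides one, else None."""
    model_cfg = config.get("model", {})
    return model_cfg.get("tokenizer") or model_cfg.get("tokenizer_name")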

284-292: Decode path: pass tokenizer_name (not a duplicate model_name).

Mirror the prefill fix so both benchmarks use the correct tokenizer.

 gap_result = benchmark_decode(
     args.isl,
     args.osl,
     num_request,
     genai_perf_artifact_dir,
     model_name,
-    model_name,
+    tokenizer_name,
     base_url=base_url,
 )

426-434: Propagate tokenizer_name to profile_prefill/profile_decode for consistency.

These helpers already accept (model_name, tokenizer_name, ...). Use the derived tokenizer_name for correctness and consistency across flows.

Example adjustments:

 profile_prefill(
     work_dir,
     model_name,
-    model_name,
+    tokenizer_name,
     base_url,
     best_prefill_tp,
     args.max_context_length,
     args.prefill_interpolation_granularity,
 )

 profile_decode(
     work_dir,
     model_name,
-    model_name,
+    tokenizer_name,
     base_url,
     best_decode_tp,
     max_kv_tokens,
     args.max_context_length,
     args.decode_interpolation_granularity,
 )

Also applies to: 482-491


173-177: Prefer logger over print for consistency and structured logs.

Everything else uses logger; replace the remaining print calls.

-            print("Cleaning up deployment...")
+            logger.info("Cleaning up deployment...")
             await client.delete_deployment()
             deployment_clients.remove(client)
-            print("Deployment deleted")
+            logger.info("Deployment deleted")

Repeat the same replacement for the other three cleanup blocks in the decode and interpolation sections.

Also applies to: 306-310, 436-440, 493-497
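
For reference, a minimal module-level logging setup that makes logger available; the level and format here are illustrative, and profile_sla.py may already configure logging differently:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

# The print calls above then become structured log records:
logger.info("Cleaning up deployment...")
logger.info("Deployment deleted")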


102-128: Avoid mutating shared config across TP iterations; copy per-iteration.

set_config_tp_size mutates prefill_config/decode_config in-place. If any other knobs get toggled inside the loop later, side-effects will leak across iterations. Safer to keep a base config and copy inside the loop.

Example:

import copy

base_prefill_config = config_modifier.convert_config(config, "prefill")
...
for tp_size in profile_tp_size:
    prefill_config = copy.deepcopy(base_prefill_config)
    prefill_config = config_modifier.set_config_tp_size(prefill_config, tp_size)

Do the same for decode_config.

Also applies to: 238-247
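
A sketch of the decode side under the same assumptions about config_modifier's interface (convert_config and set_config_tp_size as used above):

import copy

# Build one immutable base config, then copy per TP size so in-place
# mutations cannot leak across iterations.
base_decode_config = config_modifier.convert_config(config, "decode")

for tp_size in profile_tp_size:
    decode_config = copy.deepcopy(base_decode_config)
    decode_config = config_modifier.set_config_tp_size(decode_config, tp_size)
    # ... deploy and benchmark with this per-iteration copy ...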


367-376: Clarify KV utilization formula and intent (parentheses/weights).

Currently the expression is concurrency * (args.isl + args.osl / 2) / kv. If the intent is to weight OSL by 0.5, make that explicit; if not, the utilization may be under- or over-estimated.

Consider:

avg_decode_tokens = 0.5 * args.osl  # if this is the intended heuristic
selected_decode_kv_cache_utilization = (
    decode_concurrency[selected_decode_idx]
    * (args.isl + avg_decode_tokens)
    / decode_kv_cache_size[selected_decode_idx]
)

Note: Earlier, an “existing logs” estimate used concurrency * (ISL + OSL) * 2, which is not consistent with the selection formula. Please confirm the preferred heuristic and align both places.
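
To make the discrepancy concrete, a small comparison with made-up numbers (illustrative values only):

# Illustrative values only, not measurements.
concurrency, isl, osl, kv_cache_size = 32, 3000, 150, 200_000

# Selection formula as currently written: OSL weighted by 0.5.
selection_estimate = concurrency * (isl + osl / 2) / kv_cache_size   # 0.492

# "Existing logs" estimate mentioned above: full ISL + OSL, doubled.
logs_estimate = concurrency * (isl + osl) * 2 / kv_cache_size        # 1.008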


316-365: Guard against empty decode results before selection.

If sweep_num_request ends up empty for all TP sizes (e.g., tiny KV cache), decode_itl stays empty and min(decode_itl) will raise. Add a defensive check and a helpful message.

Example:

if not decode_itl:
    logger.warning("No decode results gathered; cannot derive best TP. Check KV cache or sweep range.")
    return
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0bd4995 and 4154afc.

📒 Files selected for processing (1)
  • benchmarks/profiler/profile_sla.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
benchmarks/profiler/profile_sla.py (2)

52-70: PR title/scope mismatch — confirm intent.

PR title mentions “sla_planner.py”, but this change touches benchmarks/profiler/profile_sla.py. If that’s intentional, consider updating the title or ensure sla_planner.py fixes are included.


151-166: Please verify updated benchmark_prefill/benchmark_decode signatures across the repo

After adding the new tokenizer argument to the benchmark_prefill and benchmark_decode functions in
benchmarks/profiler/utils/genai_perf.py, please ensure that:

  • The function definitions include a default value for tokenizer (e.g., tokenizer=None) so existing call sites remain valid, OR
  • Every invocation of these functions now explicitly passes the appropriate tokenizer parameter.

Key call sites to check and update if necessary:

  • benchmarks/profiler/profile_sla.py

    • Lines 158–166: benchmark_prefill(args.isl, genai_perf_artifact_dir, model_name, model_name, base_url=base_url)
    • Lines 282–292: benchmark_decode(args.isl, args.osl, genai_perf_artifact_dir, …)
  • benchmarks/profiler/utils/profile_prefill.py (lines 38–42)

    • gap_result = benchmark_prefill(isl, genai_perf_artifact_dir, …)
  • benchmarks/profiler/utils/profile_decode.py (lines 64–68)

    • gap_result = benchmark_decode(isl, osl, genai_perf_artifact_dir, …)
  • benchmarks/profiler/profile_endpoint.py (lines 88–101)

    • profile_prefill(…) and profile_decode(…) wrappers that may invoke the updated benchmarks.

If the new parameter has a default, no further changes may be needed; otherwise, update each call site to pass the tokenizer argument.
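
One backward-compatible option, sketched here with an assumed parameter list (the actual signatures in genai_perf.py may differ), is to default the tokenizer to the model name:

from typing import Optional

def benchmark_prefill(
    isl: int,
    artifact_dir: str,
    model_name: str,
    tokenizer_name: Optional[str] = None,
    *,
    base_url: str,
):
    # Existing call sites that omit the tokenizer keep working;
    # new call sites may pass a distinct tokenizer explicitly.
    tokenizer_name = tokenizer_name or model_name
    ...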

@tedzhouhk tedzhouhk enabled auto-merge (squash) August 22, 2025 22:30
@tedzhouhk tedzhouhk merged commit 268d017 into main Aug 22, 2025
19 of 21 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/sla-planner-more-logs branch August 22, 2025 22:51
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: nnshah1 <neelays@nvidia.com>
