fix: missing tokenizer args in sla_planner.py #2667
Conversation
Walkthrough

Updated function call sites in benchmarks/profiler/profile_sla.py to pass an additional positional model_name argument to benchmark_prefill and benchmark_decode, duplicating model_name before the base_url keyword argument. No other logic, control flow, or public interfaces were changed.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 0
🧹 Nitpick comments (7)
benchmarks/profiler/profile_sla.py (7)
160-166: Use `tokenizer_name` instead of duplicating `model_name` (derive once from config).

Passing `model_name` twice risks incorrect tokenization when the model and tokenizer diverge. Prefer a real `tokenizer_name` with a safe fallback to `model_name`. Apply this diff to the call:

```diff
     gap_result = benchmark_prefill(
         args.isl,
         genai_perf_artifact_dir,
         model_name,
-        model_name,
+        tokenizer_name,
         base_url=base_url,
     )
```

Add this near where `model_name` is set (lines 83-85) to derive `tokenizer_name` once:

```python
# Derive tokenizer_name with graceful fallback
tokenizer_name = (
    getattr(config_modifier, "get_tokenizer_name", lambda cfg: None)(config)
    or model_name
)
```
284-292: Decode path: pass `tokenizer_name` (not a duplicate `model_name`).

Mirror the prefill fix so both benchmarks use the correct tokenizer.

```diff
     gap_result = benchmark_decode(
         args.isl,
         args.osl,
         num_request,
         genai_perf_artifact_dir,
         model_name,
-        model_name,
+        tokenizer_name,
         base_url=base_url,
     )
```
426-434: Propagate `tokenizer_name` to `profile_prefill`/`profile_decode` for consistency.

These helpers already accept `(model_name, tokenizer_name, ...)`. Use the derived `tokenizer_name` for correctness and consistency across flows. Example adjustments:

```diff
     profile_prefill(
         work_dir,
         model_name,
-        model_name,
+        tokenizer_name,
         base_url,
         best_prefill_tp,
         args.max_context_length,
         args.prefill_interpolation_granularity,
     )
```

```diff
     profile_decode(
         work_dir,
         model_name,
-        model_name,
+        tokenizer_name,
         base_url,
         best_decode_tp,
         max_kv_tokens,
         args.max_context_length,
         args.decode_interpolation_granularity,
     )
```

Also applies to: 482-491
173-177: Prefer `logger` over `print` for consistency and structured logs.

Everything else uses `logger`; replace the remaining `print` calls:

```diff
-    print("Cleaning up deployment...")
+    logger.info("Cleaning up deployment...")
     await client.delete_deployment()
     deployment_clients.remove(client)
-    print("Deployment deleted")
+    logger.info("Deployment deleted")
```

Repeat the same replacement for the other three cleanup blocks in the decode and interpolation sections.
Also applies to: 306-310, 436-440, 493-497
102-128: Avoid mutating shared config across TP iterations; copy per-iteration.
`set_config_tp_size` mutates `prefill_config`/`decode_config` in place. If any other knobs get toggled inside the loop later, side effects will leak across iterations. Safer to keep a base config and copy inside the loop. Example:

```python
import copy

base_prefill_config = config_modifier.convert_config(config, "prefill")
...
for tp_size in profile_tp_size:
    prefill_config = copy.deepcopy(base_prefill_config)
    prefill_config = config_modifier.set_config_tp_size(prefill_config, tp_size)
```

Do the same for `decode_config`.

Also applies to: 238-247
367-376: Clarify KV utilization formula and intent (parentheses/weights).

Currently: `concurrency * (args.isl + args.osl / 2) / kv`. If the intent is to weight OSL by 0.5, make it explicit; if not, you may be under- or over-estimating. Consider:

```python
avg_decode_tokens = 0.5 * args.osl  # if this is the intended heuristic
selected_decode_kv_cache_utilization = (
    decode_concurrency[selected_decode_idx]
    * (args.isl + avg_decode_tokens)
    / decode_kv_cache_size[selected_decode_idx]
)
```

Note: earlier, an "existing logs" estimate used `concurrency * (ISL + OSL) * 2`, which is not consistent with the selection formula. Please confirm the preferred heuristic and align both places.
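To see how far apart the two heuristics can land, here is a small sketch with hypothetical numbers (`isl=3000`, `osl=150`, `concurrency=8`, `kv_cache_size=100_000` are made up for illustration, not taken from any real profiling run):

```python
# Hypothetical values, chosen only to show the gap between the two heuristics.
isl, osl, concurrency, kv_cache_size = 3000, 150, 8, 100_000

# Selection formula as currently written (OSL implicitly weighted by 0.5):
selection_estimate = concurrency * (isl + osl / 2) / kv_cache_size

# "Existing logs" estimate (full ISL + OSL, doubled):
logs_estimate = concurrency * (isl + osl) * 2 / kv_cache_size

print(selection_estimate, logs_estimate)  # 0.246 vs 0.504, roughly 2x apart
```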
316-365: Guard against empty decode results before selection.

If `sweep_num_request` ends up empty for all TP sizes (e.g., tiny KV cache), `decode_itl` stays empty and `min(decode_itl)` will raise. Add a defensive check and a helpful message. Example:

```python
if not decode_itl:
    logger.warning(
        "No decode results gathered; cannot derive best TP. "
        "Check KV cache or sweep range."
    )
    return
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
benchmarks/profiler/profile_sla.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
benchmarks/profiler/profile_sla.py (2)
52-70: PR title/scope mismatch — confirm intent.

The PR title mentions “sla_planner.py”, but this change touches `benchmarks/profiler/profile_sla.py`. If that’s intentional, consider updating the title, or ensure the `sla_planner.py` fixes are included.
151-166: Please verify the updated `benchmark_prefill`/`benchmark_decode` signatures across the repo.

After adding the new `tokenizer` argument to the `benchmark_prefill` and `benchmark_decode` functions in `benchmarks/profiler/utils/genai_perf.py`, please ensure that:

- The function definitions include a default value for `tokenizer` (e.g., `tokenizer=None`) so existing call sites remain valid, OR
- Every invocation of these functions now explicitly passes the appropriate `tokenizer` parameter.

Key call sites to check and update if necessary:

- `benchmarks/profiler/profile_sla.py`
  - Lines 158–166: `benchmark_prefill(args.isl, genai_perf_artifact_dir, model_name, model_name, base_url=base_url)`
  - Lines 282–292: `benchmark_decode(args.isl, args.osl, genai_perf_artifact_dir, …)`
- `benchmarks/profiler/utils/profile_prefill.py` (lines 38–42): `gap_result = benchmark_prefill(isl, genai_perf_artifact_dir, …)`
- `benchmarks/profiler/utils/profile_decode.py` (lines 64–68): `gap_result = benchmark_decode(isl, osl, genai_perf_artifact_dir, …)`
- `benchmarks/profiler/profile_endpoint.py` (lines 88–101): `profile_prefill(…)` and `profile_decode(…)` wrappers that may invoke the updated benchmarks.

If the new parameter has a default, no further changes may be needed; otherwise, update each call site to pass the `tokenizer` argument. A backward-compatible signature is sketched below.
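As a minimal sketch of the default-value option, assuming positional parameters matching the call sites above (the actual signature in `benchmarks/profiler/utils/genai_perf.py` may differ):

```python
# Hypothetical sketch only; parameter names and order are assumptions
# based on the call sites listed above, not the real genai_perf.py API.
def benchmark_prefill(isl, artifact_dir, model_name, tokenizer=None, *, base_url):
    # Fall back to the model name when no tokenizer is supplied, so
    # pre-existing call sites continue to work unchanged.
    tokenizer = tokenizer or model_name
    ...
```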
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>