test: add test for pre-deployment script #2857
Conversation
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Walkthrough

Introduces a dry-run mode to the profiler CLI, adjusts control flow to skip deployments/logs on dry-run or readiness timeouts, and refactors imports to benchmarks.profiler.utils.*. Config models are relaxed to accept optional fields and extras, utilities harden parsing/ports, and new dry-run tests are added.
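The dry-run mode described above can be sketched with a minimal argparse setup. The flag wiring below is an illustration, not the PR's actual code; `run_profile_sketch` and its return shape are invented for the example:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the profiler CLI's arguments.
    parser = argparse.ArgumentParser(description="SLA profiler (sketch)")
    parser.add_argument("--backend", default="vllm")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Plan TP candidates without deploying anything",
    )
    return parser


def run_profile_sketch(args: argparse.Namespace) -> dict:
    # In dry-run mode, skip deployment, profiling, and log collection
    # entirely and report only the planned structure.
    if args.dry_run:
        return {"deployed": False, "profiled": False, "mode": "dry-run"}
    return {"deployed": True, "profiled": True, "mode": "real"}


if __name__ == "__main__":
    args = build_parser().parse_args(["--dry-run"])
    print(run_profile_sketch(args))
```

The key point is that the dry-run branch returns before any deployment client is constructed, which is what makes the new tests runnable without a cluster.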
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant CLI as profile_sla CLI
    participant Planner as TP Planner
    participant Deployer as Deployment Manager
    participant Prof as Profiler
    participant Logs as Log Collector
    User->>CLI: Run with args (possibly --dry-run)
    CLI->>Planner: Compute TP candidates and bounds
    alt Dry-run enabled
        Note over CLI,Planner: Dry-run: select min TP indices
        CLI-->>Deployer: Skip deployment
        CLI-->>Prof: Skip profiling
        CLI-->>Logs: Skip log fetch
        CLI->>User: Output structure with interpolations (no deploy)
    else Real run
        CLI->>Deployer: Create deployment
        Deployer-->>CLI: Deployment ID
        CLI->>Deployer: Wait for readiness
        alt Ready
            CLI->>Prof: Run prefill/decode profiling
            Prof-->>CLI: Results
            CLI->>Logs: Fetch deployment logs
            Logs-->>CLI: Logs
        else Timeout
            Note over CLI,Deployer: On TimeoutError: skip profiling/logs
        end
        CLI-->>Deployer: Cleanup (finally)
        Deployer-->>CLI: Cleanup done
        CLI->>User: Emit results (or skipped state)
    end
```
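The real-run branch of the diagram (create → wait → profile → fetch logs, with cleanup always running) can be sketched as plain Python control flow. The deployer here is a stand-in, not the repo's `DynamoDeploymentClient`:

```python
class FakeDeployer:
    """Stand-in for a deployment manager; records lifecycle events."""

    def __init__(self, ready: bool):
        self.ready = ready
        self.events = []

    def create_deployment(self) -> str:
        self.events.append("created")
        return "deployment-1"

    def wait_for_ready(self) -> None:
        if not self.ready:
            raise TimeoutError("deployment never became ready")

    def delete_deployment(self) -> None:
        self.events.append("cleaned-up")


def real_run(deployer: FakeDeployer) -> dict:
    result = {"profiled": False, "logs": None}
    deployer.create_deployment()
    try:
        deployer.wait_for_ready()
        result["profiled"] = True          # run prefill/decode profiling
        result["logs"] = "...fetched..."   # fetch deployment logs
    except TimeoutError:
        pass  # on readiness timeout: skip profiling and log fetch
    finally:
        deployer.delete_deployment()       # cleanup always happens
    return result
```

The `finally` block mirrors the "Cleanup (finally)" arrow: deletion runs whether profiling succeeded or the readiness wait timed out.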
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
benchmarks/profiler/utils/profile_prefill.py (2)
34-38: Guard against zero step in range() to avoid ValueError.
When interpolation_granularity >= (max_context_length - 100), the step becomes 0. Apply:

```diff
-    for isl in range(
-        100,
-        max_context_length,
-        (max_context_length - 100) // interpolation_granularity,
-    ):
+    step = max(1, (max_context_length - 100) // max(1, interpolation_granularity))
+    for isl in range(100, max_context_length, step):
```
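The clamp in the suggested diff can be checked in isolation; this snippet only demonstrates why the `max(1, ...)` guard is needed before calling `range()`:

```python
def safe_prefill_steps(max_context_length: int, interpolation_granularity: int) -> list:
    # Without the clamp, (max_context_length - 100) // interpolation_granularity
    # can be 0 when the granularity exceeds the ISL span, and range() raises
    # ValueError on a zero step.
    step = max(1, (max_context_length - 100) // max(1, interpolation_granularity))
    return list(range(100, max_context_length, step))
```

With a span of 5 tokens and granularity 16 the unguarded step would be 0; the clamp falls back to stepping by 1.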
48-53: Avoid divide-by-zero and invalid throughput when ttft or num_gpus is non-positive.

```diff
-        if gap_result is not None:
-            ttft = gap_result["time_to_first_token"]["avg"]
-            prefill_isl.append(isl)
-            prefill_ttft.append(ttft)
-            prefill_thpt_per_gpu.append(isl / ttft / num_gpus * 1000)
+        if gap_result is not None:
+            ttft = gap_result["time_to_first_token"]["avg"]
+            if ttft <= 0 or num_gpus <= 0:
+                logger.warning(f"Skipping ISL {isl} due to non-positive ttft ({ttft}) or num_gpus ({num_gpus}).")
+                continue
+            prefill_isl.append(isl)
+            prefill_ttft.append(ttft)
+            prefill_thpt_per_gpu.append(isl / ttft / num_gpus * 1000)
```

benchmarks/profiler/utils/config.py (4)
158-176: VLLM prefill→decode conversion: avoid AttributeError and fragile remove().
- Use the safe container accessor.
- Only remove flags if present.
```diff
-    args = cfg.spec.services[
-        WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-    ].extraPodSpec.mainContainer.args
+    svc = cfg.spec.services[WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name]
+    container = get_or_create_main_container(svc)
+    args = container.args
     args = break_arguments(args)
-    # remove --is-prefill-worker flag
-    args.remove("--is-prefill-worker")
+    # remove --is-prefill-worker flag
+    if "--is-prefill-worker" in args:
+        args.remove("--is-prefill-worker")
     # disable prefix caching
     if "--enable-prefix-caching" in args:
         args.remove("--enable-prefix-caching")
     if "--no-enable-prefix-caching" not in args:
         args = append_argument(args, "--no-enable-prefix-caching")
-    cfg.spec.services[
-        WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-    ].extraPodSpec.mainContainer.args = join_arguments(args)
+    container.args = join_arguments(args)
```
183-198: VLLM decode path: use safe accessor; keep flags consistent.

```diff
-    args = cfg.spec.services[
-        WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-    ].extraPodSpec.mainContainer.args
+    svc = cfg.spec.services[WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name]
+    container = get_or_create_main_container(svc)
+    args = container.args
     args = break_arguments(args)
     # enable prefix caching
     if "--enable-prefix-caching" not in args:
         args = append_argument(args, "--enable-prefix-caching")
     if "--no-enable-prefix-caching" in args:
         args.remove("--no-enable-prefix-caching")
-    cfg.spec.services[
-        WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-    ].extraPodSpec.mainContainer.args = join_arguments(args)
+    container.args = join_arguments(args)
```
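A helper like the `get_or_create_main_container` referenced in these diffs could look roughly like the sketch below. It is written against simplified attribute-style objects, not the repo's actual Pydantic models, so treat the shapes as assumptions:

```python
from types import SimpleNamespace


def get_or_create_main_container(svc):
    # Create each level of the extraPodSpec.mainContainer path if it is
    # missing, so callers never hit AttributeError on partially-specified
    # services (e.g. a Frontend entry with no container args).
    if getattr(svc, "extraPodSpec", None) is None:
        svc.extraPodSpec = SimpleNamespace(mainContainer=None)
    if getattr(svc.extraPodSpec, "mainContainer", None) is None:
        svc.extraPodSpec.mainContainer = SimpleNamespace(args=[])
    if getattr(svc.extraPodSpec.mainContainer, "args", None) is None:
        svc.extraPodSpec.mainContainer.args = []
    return svc.extraPodSpec.mainContainer
```

Callers then mutate `container.args` directly instead of re-indexing the long `cfg.spec.services[...]` path twice.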
311-314: Fix exception log referencing undefined variable ‘line’.
If open() fails or no matching line is seen, ‘line’ is undefined.

```diff
-            logger.warning(
-                f"Failed to parse KV cache size from line: {line}. Error: {e}"
-            )
+            logger.warning(
+                f"Failed to parse KV cache size from file {dynamo_log_fn}. Error: {e}"
+            )
```
208-246: Add missing `extraPodSpec.mainContainer.args` in disagg.yaml services

components/backends/vllm/deploy/disagg.yaml and components/backends/sglang/deploy/disagg.yaml currently lack `extraPodSpec.mainContainer.args` entries for the Frontend and worker services (script output: MISSING), which will trigger an AttributeError at runtime. Add an `args: []` (or appropriate default arguments) under `extraPodSpec.mainContainer` for each of those services.
🧹 Nitpick comments (9)
benchmarks/profiler/utils/config.py (4)
484-490: KV cache parser for SGLang log looks reasonable.
Minor: consider compiling the regex once if this runs hot; otherwise fine.
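The precompiled-regex suggestion looks like this in isolation. The pattern and log line below are invented for illustration; the actual SGLang log format may differ:

```python
import re

# Compile once at module import. Python does cache compiled patterns
# internally, but precompiling skips the cache lookup on hot paths and
# names the pattern for reuse.
KV_CACHE_RE = re.compile(r"KV cache size:\s*(\d+)")


def parse_kv_cache_size(line: str):
    """Return the KV cache size from a log line, or None if absent."""
    m = KV_CACHE_RE.search(line)
    return int(m.group(1)) if m else None
```

If the parser only runs once per profiling run, the inline `re.search` in the PR is fine as-is.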
61-64: Remove unused Services model or convert to a dict-root model.
Spec.services is already dict[str, Service]; this class is unused and misleading.

```diff
-class Services(BaseModel):
-    Frontend: Service
-    model_config = {"extra": "allow"}
+# Removed unused Services model; Spec.services already models a dict of Service.
```
383-390: Unify return types between modifiers.
VLLM.convert_config returns a dict via model_dump(); SGLang.convert_config returns the original ‘config’. Choose one for consistency.
249-257: Model name fallback is fine; optionally probe alt flags.
If you expect alt flags (e.g., “--model-name”), consider checking them before defaulting.

tests/profiler/test_profile_sla_dryrun.py (2)
28-31: Avoid sys.path mutation in tests.

Use package-relative imports or ensure the package is importable via PYTHONPATH/pytest configuration. sys.path hacking is brittle in CI.
Apply this diff to prefer repo-root-relative import without manual path injection (or configure pytest to add the root via addopts):
```diff
-import sys
-from pathlib import Path
+from pathlib import Path
@@
-# Add the project root to sys.path to enable imports
-project_root = Path(__file__).parent.parent.parent
-sys.path.insert(0, str(project_root))
+# Ensure tests run from repo root via pytest.ini or CI; avoid sys.path mutation.
```
42-61: Make tests robust to missing config files and isolate outputs per test.
- Skip cleanly if disagg.yaml isn’t present in the checkout.
- Write to a unique tmp directory to avoid cross-test collisions.
Apply this diff:
```diff
-    @pytest.fixture
-    def vllm_args(self):
+    @pytest.fixture
+    def vllm_args(self, tmp_path):
@@
-        class Args:
+        config_path = Path("components/backends/vllm/deploy/disagg.yaml")
+        if not config_path.exists():
+            pytest.skip(f"Missing config: {config_path}")
+
+        class Args:
             backend = "vllm"
-            config = "components/backends/vllm/deploy/disagg.yaml"
-            output_dir = "/tmp/test_profiling_results"
+            config = str(config_path)
+            output_dir = str(tmp_path / "profiling_results_vllm")
             namespace = "test-namespace"
@@
-            service_name = ""
+            service_name = None
             dry_run = True
@@
-    def sglang_args(self):
+    def sglang_args(self, tmp_path):
@@
-        class Args:
+        config_path = Path("components/backends/sglang/deploy/disagg.yaml")
+        if not config_path.exists():
+            pytest.skip(f"Missing config: {config_path}")
+
+        class Args:
             backend = "sglang"
-            config = "components/backends/sglang/deploy/disagg.yaml"
-            output_dir = "/tmp/test_profiling_results"
+            config = str(config_path)
+            output_dir = str(tmp_path / "profiling_results_sglang")
             namespace = "test-namespace"
@@
-            service_name = ""
+            service_name = None
             dry_run = True
```

Also applies to: 67-86
benchmarks/profiler/profile_sla.py (3)
94-96: Avoid reusing mutated configs across TP iterations.

set_config_tp_size may mutate; build a fresh converted config per TP to prevent carry-over between iterations.
Apply this diff:
```diff
-        prefill_config = config_modifier.convert_config(config, "prefill")
@@
-        prefill_config = config_modifier.set_config_tp_size(prefill_config, tp_size)
+        prefill_config = config_modifier.set_config_tp_size(
+            config_modifier.convert_config(config, "prefill"), tp_size
+        )
@@
-        decode_config = config_modifier.convert_config(config, "decode")
@@
-        decode_config = config_modifier.set_config_tp_size(decode_config, tp_size)
+        decode_config = config_modifier.set_config_tp_size(
+            config_modifier.convert_config(config, "decode"), tp_size
+        )
```

Also applies to: 118-121, 191-194, 233-241
45-53: Guard logger handler to prevent duplicates on import.

When imported in tests, multiple handlers can be attached.
Apply this diff:
```diff
-logger = logging.getLogger(__name__)
-logger.setLevel(logging.INFO)
-console_handler = logging.StreamHandler()
-console_handler.setLevel(logging.INFO)
-formatter = logging.Formatter(
-    "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S"
-)
-console_handler.setFormatter(formatter)
-logger.addHandler(console_handler)
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)
+if not logger.handlers:
+    console_handler = logging.StreamHandler()
+    console_handler.setLevel(logging.INFO)
+    formatter = logging.Formatter(
+        "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S"
+    )
+    console_handler.setFormatter(formatter)
+    logger.addHandler(console_handler)
```
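The effect of the handler guard can be verified directly: wrapping the setup in a function and calling it twice attaches only one handler, whereas the unguarded version would stack a second one on re-import:

```python
import logging


def setup_logger(name: str = "profile_sla_demo") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # guard: repeat calls/imports won't stack handlers
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            "%Y-%m-%d %H:%M:%S",
        ))
        logger.addHandler(handler)
    return logger
```

Without the guard, every log line would be emitted once per accumulated handler after the module is imported multiple times in a test session.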
61-71: Validate backend and GPU bounds early.

Raise a clear error if backend is unknown or min/max are invalid.
Example:
```python
if args.backend not in CONFIG_MODIFIERS:
    raise ValueError(f"Unsupported backend: {args.backend}")
if (
    args.min_num_gpus_per_engine < 1
    or args.max_num_gpus_per_engine < args.min_num_gpus_per_engine
):
    raise ValueError("Invalid GPU bounds: min must be >= 1 and <= max")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- benchmarks/profiler/profile_sla.py (6 hunks)
- benchmarks/profiler/utils/config.py (7 hunks)
- benchmarks/profiler/utils/profile_decode.py (1 hunks)
- benchmarks/profiler/utils/profile_prefill.py (1 hunks)
- tests/profiler/__init__.py (1 hunks)
- tests/profiler/test_profile_sla_dryrun.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
benchmarks/profiler/utils/profile_prefill.py (2)
- benchmarks/profiler/utils/genai_perf.py (1)
  - benchmark_prefill (154-186)
- benchmarks/profiler/utils/plot.py (1)
  - plot_prefill_interpolation (103-160)

benchmarks/profiler/utils/profile_decode.py (2)
- benchmarks/profiler/utils/genai_perf.py (1)
  - benchmark_decode (189-247)
- benchmarks/profiler/utils/plot.py (1)
  - plot_decode_3d_surface (163-254)

benchmarks/profiler/profile_sla.py (7)
- benchmarks/profiler/utils/genai_perf.py (2)
  - benchmark_decode (189-247)
  - benchmark_prefill (154-186)
- benchmarks/profiler/utils/plot.py (2)
  - plot_decode_performance (74-100)
  - plot_prefill_performance (34-71)
- benchmarks/profiler/utils/profile_cache.py (4)
  - check_decode_results_exist (56-88)
  - check_prefill_results_exist (26-53)
  - load_existing_decode_results (111-138)
  - load_existing_prefill_results (91-108)
- benchmarks/profiler/utils/profile_decode.py (1)
  - profile_decode (22-102)
- benchmarks/profiler/utils/profile_prefill.py (1)
  - profile_prefill (22-80)
- deploy/utils/dynamo_deployment.py (6)
  - DynamoDeploymentClient (97-481)
  - create_deployment (219-271)
  - wait_for_deployment_ready (273-400)
  - get_deployment_logs (430-461)
  - get_service_url (211-217)
  - delete_deployment (463-481)
- benchmarks/profiler/utils/config.py (2)
  - get_kv_cache_size_from_dynamo_log (295-315)
  - get_kv_cache_size_from_dynamo_log (479-491)

tests/profiler/test_profile_sla_dryrun.py (1)
- benchmarks/profiler/profile_sla.py (1)
  - run_profile (56-534)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - dynamo
- GitHub Check: Build and Test - vllm
🔇 Additional comments (7)
tests/profiler/__init__.py (1)
1-16: LGTM — license header and docstring only.
No runtime impact.

benchmarks/profiler/utils/profile_decode.py (1)
8-9: LGTM — import path refactor only.
No functional changes; consistent with utils relocation.

benchmarks/profiler/utils/config.py (5)
211-228: Nice hardening of resources mutation.
Creating resources/requests when absent prevents KeyError and keeps limits in sync.
266-293: Robust port extraction — good defaults and guards.
Covers missing Frontend/args cleanly and avoids crashes.
395-412: Mirrored resource hardening for SGLang — looks good.
Prevents missing dicts while updating GPU requests/limits.
449-476: SGLang port extraction hardening — good parity with VLLM.
80-90: break_arguments None-handling — nice improvement.
Prevents crashes when args are absent and skips None entries.
Signed-off-by: nnshah1 <neelays@nvidia.com>