
Conversation


@tedzhouhk tedzhouhk commented Sep 4, 2025

  • Update config processing to match the operator update
  • Add a dry-run mode to support pre-deployment testing without GPUs
  • Add a pre-merge test
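The dry-run flag described above can be sketched as follows; this is a minimal illustration of the CLI shape, not the PR's exact code (the real parser lives in benchmarks/profiler/profile_sla.py and carries many more options):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Minimal sketch of a profiler CLI exposing --dry-run; option names and
    # defaults here are illustrative assumptions, not the merged implementation.
    parser = argparse.ArgumentParser(description="SLA profiler")
    parser.add_argument("--backend", choices=["vllm", "sglang"], default="vllm")
    parser.add_argument("--config", required=True, help="Path to disagg.yaml")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Validate configs and plan the TP sweep without deploying to GPUs",
    )
    return parser
```

With `--dry-run` set, the run loop can branch early and skip deployment, profiling, and log collection entirely.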

Summary by CodeRabbit

  • New Features
    • Added a --dry-run mode to the profiler CLI to simulate runs without deploying or collecting logs.
  • Breaking Changes
    • Removed the --example-dir option from the profiler CLI.
  • Bug Fixes
    • Profiling no longer aborts on deployment readiness timeouts.
    • More resilient config handling with optional fields and support for extra keys.
    • Safer port detection with sensible defaults.
    • Improved KV cache size parsing from logs.
  • Tests
    • Added dry-run coverage for multiple backends.

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

coderabbitai bot commented Sep 4, 2025

Walkthrough

Introduces a dry-run mode to the profiler CLI, adjusts control flow to skip deployments/logs on dry-run or readiness timeouts, and refactors imports to benchmarks.profiler.utils.*. Config models are relaxed to accept optional fields and extras, utilities harden parsing/ports, and new dry-run tests are added.

Changes

Cohort / File(s) Change summary
Profiler CLI and flow control
benchmarks/profiler/profile_sla.py
Adds --dry-run, removes --example-dir, refactors imports, gates deployment/profiling on dry-run and readiness, adds TimeoutError handling, conditional log fetching, ensures cleanup, and adjusts TP selection/interpolation defaults for dry-run.
Config models and utilities
benchmarks/profiler/utils/config.py
Makes several fields Optional with defaults, allows extra fields via model_config, updates break_arguments to accept Optional, hardens port extraction with defaults, safeguards resource mutation, updates KV cache parsing, and removes worker arg rewriting.
Profiling utilities (imports only)
benchmarks/profiler/utils/profile_decode.py, benchmarks/profiler/utils/profile_prefill.py
Updates import paths to benchmarks.profiler.utils.*; functional behavior unchanged.
Test scaffolding
tests/profiler/__init__.py
Adds module with license header and docstring; no executable tests.
Dry-run tests
tests/profiler/test_profile_sla_dryrun.py
Adds async tests validating dry-run for vllm and sglang with disagg.yaml configs; constructs Args-like fixtures and calls run_profile(..., dry_run=True); marked for pre-merge.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as profile_sla CLI
  participant Planner as TP Planner
  participant Deployer as Deployment Manager
  participant Prof as Profiler
  participant Logs as Log Collector

  User->>CLI: Run with args (possibly --dry-run)
  CLI->>Planner: Compute TP candidates and bounds
  alt Dry-run enabled
    Note over CLI,Planner: Dry-run: select min TP indices
    CLI-->>Deployer: Skip deployment
    CLI-->>Prof: Skip profiling
    CLI-->>Logs: Skip log fetch
    CLI->>User: Output structure with interpolations (no deploy)
  else Real run
    CLI->>Deployer: Create deployment
    Deployer-->>CLI: Deployment ID
    CLI->>Deployer: Wait for readiness
    alt Ready
      CLI->>Prof: Run prefill/decode profiling
      Prof-->>CLI: Results
      CLI->>Logs: Fetch deployment logs
      Logs-->>CLI: Logs
    else Timeout
      Note over CLI,Deployer: On TimeoutError: skip profiling/logs
    end
    CLI-->>Deployer: Cleanup (finally)
    Deployer-->>CLI: Cleanup done
    CLI->>User: Emit results (or skipped state)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

In dry-run fields I softly hop,
Skipping pods and logs that pop.
TP seeds are small, yet bright,
Bounds still whisper through the night.
Tests now nudge, “you’re good to go!”
Profiler hums—no GPUs in tow. 🐇✨




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
benchmarks/profiler/utils/profile_prefill.py (2)

34-38: Guard against zero step in range() to avoid ValueError.
When interpolation_granularity > (max_context_length - 100), the integer division yields a step of 0 and range() raises ValueError.

Apply:

-    for isl in range(
-        100,
-        max_context_length,
-        (max_context_length - 100) // interpolation_granularity,
-    ):
+    step = max(1, (max_context_length - 100) // max(1, interpolation_granularity))
+    for isl in range(100, max_context_length, step):

48-53: Avoid divide-by-zero and invalid throughput when ttft or num_gpus is non-positive.

-        if gap_result is not None:
-            ttft = gap_result["time_to_first_token"]["avg"]
-            prefill_isl.append(isl)
-            prefill_ttft.append(ttft)
-            prefill_thpt_per_gpu.append(isl / ttft / num_gpus * 1000)
+        if gap_result is not None:
+            ttft = gap_result["time_to_first_token"]["avg"]
+            if ttft <= 0 or num_gpus <= 0:
+                logger.warning(f"Skipping ISL {isl} due to non-positive ttft ({ttft}) or num_gpus ({num_gpus}).")
+                continue
+            prefill_isl.append(isl)
+            prefill_ttft.append(ttft)
+            prefill_thpt_per_gpu.append(isl / ttft / num_gpus * 1000)
benchmarks/profiler/utils/config.py (4)

158-176: VLLM prefill→decode conversion: avoid AttributeError and fragile remove().

  • Use the safe container accessor.
  • Only remove flags if present.
-            args = cfg.spec.services[
-                WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-            ].extraPodSpec.mainContainer.args
+            svc = cfg.spec.services[WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name]
+            container = get_or_create_main_container(svc)
+            args = container.args
             args = break_arguments(args)
-            # remove --is-prefill-worker flag
-            args.remove("--is-prefill-worker")
+            # remove --is-prefill-worker flag
+            if "--is-prefill-worker" in args:
+                args.remove("--is-prefill-worker")
             # disable prefix caching
             if "--enable-prefix-caching" in args:
                 args.remove("--enable-prefix-caching")
             if "--no-enable-prefix-caching" not in args:
                 args = append_argument(args, "--no-enable-prefix-caching")
-            cfg.spec.services[
-                WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-            ].extraPodSpec.mainContainer.args = join_arguments(args)
+            container.args = join_arguments(args)

183-198: VLLM decode path: use safe accessor; keep flags consistent.

-            args = cfg.spec.services[
-                WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-            ].extraPodSpec.mainContainer.args
+            svc = cfg.spec.services[WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name]
+            container = get_or_create_main_container(svc)
+            args = container.args
             args = break_arguments(args)
             # enable prefix caching
             if "--enable-prefix-caching" not in args:
                 args = append_argument(args, "--enable-prefix-caching")
             if "--no-enable-prefix-caching" in args:
                 args.remove("--no-enable-prefix-caching")
-            cfg.spec.services[
-                WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-            ].extraPodSpec.mainContainer.args = join_arguments(args)
+            container.args = join_arguments(args)

311-314: Fix exception log referencing undefined variable ‘line’.
If open() fails or no matching line is seen, ‘line’ is undefined.

-            logger.warning(
-                f"Failed to parse KV cache size from line: {line}. Error: {e}"
-            )
+            logger.warning(
+                f"Failed to parse KV cache size from file {dynamo_log_fn}. Error: {e}"
+            )

208-246: Add missing extraPodSpec.mainContainer.args in disagg.yaml services
components/backends/vllm/deploy/disagg.yaml and components/backends/sglang/deploy/disagg.yaml currently lack extraPodSpec.mainContainer.args entries for the Frontend and worker services (script output: MISSING), which will trigger an AttributeError at runtime. Add an args: [] (or appropriate default arguments) under extraPodSpec.mainContainer for each of those services.
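A minimal sketch of the shape this asks for — the actual service names and the rest of each service spec come from the real disagg.yaml files, so only the `extraPodSpec.mainContainer.args` placement below is the point:

```yaml
# Illustrative fragment only; service names beyond Frontend and the other
# fields of each service depend on the actual deploy/disagg.yaml.
spec:
  services:
    Frontend:
      extraPodSpec:
        mainContainer:
          args: []   # present even when empty, so args handling never sees a missing attribute
```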

🧹 Nitpick comments (9)
benchmarks/profiler/utils/config.py (4)

484-490: KV cache parser for SGLang log looks reasonable.
Minor: consider compiling the regex once if this runs hot; otherwise fine.
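Hoisting the compile to module level looks like the sketch below; the log-line pattern here is a hypothetical stand-in, since the exact SGLang KV-cache log format depends on the build being parsed:

```python
import re
from typing import Optional

# Hypothetical pattern — the real KV-cache log line format may differ;
# the point is compiling once at import time rather than per call.
KV_CACHE_RE = re.compile(r"KV Cache is allocated\. #tokens: (\d+)")


def parse_kv_cache_size(log_text: str) -> Optional[int]:
    # Scan line by line and return the first match, or None if absent.
    for line in log_text.splitlines():
        m = KV_CACHE_RE.search(line)
        if m:
            return int(m.group(1))
    return None
```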


61-64: Remove unused Services model or convert to a dict-root model.
Spec.services is already dict[str, Service]; this class is unused and misleading.

-class Services(BaseModel):
-    Frontend: Service
-    model_config = {"extra": "allow"}
+# Removed unused Services model; Spec.services already models a dict of Service.

383-390: Unify return types between modifiers.
VLLM.convert_config returns a dict via model_dump(); SGLang.convert_config returns the original ‘config’. Choose one for consistency.


249-257: Model name fallback is fine; optionally probe alt flags.
If you expect alt flags (e.g., “--model-name”), consider checking them before defaulting.

tests/profiler/test_profile_sla_dryrun.py (2)

28-31: Avoid sys.path mutation in tests.

Use package-relative imports or ensure the package is importable via PYTHONPATH/pytest configuration. sys.path hacking is brittle in CI.

Apply this diff to prefer repo-root-relative import without manual path injection (or configure pytest to add the root via addopts):

-import sys
-from pathlib import Path
+from pathlib import Path
@@
-# Add the project root to sys.path to enable imports
-project_root = Path(__file__).parent.parent.parent
-sys.path.insert(0, str(project_root))
+# Ensure tests run from repo root via pytest.ini or CI; avoid sys.path mutation.
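One way to make the package importable without touching sys.path — assuming pytest >= 7, which supports the `pythonpath` ini option natively — is a repo-root config fragment like:

```ini
# pytest.ini (or the [tool.pytest.ini_options] table in pyproject.toml)
[pytest]
pythonpath = .
```

This makes `benchmarks.profiler.*` importable from any test regardless of the invocation directory.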

42-61: Make tests robust to missing config files and isolate outputs per test.

  • Skip cleanly if disagg.yaml isn’t present in the checkout.
  • Write to a unique tmp directory to avoid cross-test collisions.

Apply this diff:

-    @pytest.fixture
-    def vllm_args(self):
+    @pytest.fixture
+    def vllm_args(self, tmp_path):
@@
-        class Args:
+        config_path = Path("components/backends/vllm/deploy/disagg.yaml")
+        if not config_path.exists():
+            pytest.skip(f"Missing config: {config_path}")
+
+        class Args:
             backend = "vllm"
-            config = "components/backends/vllm/deploy/disagg.yaml"
-            output_dir = "/tmp/test_profiling_results"
+            config = str(config_path)
+            output_dir = str(tmp_path / "profiling_results_vllm")
             namespace = "test-namespace"
@@
-            service_name = ""
+            service_name = None
             dry_run = True
@@
-    def sglang_args(self):
+    def sglang_args(self, tmp_path):
@@
-        class Args:
+        config_path = Path("components/backends/sglang/deploy/disagg.yaml")
+        if not config_path.exists():
+            pytest.skip(f"Missing config: {config_path}")
+
+        class Args:
             backend = "sglang"
-            config = "components/backends/sglang/deploy/disagg.yaml"
-            output_dir = "/tmp/test_profiling_results"
+            config = str(config_path)
+            output_dir = str(tmp_path / "profiling_results_sglang")
             namespace = "test-namespace"
@@
-            service_name = ""
+            service_name = None
             dry_run = True

Also applies to: 67-86

benchmarks/profiler/profile_sla.py (3)

94-96: Avoid reusing mutated configs across TP iterations.

set_config_tp_size may mutate; build a fresh converted config per TP to prevent carry-over between iterations.

Apply this diff:

-        prefill_config = config_modifier.convert_config(config, "prefill")
@@
-            prefill_config = config_modifier.set_config_tp_size(prefill_config, tp_size)
+            prefill_config = config_modifier.set_config_tp_size(
+                config_modifier.convert_config(config, "prefill"), tp_size
+            )
@@
-        decode_config = config_modifier.convert_config(config, "decode")
@@
-            decode_config = config_modifier.set_config_tp_size(decode_config, tp_size)
+            decode_config = config_modifier.set_config_tp_size(
+                config_modifier.convert_config(config, "decode"), tp_size
+            )

Also applies to: 118-121, 191-194, 233-241


45-53: Guard logger handler to prevent duplicates on import.

When imported in tests, multiple handlers can be attached.

Apply this diff:

-logger = logging.getLogger(__name__)
-logger.setLevel(logging.INFO)
-console_handler = logging.StreamHandler()
-console_handler.setLevel(logging.INFO)
-formatter = logging.Formatter(
-    "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S"
-)
-console_handler.setFormatter(formatter)
-logger.addHandler(console_handler)
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)
+if not logger.handlers:
+    console_handler = logging.StreamHandler()
+    console_handler.setLevel(logging.INFO)
+    formatter = logging.Formatter(
+        "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S"
+    )
+    console_handler.setFormatter(formatter)
+    logger.addHandler(console_handler)

61-71: Validate backend and GPU bounds early.

Raise a clear error if backend is unknown or min/max are invalid.

Example:

if args.backend not in CONFIG_MODIFIERS:
    raise ValueError(f"Unsupported backend: {args.backend}")
if args.min_num_gpus_per_engine < 1 or args.max_num_gpus_per_engine < args.min_num_gpus_per_engine:
    raise ValueError("Invalid GPU bounds: min must be >=1 and <= max")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between eb317cd and 7832aa9.

📒 Files selected for processing (6)
  • benchmarks/profiler/profile_sla.py (6 hunks)
  • benchmarks/profiler/utils/config.py (7 hunks)
  • benchmarks/profiler/utils/profile_decode.py (1 hunks)
  • benchmarks/profiler/utils/profile_prefill.py (1 hunks)
  • tests/profiler/__init__.py (1 hunks)
  • tests/profiler/test_profile_sla_dryrun.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
benchmarks/profiler/utils/profile_prefill.py (2)
benchmarks/profiler/utils/genai_perf.py (1)
  • benchmark_prefill (154-186)
benchmarks/profiler/utils/plot.py (1)
  • plot_prefill_interpolation (103-160)
benchmarks/profiler/utils/profile_decode.py (2)
benchmarks/profiler/utils/genai_perf.py (1)
  • benchmark_decode (189-247)
benchmarks/profiler/utils/plot.py (1)
  • plot_decode_3d_surface (163-254)
benchmarks/profiler/profile_sla.py (7)
benchmarks/profiler/utils/genai_perf.py (2)
  • benchmark_decode (189-247)
  • benchmark_prefill (154-186)
benchmarks/profiler/utils/plot.py (2)
  • plot_decode_performance (74-100)
  • plot_prefill_performance (34-71)
benchmarks/profiler/utils/profile_cache.py (4)
  • check_decode_results_exist (56-88)
  • check_prefill_results_exist (26-53)
  • load_existing_decode_results (111-138)
  • load_existing_prefill_results (91-108)
benchmarks/profiler/utils/profile_decode.py (1)
  • profile_decode (22-102)
benchmarks/profiler/utils/profile_prefill.py (1)
  • profile_prefill (22-80)
deploy/utils/dynamo_deployment.py (6)
  • DynamoDeploymentClient (97-481)
  • create_deployment (219-271)
  • wait_for_deployment_ready (273-400)
  • get_deployment_logs (430-461)
  • get_service_url (211-217)
  • delete_deployment (463-481)
benchmarks/profiler/utils/config.py (2)
  • get_kv_cache_size_from_dynamo_log (295-315)
  • get_kv_cache_size_from_dynamo_log (479-491)
tests/profiler/test_profile_sla_dryrun.py (1)
benchmarks/profiler/profile_sla.py (1)
  • run_profile (56-534)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (7)
tests/profiler/__init__.py (1)

1-16: LGTM — license header and docstring only.
No runtime impact.

benchmarks/profiler/utils/profile_decode.py (1)

8-9: LGTM — import path refactor only.
No functional changes; consistent with utils relocation.

benchmarks/profiler/utils/config.py (5)

211-228: Nice hardening of resources mutation.
Creating resources/requests when absent prevents KeyError and keeps limits in sync.


266-293: Robust port extraction — good defaults and guards.
Covers missing Frontend/args cleanly and avoids crashes.


395-412: Mirrored resource hardening for SGLang — looks good.
Prevents missing dicts while updating GPU requests/limits.


449-476: SGLang port extraction hardening — good parity with VLLM.


80-90: break_arguments None-handling — nice improvement.
Prevents crashes when args are absent and skips None entries.
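A minimal sketch of what the praised None-handling might look like — this is an assumption about break_arguments' contract (tokenizing space-joined strings), not the merged code:

```python
from typing import Optional


def break_arguments(args: Optional[list]) -> list:
    # Tolerate a missing args list entirely, skip None entries, and split
    # any space-joined strings into individual tokens.
    if args is None:
        return []
    broken = []
    for arg in args:
        if arg is None:
            continue
        broken.extend(str(arg).split())
    return broken
```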

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
@tedzhouhk tedzhouhk merged commit f72dc01 into main Sep 4, 2025
11 of 12 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/pre-sweep-test branch September 4, 2025 22:22
dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
hhzhang16 pushed a commit that referenced this pull request Sep 10, 2025
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
tedzhouhk added a commit that referenced this pull request Sep 10, 2025
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
saturley-hall pushed a commit that referenced this pull request Sep 15, 2025
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>