test: add test for pre-deployment script #2857
Conversation
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Walkthrough

Introduces a dry-run mode to the profiler CLI, adjusts control flow to skip deployments/logs on dry-run or readiness timeouts, and refactors imports to benchmarks.profiler.utils.*. Config models are relaxed to accept optional fields and extras, utilities harden parsing/ports, and new dry-run tests are added.
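The dry-run mode described above can be sketched with a minimal argparse setup. The flag wiring below is an illustration, not the PR's actual code; `run_profile_sketch` and its return shape are invented for the example:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the profiler CLI's arguments.
    parser = argparse.ArgumentParser(description="SLA profiler (sketch)")
    parser.add_argument("--backend", default="vllm")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Plan TP candidates without deploying anything",
    )
    return parser


def run_profile_sketch(args: argparse.Namespace) -> dict:
    # In dry-run mode, skip deployment, profiling, and log collection
    # entirely and report only the planned structure.
    if args.dry_run:
        return {"deployed": False, "profiled": False, "mode": "dry-run"}
    return {"deployed": True, "profiled": True, "mode": "real"}


if __name__ == "__main__":
    args = build_parser().parse_args(["--dry-run"])
    print(run_profile_sketch(args))
```

The key point is that the dry-run branch returns before any deployment client is constructed, which is what makes the new tests runnable without a cluster.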
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant CLI as profile_sla CLI
    participant Planner as TP Planner
    participant Deployer as Deployment Manager
    participant Prof as Profiler
    participant Logs as Log Collector
    User->>CLI: Run with args (possibly --dry-run)
    CLI->>Planner: Compute TP candidates and bounds
    alt Dry-run enabled
        Note over CLI,Planner: Dry-run: select min TP indices
        CLI-->>Deployer: Skip deployment
        CLI-->>Prof: Skip profiling
        CLI-->>Logs: Skip log fetch
        CLI->>User: Output structure with interpolations (no deploy)
    else Real run
        CLI->>Deployer: Create deployment
        Deployer-->>CLI: Deployment ID
        CLI->>Deployer: Wait for readiness
        alt Ready
            CLI->>Prof: Run prefill/decode profiling
            Prof-->>CLI: Results
            CLI->>Logs: Fetch deployment logs
            Logs-->>CLI: Logs
        else Timeout
            Note over CLI,Deployer: On TimeoutError: skip profiling/logs
        end
        CLI-->>Deployer: Cleanup (finally)
        Deployer-->>CLI: Cleanup done
        CLI->>User: Emit results (or skipped state)
    end
```
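The real-run branch of the diagram (create → wait → profile → fetch logs, with cleanup always running) can be sketched as plain Python control flow. The deployer here is a stand-in, not the repo's `DynamoDeploymentClient`:

```python
class FakeDeployer:
    """Stand-in for a deployment manager; records lifecycle events."""

    def __init__(self, ready: bool):
        self.ready = ready
        self.events = []

    def create_deployment(self) -> str:
        self.events.append("created")
        return "deployment-1"

    def wait_for_ready(self) -> None:
        if not self.ready:
            raise TimeoutError("deployment never became ready")

    def delete_deployment(self) -> None:
        self.events.append("cleaned-up")


def real_run(deployer: FakeDeployer) -> dict:
    result = {"profiled": False, "logs": None}
    deployer.create_deployment()
    try:
        deployer.wait_for_ready()
        result["profiled"] = True          # run prefill/decode profiling
        result["logs"] = "...fetched..."   # fetch deployment logs
    except TimeoutError:
        pass  # on readiness timeout: skip profiling and log fetch
    finally:
        deployer.delete_deployment()       # cleanup always happens
    return result
```

The `finally` block mirrors the "Cleanup (finally)" arrow: deletion runs whether profiling succeeded or the readiness wait timed out.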
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
benchmarks/profiler/utils/profile_prefill.py (2)
34-38: Guard against zero step in range() to avoid ValueError.
When interpolation_granularity >= (max_context_length - 100), the step becomes 0. Apply:

```diff
-    for isl in range(
-        100,
-        max_context_length,
-        (max_context_length - 100) // interpolation_granularity,
-    ):
+    step = max(1, (max_context_length - 100) // max(1, interpolation_granularity))
+    for isl in range(100, max_context_length, step):
```
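The clamp in the suggested diff can be checked in isolation; this snippet only demonstrates why the `max(1, ...)` guard is needed before calling `range()`:

```python
def safe_prefill_steps(max_context_length: int, interpolation_granularity: int) -> list:
    # Without the clamp, (max_context_length - 100) // interpolation_granularity
    # can be 0 when the granularity exceeds the ISL span, and range() raises
    # ValueError on a zero step.
    step = max(1, (max_context_length - 100) // max(1, interpolation_granularity))
    return list(range(100, max_context_length, step))
```

With a span of 5 tokens and granularity 16 the unguarded step would be 0; the clamp falls back to stepping by 1.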
48-53: Avoid divide-by-zero and invalid throughput when ttft or num_gpus is non-positive.

```diff
-        if gap_result is not None:
-            ttft = gap_result["time_to_first_token"]["avg"]
-            prefill_isl.append(isl)
-            prefill_ttft.append(ttft)
-            prefill_thpt_per_gpu.append(isl / ttft / num_gpus * 1000)
+        if gap_result is not None:
+            ttft = gap_result["time_to_first_token"]["avg"]
+            if ttft <= 0 or num_gpus <= 0:
+                logger.warning(f"Skipping ISL {isl} due to non-positive ttft ({ttft}) or num_gpus ({num_gpus}).")
+                continue
+            prefill_isl.append(isl)
+            prefill_ttft.append(ttft)
+            prefill_thpt_per_gpu.append(isl / ttft / num_gpus * 1000)
```

benchmarks/profiler/utils/config.py (4)
158-176: VLLM prefill→decode conversion: avoid AttributeError and fragile remove().
- Use the safe container accessor.
- Only remove flags if present.
```diff
-    args = cfg.spec.services[
-        WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-    ].extraPodSpec.mainContainer.args
+    svc = cfg.spec.services[WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name]
+    container = get_or_create_main_container(svc)
+    args = container.args
     args = break_arguments(args)
-    # remove --is-prefill-worker flag
-    args.remove("--is-prefill-worker")
+    # remove --is-prefill-worker flag
+    if "--is-prefill-worker" in args:
+        args.remove("--is-prefill-worker")
     # disable prefix caching
     if "--enable-prefix-caching" in args:
         args.remove("--enable-prefix-caching")
     if "--no-enable-prefix-caching" not in args:
         args = append_argument(args, "--no-enable-prefix-caching")
-    cfg.spec.services[
-        WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-    ].extraPodSpec.mainContainer.args = join_arguments(args)
+    container.args = join_arguments(args)
```
183-198: VLLM decode path: use safe accessor; keep flags consistent.

```diff
-    args = cfg.spec.services[
-        WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-    ].extraPodSpec.mainContainer.args
+    svc = cfg.spec.services[WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name]
+    container = get_or_create_main_container(svc)
+    args = container.args
     args = break_arguments(args)
     # enable prefix caching
     if "--enable-prefix-caching" not in args:
         args = append_argument(args, "--enable-prefix-caching")
     if "--no-enable-prefix-caching" in args:
         args.remove("--no-enable-prefix-caching")
-    cfg.spec.services[
-        WORKER_COMPONENT_NAMES["vllm"].decode_worker_k8s_name
-    ].extraPodSpec.mainContainer.args = join_arguments(args)
+    container.args = join_arguments(args)
```
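A helper like the `get_or_create_main_container` referenced in these diffs could look roughly like the sketch below. It is written against simplified attribute-style objects, not the repo's actual Pydantic models, so treat the shapes as assumptions:

```python
from types import SimpleNamespace


def get_or_create_main_container(svc):
    # Create each level of the extraPodSpec.mainContainer path if it is
    # missing, so callers never hit AttributeError on partially-specified
    # services (e.g. a Frontend entry with no container args).
    if getattr(svc, "extraPodSpec", None) is None:
        svc.extraPodSpec = SimpleNamespace(mainContainer=None)
    if getattr(svc.extraPodSpec, "mainContainer", None) is None:
        svc.extraPodSpec.mainContainer = SimpleNamespace(args=[])
    if getattr(svc.extraPodSpec.mainContainer, "args", None) is None:
        svc.extraPodSpec.mainContainer.args = []
    return svc.extraPodSpec.mainContainer
```

Callers then mutate `container.args` directly instead of re-indexing the long `cfg.spec.services[...]` path twice.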
311-314: Fix exception log referencing undefined variable ‘line’.
If open() fails or no matching line is seen, ‘line’ is undefined.

```diff
-            logger.warning(
-                f"Failed to parse KV cache size from line: {line}. Error: {e}"
-            )
+            logger.warning(
+                f"Failed to parse KV cache size from file {dynamo_log_fn}. Error: {e}"
+            )
```
208-246: Add missing `extraPodSpec.mainContainer.args` in disagg.yaml services

components/backends/vllm/deploy/disagg.yaml and components/backends/sglang/deploy/disagg.yaml currently lack `extraPodSpec.mainContainer.args` entries for the Frontend and worker services (script output: MISSING), which will trigger an AttributeError at runtime. Add an `args: []` (or appropriate default arguments) under `extraPodSpec.mainContainer` for each of those services.
🧹 Nitpick comments (9)
benchmarks/profiler/utils/config.py (4)
484-490: KV cache parser for SGLang log looks reasonable.
Minor: consider compiling the regex once if this runs hot; otherwise fine.
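The precompiled-regex suggestion looks like this in isolation. The pattern and log line below are invented for illustration; the actual SGLang log format may differ:

```python
import re

# Compile once at module import. Python does cache compiled patterns
# internally, but precompiling skips the cache lookup on hot paths and
# names the pattern for reuse.
KV_CACHE_RE = re.compile(r"KV cache size:\s*(\d+)")


def parse_kv_cache_size(line: str):
    """Return the KV cache size from a log line, or None if absent."""
    m = KV_CACHE_RE.search(line)
    return int(m.group(1)) if m else None
```

If the parser only runs once per profiling run, the inline `re.search` in the PR is fine as-is.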
61-64: Remove unused Services model or convert to a dict-root model.
Spec.services is already dict[str, Service]; this class is unused and misleading.

```diff
-class Services(BaseModel):
-    Frontend: Service
-    model_config = {"extra": "allow"}
+# Removed unused Services model; Spec.services already models a dict of Service.
```
383-390: Unify return types between modifiers.
VLLM.convert_config returns a dict via model_dump(); SGLang.convert_config returns the original ‘config’. Choose one for consistency.
249-257: Model name fallback is fine; optionally probe alt flags.
If you expect alt flags (e.g., “--model-name”), consider checking them before defaulting.

tests/profiler/test_profile_sla_dryrun.py (2)
28-31: Avoid sys.path mutation in tests.

Use package-relative imports or ensure the package is importable via PYTHONPATH/pytest configuration. sys.path hacking is brittle in CI.
Apply this diff to prefer repo-root-relative import without manual path injection (or configure pytest to add the root via addopts):
```diff
-import sys
-from pathlib import Path
+from pathlib import Path
@@
-# Add the project root to sys.path to enable imports
-project_root = Path(__file__).parent.parent.parent
-sys.path.insert(0, str(project_root))
+# Ensure tests run from repo root via pytest.ini or CI; avoid sys.path mutation.
```
42-61: Make tests robust to missing config files and isolate outputs per test.
- Skip cleanly if disagg.yaml isn’t present in the checkout.
- Write to a unique tmp directory to avoid cross-test collisions.
Apply this diff:
```diff
-    @pytest.fixture
-    def vllm_args(self):
+    @pytest.fixture
+    def vllm_args(self, tmp_path):
@@
-        class Args:
+        config_path = Path("components/backends/vllm/deploy/disagg.yaml")
+        if not config_path.exists():
+            pytest.skip(f"Missing config: {config_path}")
+
+        class Args:
             backend = "vllm"
-            config = "components/backends/vllm/deploy/disagg.yaml"
-            output_dir = "/tmp/test_profiling_results"
+            config = str(config_path)
+            output_dir = str(tmp_path / "profiling_results_vllm")
             namespace = "test-namespace"
@@
-            service_name = ""
+            service_name = None
             dry_run = True
@@
-    def sglang_args(self):
+    def sglang_args(self, tmp_path):
@@
-        class Args:
+        config_path = Path("components/backends/sglang/deploy/disagg.yaml")
+        if not config_path.exists():
+            pytest.skip(f"Missing config: {config_path}")
+
+        class Args:
             backend = "sglang"
-            config = "components/backends/sglang/deploy/disagg.yaml"
-            output_dir = "/tmp/test_profiling_results"
+            config = str(config_path)
+            output_dir = str(tmp_path / "profiling_results_sglang")
             namespace = "test-namespace"
@@
-            service_name = ""
+            service_name = None
             dry_run = True
```

Also applies to: 67-86
benchmarks/profiler/profile_sla.py (3)
94-96: Avoid reusing mutated configs across TP iterations.

set_config_tp_size may mutate; build a fresh converted config per TP to prevent carry-over between iterations.
Apply this diff:
```diff
-        prefill_config = config_modifier.convert_config(config, "prefill")
@@
-        prefill_config = config_modifier.set_config_tp_size(prefill_config, tp_size)
+        prefill_config = config_modifier.set_config_tp_size(
+            config_modifier.convert_config(config, "prefill"), tp_size
+        )
@@
-        decode_config = config_modifier.convert_config(config, "decode")
@@
-        decode_config = config_modifier.set_config_tp_size(decode_config, tp_size)
+        decode_config = config_modifier.set_config_tp_size(
+            config_modifier.convert_config(config, "decode"), tp_size
+        )
```

Also applies to: 118-121, 191-194, 233-241
45-53: Guard logger handler to prevent duplicates on import.

When imported in tests, multiple handlers can be attached.
Apply this diff:
```diff
-logger = logging.getLogger(__name__)
-logger.setLevel(logging.INFO)
-console_handler = logging.StreamHandler()
-console_handler.setLevel(logging.INFO)
-formatter = logging.Formatter(
-    "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S"
-)
-console_handler.setFormatter(formatter)
-logger.addHandler(console_handler)
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)
+if not logger.handlers:
+    console_handler = logging.StreamHandler()
+    console_handler.setLevel(logging.INFO)
+    formatter = logging.Formatter(
+        "%(asctime)s - %(name)s - %(levelname)s - %(message)s", "%Y-%m-%d %H:%M:%S"
+    )
+    console_handler.setFormatter(formatter)
+    logger.addHandler(console_handler)
```
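The effect of the handler guard can be verified directly: wrapping the setup in a function and calling it twice attaches only one handler, whereas the unguarded version would stack a second one on re-import:

```python
import logging


def setup_logger(name: str = "profile_sla_demo") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # guard: repeat calls/imports won't stack handlers
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            "%Y-%m-%d %H:%M:%S",
        ))
        logger.addHandler(handler)
    return logger
```

Without the guard, every log line would be emitted once per accumulated handler after the module is imported multiple times in a test session.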
61-71: Validate backend and GPU bounds early.

Raise a clear error if backend is unknown or min/max are invalid.
Example:
```python
if args.backend not in CONFIG_MODIFIERS:
    raise ValueError(f"Unsupported backend: {args.backend}")
if (
    args.min_num_gpus_per_engine < 1
    or args.max_num_gpus_per_engine < args.min_num_gpus_per_engine
):
    raise ValueError("Invalid GPU bounds: min must be >= 1 and <= max")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- benchmarks/profiler/profile_sla.py (6 hunks)
- benchmarks/profiler/utils/config.py (7 hunks)
- benchmarks/profiler/utils/profile_decode.py (1 hunks)
- benchmarks/profiler/utils/profile_prefill.py (1 hunks)
- tests/profiler/__init__.py (1 hunks)
- tests/profiler/test_profile_sla_dryrun.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
benchmarks/profiler/utils/profile_prefill.py (2)
- benchmarks/profiler/utils/genai_perf.py (1)
  - benchmark_prefill (154-186)
- benchmarks/profiler/utils/plot.py (1)
  - plot_prefill_interpolation (103-160)

benchmarks/profiler/utils/profile_decode.py (2)
- benchmarks/profiler/utils/genai_perf.py (1)
  - benchmark_decode (189-247)
- benchmarks/profiler/utils/plot.py (1)
  - plot_decode_3d_surface (163-254)

benchmarks/profiler/profile_sla.py (7)
- benchmarks/profiler/utils/genai_perf.py (2)
  - benchmark_decode (189-247)
  - benchmark_prefill (154-186)
- benchmarks/profiler/utils/plot.py (2)
  - plot_decode_performance (74-100)
  - plot_prefill_performance (34-71)
- benchmarks/profiler/utils/profile_cache.py (4)
  - check_decode_results_exist (56-88)
  - check_prefill_results_exist (26-53)
  - load_existing_decode_results (111-138)
  - load_existing_prefill_results (91-108)
- benchmarks/profiler/utils/profile_decode.py (1)
  - profile_decode (22-102)
- benchmarks/profiler/utils/profile_prefill.py (1)
  - profile_prefill (22-80)
- deploy/utils/dynamo_deployment.py (6)
  - DynamoDeploymentClient (97-481)
  - create_deployment (219-271)
  - wait_for_deployment_ready (273-400)
  - get_deployment_logs (430-461)
  - get_service_url (211-217)
  - delete_deployment (463-481)
- benchmarks/profiler/utils/config.py (2)
  - get_kv_cache_size_from_dynamo_log (295-315)
  - get_kv_cache_size_from_dynamo_log (479-491)

tests/profiler/test_profile_sla_dryrun.py (1)
- benchmarks/profiler/profile_sla.py (1)
  - run_profile (56-534)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - dynamo
- GitHub Check: Build and Test - vllm
🔇 Additional comments (7)
tests/profiler/__init__.py (1)
1-16: LGTM — license header and docstring only.
No runtime impact.

benchmarks/profiler/utils/profile_decode.py (1)
8-9: LGTM — import path refactor only.
No functional changes; consistent with utils relocation.

benchmarks/profiler/utils/config.py (5)
211-228: Nice hardening of resources mutation.
Creating resources/requests when absent prevents KeyError and keeps limits in sync.
266-293: Robust port extraction — good defaults and guards.
Covers missing Frontend/args cleanly and avoids crashes.
395-412: Mirrored resource hardening for SGLang — looks good.
Prevents missing dicts while updating GPU requests/limits.
449-476: SGLang port extraction hardening — good parity with VLLM.
80-90: break_arguments None-handling — nice improvement.
Prevents crashes when args are absent and skips None entries.
Signed-off-by: nnshah1 <neelays@nvidia.com>