test: add tests for replica calculation and planner scaling #2525
Conversation
Force-pushed from 7d3ebf9 to ccf304a
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
can we put the files related to this test in a subfolder?
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Walkthrough

Adds planner test infrastructure and docs: a Kubernetes DynamoGraphDeployment manifest for a disaggregated vLLM planner, an end-to-end scaling test (script and Python), a unit test suite for replica calculation, a load generator utility, local test fixtures, a .gitignore, and a minor docs formatting fix.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Dev as User
    participant Sh as run_scaling_test.sh
    participant K8s as Kubernetes API
    participant Svc as Frontend Service
    participant E2E as test_scaling_e2e.py
    participant LG as LoadGenerator
    Dev->>Sh: Invoke script (namespace, YAML, save-results)
    Sh->>K8s: Check/apply DynamoGraphDeployment
    Sh->>K8s: Wait for Ready (deployment, pods)
    Sh->>Svc: Port-forward :8000 -> :8000
    Sh->>Svc: /health poll until OK
    Sh->>E2E: Run E2E test with namespace/base URL
    par Monitor
        E2E->>K8s: Start pod-count monitor (prefill/decode)
    and Load
        E2E->>LG: Run phased load
        LG->>Svc: Send requests (genai-perf)
    end
    E2E->>E2E: Analyze counts (1P1D→2P1D), validate
    E2E-->>Sh: Exit code and results
    Sh->>K8s: Optional cleanup (delete deployment)
    Sh-->>Dev: Summary and return code
    note over E2E,K8s: Scaling observed via pod counts timeline
```
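The diagram's "Analyze counts (1P1D→2P1D), validate" step can be illustrated with a small sketch. This is not the actual logic in `test_scaling_e2e.py`; it is a hypothetical pure function over a pod-count timeline, assuming each sample is a dict with `prefill`/`decode` counts as the monitor described above might record:

```python
from typing import Dict, List, Tuple


def detect_scale_up(timeline: List[Dict[str, int]]) -> Tuple[bool, int]:
    """Scan a pod-count timeline and report whether prefill scaled 1P -> 2P.

    Returns (scaled, index of the first sample with >= 2 prefill pods while
    at least one decode pod is running), or (False, -1) if never observed.
    """
    for i, sample in enumerate(timeline):
        if sample.get("prefill", 0) >= 2 and sample.get("decode", 0) >= 1:
            return True, i
    return False, -1


# Hypothetical timeline mirroring the three load phases:
samples = [
    {"prefill": 1, "decode": 1},  # baseline 1P1D
    {"prefill": 1, "decode": 1},  # moderate load, still 1P1D
    {"prefill": 2, "decode": 1},  # planner scaled to 2P1D
]
scaled, at = detect_scale_up(samples)
```

Keeping the analysis a pure function over recorded samples makes it unit-testable without a cluster.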
```mermaid
sequenceDiagram
    autonumber
    participant Client as genai-perf
    participant FE as Frontend
    participant Planner as SLA Planner
    participant Pref as VllmPrefillWorker
    participant Dec as VllmDecodeWorker
    participant Prom as Prometheus
    Client->>FE: Generate requests (RPS phases)
    FE->>Prom: Expose /metrics (ISL, OSL, TTFT, ITL, reqs)
    Planner->>Prom: Scrape metrics
    Planner->>Planner: Compute desired replicas
    Planner->>K8s: Set replicas Pref/Dec
    K8s-->>Pref: Scale prefill pods
    K8s-->>Dec: Scale decode pods
    Client-->>FE: Observe capacity change over phases
    note over Planner: Adjustment interval (e.g., 60s)
```
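The "Compute desired replicas" step above is implemented in `planner_core.py` (not reproduced here). As a hedged sketch of the general idea, a ceil-based sizing rule divides offered load by per-replica capacity; the function name and the assumption that capacity is a single requests-per-second number are mine, not the planner's actual interface:

```python
import math


def desired_replicas(
    request_rate: float, per_replica_throughput: float, min_replicas: int = 1
) -> int:
    """Ceil-divide offered load by per-replica capacity, floored at a minimum.

    per_replica_throughput is the requests/s one worker sustains within SLA
    (a hypothetical stand-in for the interpolated value the real planner
    derives from profiling data).
    """
    if per_replica_throughput <= 0:
        raise ValueError("per_replica_throughput must be positive")
    return max(min_replicas, math.ceil(request_rate / per_replica_throughput))
```

Under this rule, a phase-3 load of 25 req/s against a worker sustaining 15 req/s yields 2 replicas, matching the 1P1D→2P1D transition the test expects.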
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 20
🧹 Nitpick comments (14)
tests/planner/.gitignore (1)
1-11: Broaden ignores to cover pytest/artifacts and local results.

Prevents accidental commits from local runs.

```diff
 # E2E test results - don't commit test artifacts to git
 e2e_scaling_results/

 # Temporary files
 *.tmp
 *.log

 # Python cache
 __pycache__/
 *.pyc
 *.pyo
+
+# Pytest artifacts
+.pytest_cache/
+.coverage
+.coverage.*
+
+# macOS cruft
+.DS_Store
+
+# Local load-generator result dirs
+scaling_test_*/
```

tests/planner/README.md (2)
169-173: Use a heading instead of emphasis (MD036).

```diff
-**E2E Test Deployment Management:**
+### E2E Test Deployment Management
```
183-184: Correct total duration estimate.

90 + 120 + 180 + 2×30 = 450s ≈ 7.5 minutes.

```diff
-- **Total test duration**: ~7 minutes + scaling observation
+- **Total test duration**: ~7.5 minutes + scaling observation
```

tests/planner/utils/load_generator.py (4)
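The corrected arithmetic can be checked directly: three phase durations plus two 30-second transition delays between them.

```python
# Phase durations from the README (seconds), with a 30s delay between phases.
phase_durations = [90, 120, 180]
transition_delay = 30

total_seconds = sum(phase_durations) + 2 * transition_delay
total_minutes = total_seconds / 60  # 450s -> 7.5 minutes
```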
21-25: Avoid configuring logging at import time.

Move basicConfig to the CLI entrypoint so it does not override repo/test logging.

```diff
-logging.basicConfig(
-    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
-)
 logger = logging.getLogger(__name__)
```

Add in main() before using logger:

```python
# configure logging only for CLI
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
```
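Put together, the CLI-only configuration pattern looks like the following minimal sketch (the `main` body here is a placeholder, not the load generator's actual entrypoint):

```python
import logging

# Module level: acquiring a logger has no global side effects, so importing
# this module (e.g. from tests) never reconfigures the host's logging.
logger = logging.getLogger(__name__)


def main() -> None:
    # Configure the root logger only when run as a CLI.
    logging.basicConfig(
        level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
    )
    logger.info("load generator starting")


if __name__ == "__main__":
    main()
```

Note that `logging.basicConfig` is a no-op if the root logger already has handlers, which is another reason to keep it out of import paths.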
44-58: Docstring lists nonexistent parameters.

Remove stray args and clarify return type.

```diff
 def _calculate_genai_perf_params(
     self,
     req_per_sec: float,
 ) -> Dict[str, Any]:
     """
-    Calculate genai-perf parameters to approximate desired request rate.
-
-    Args:
-        req_per_sec: Desired requests per second
-        duration_sec: Test duration in seconds
-        estimated_request_duration: Estimated average request duration in seconds
-
-    Returns:
-        Dictionary with concurrency and request_rate parameters
-    """
+    Calculate genai-perf parameters to approximate the desired request rate.
+
+    Args:
+        req_per_sec: Desired requests per second.
+
+    Returns:
+        Dict with keys: {"concurrency": int, "request_rate": float}.
+    """
```
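For illustration, a standalone version of such a helper might size concurrency from the target rate via Little's law (concurrency ≈ arrival rate × average request duration). This is a sketch under that assumption; the hypothetical `est_request_duration_s` default is mine, and the real `_calculate_genai_perf_params` in `load_generator.py` may compute its parameters differently:

```python
import math
from typing import Any, Dict


def calculate_genai_perf_params(
    req_per_sec: float, est_request_duration_s: float = 2.0
) -> Dict[str, Any]:
    """Approximate genai-perf settings for a target request rate.

    Little's law: in-flight requests ~= arrival_rate * avg_request_duration,
    so concurrency must be at least that to sustain the requested rate.
    """
    concurrency = max(1, math.ceil(req_per_sec * est_request_duration_s))
    return {"concurrency": concurrency, "request_rate": float(req_per_sec)}
```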
246-268: Docstring and phase description mismatch with e2e expectations.

The docstring says 5/10/18 rps; the code uses 8/15/25. Aligning on 8/15/25 is fine; just update the text.

```diff
-    Uses a conservative graduated approach:
-    - Phase 1: 5 req/s (baseline, should work)
-    - Phase 2: 10 req/s (moderate load)
-    - Phase 3: 18 req/s (should trigger prefill scaling to 2P1D)
+    Uses a conservative graduated approach:
+    - Phase 1: 8 req/s (baseline)
+    - Phase 2: 15 req/s (moderate load)
+    - Phase 3: 25 req/s (should trigger prefill scaling to 2P1D)
```
266-312: Interface mismatch with e2e runner.

tests/planner/test_scaling_e2e.py expects two phases (12/24 rps) and keys "phase1"/"phase2", while this generator runs three phases (8/15/25) and returns "phase_results". Either adapt the e2e test to consume "phase_results" or expose a configurable phase list here with defaults matching the e2e expectations.
Would you like me to add a --phases JSON CLI arg and plumb it through so both 12/24 and 8/15/25 scenarios are supported?
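Such a flag could look like the following sketch, assuming phases are passed as a JSON list of `[req_per_sec, duration_s]` pairs; the flag name, helper names, and defaults are illustrative, not the project's actual CLI:

```python
import argparse
import json
from typing import List, Tuple


def parse_phases(spec: str) -> List[Tuple[float, int]]:
    """Parse a JSON phase list like '[[8, 90], [15, 120], [25, 180]]'
    into (req_per_sec, duration_s) tuples."""
    return [(float(rps), int(dur)) for rps, dur in json.loads(spec)]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="load generator (sketch)")
    parser.add_argument(
        "--phases",
        type=parse_phases,
        # argparse does not re-apply `type` to non-string defaults,
        # so pre-parse the default here.
        default=parse_phases("[[8, 90], [15, 120], [25, 180]]"),
        help="JSON list of [req_per_sec, duration_s] pairs",
    )
    return parser


# The e2e runner's 12/24 scenario would then be expressible as:
args = build_parser().parse_args(["--phases", "[[12, 90], [24, 180]]"])
```

This keeps both the 8/15/25 default and the 12/24 e2e scenario usable from one code path.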
tests/planner/conftest.py (1)
12-15: Consider documenting the test isolation pattern.

While disabling the parent logger fixture is a valid approach for planner tests, consider explaining why logging is disabled and which parent fixture is being overridden, to help future maintainers understand this test isolation pattern.

```diff
 @pytest.fixture(autouse=True)
 def logger(request):
-    """Dummy logger fixture that does nothing - overrides the parent one."""
+    """
+    Dummy logger fixture that does nothing - overrides the parent autouse logger fixture.
+
+    This prevents automatic test logging from the parent conftest.py, allowing planner
+    tests to run with cleaner output and without interference from the parent fixture's
+    logging configuration.
+    """
     yield
```

tests/planner/test_replica_calculation.py (3)
139-141: Avoid importing asyncio inside test methods.

Importing asyncio within test methods is unconventional and reduces readability. Move the import to the top of the file:

```diff
+import asyncio
 import argparse
 import math
 import os

 # ... in the test method:
-        import asyncio
         asyncio.run(planner.make_adjustments())
```
150-151: Remove debug print statements in tests.

Debug print statements should be replaced with proper assertions or logging.

```diff
-        print(f"Expected prefill replicas: {expected_prefill_replicas}")
-        print(f"Calculated prefill replicas: {calculated_prefill_replicas}")
+        # Assert with a descriptive message
+        assert (
+            max(expected_prefill_replicas, planner.args.min_endpoint)
+            == calculated_prefill_replicas
+        ), f"Expected {max(expected_prefill_replicas, planner.args.min_endpoint)}, got {calculated_prefill_replicas}"
```
223-229: Test parametrization could be more descriptive.

The parametrized test cases would benefit from using `pytest.param` with IDs for better test output.

```diff
 @pytest.mark.parametrize(
     "num_req,decode_thpt,expected_p,expected_d",
     [
-        (10, 10000, 1, 1),  # low_load_10_req_per_second
-        (500, 1000, 1, 2),  # high_load_500_req_per_second (lower decode throughput)
+        pytest.param(10, 10000, 1, 1, id="low_load_10_req_per_second"),
+        pytest.param(500, 1000, 1, 2, id="high_load_500_req_per_second"),
     ],
 )
```

tests/planner/run_scaling_test.sh (1)
182-182: Remove unused loop variable.

The variable `i` in the loop is never used.

```diff
-    for i in {1..30}; do
+    for _ in {1..30}; do
```

tests/planner/disagg_planner.yaml (2)
13-13: Large inline configuration may be hard to maintain.

The DYNAMO_SERVICE_CONFIG environment variable contains a large inline JSON configuration that is difficult to read and maintain. Consider using a ConfigMap for better maintainability:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dynamo-service-config
data:
  config.json: |
    {
      "Prometheus": {
        "global": { "scrape_interval": "5s" },
        "scrape_configs": [
          {
            "job_name": "prometheus",
            "static_configs": [{ "targets": ["localhost:9090"] }]
          },
          {
            "job_name": "frontend",
            "static_configs": [{ "targets": ["vllm-disagg-planner-frontend:8000"] }]
          }
        ]
      }
    }
```

Then reference it in the deployment.
26-27: Frontend container command uses deprecated format.

Using `args` with a single shell-style command string is not the recommended approach.

```diff
 extraPodSpec:
   mainContainer:
     image: nvcr.io/nvidian/nim-llm-dev/vllm-runtime:dep-301.6
-    args:
-      - "python3 -m dynamo.frontend --http-port 8000"
+    command:
+      - python3
+    args:
+      - -m
+      - dynamo.frontend
+      - --http-port
+      - "8000"
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)

- docs/architecture/sla_planner.md (1 hunks)
- tests/planner/.gitignore (1 hunks)
- tests/planner/README.md (2 hunks)
- tests/planner/conftest.py (1 hunks)
- tests/planner/disagg_planner.yaml (1 hunks)
- tests/planner/run_scaling_test.sh (1 hunks)
- tests/planner/test_replica_calculation.py (1 hunks)
- tests/planner/test_scaling_e2e.py (1 hunks)
- tests/planner/utils/__init__.py (1 hunks)
- tests/planner/utils/load_generator.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)

tests/planner/run_scaling_test.sh (2)
- tests/planner/test_scaling_e2e.py (1): main (445-507)
- tests/planner/utils/load_generator.py (1): main (335-398)

tests/planner/utils/load_generator.py (2)
- tests/planner/test_scaling_e2e.py (2): run_scaling_test (267-342), main (445-507)
- tests/planner/run_scaling_test.sh (1): main (235-302)

tests/planner/test_scaling_e2e.py (1)
- tests/planner/utils/load_generator.py (3): LoadGenerator (27-332), run_scaling_test (246-332), main (335-398)

tests/planner/test_replica_calculation.py (3)
- components/planner/src/dynamo/planner/utils/planner_core.py (4): Metrics (31-52), Planner (55-525), get_workers_info (129-178), make_adjustments (331-388)
- components/planner/src/dynamo/planner/utils/perf_interpolation.py (3): find_best_throughput_per_gpu (147-168), interpolate_ttft (48-50), interpolate_itl (137-139)
- components/planner/src/dynamo/planner/kubernetes_connector.py (1): set_component_replicas (116-139)
🪛 Shellcheck (0.10.0)

tests/planner/run_scaling_test.sh
- [warning] 101-101: Declare and assign separately to avoid masking return values. (SC2155)
- [warning] 166-166: Quote this to prevent word splitting. (SC2046)
- [warning] 182-182: i appears unused. Verify use (or export if used externally). (SC2034)
🪛 LanguageTool

docs/architecture/sla_planner.md
- [grammar] ~118 (QB_NEW_EN): "...rts metrics at /metrics HTTP endpoint with number of requests, ISL, OSL, TTFT, ITL..."
- [grammar] ~118 (QB_NEW_EN): "...nd provides these metrics automatically."

tests/planner/README.md
- [grammar] ~145 (QB_NEW_EN): "...## Quick Start #### Run Unit Tests Only Test the replica calculation logic witho..."
- [grammar] ~152 (QB_NEW_EN): "...py -v ``` #### Run Full End-to-End Test Test complete scaling behavior including..."
- [grammar] ~169 (QB_NEW_EN): "...s ``` E2E Test Deployment Management: - If no deployment exists: creates, tests,..."
- [grammar] ~170 (QB_NEW_EN): "...ment exists: creates, tests, and cleans up deployment - If deployment exists: uses..."
- [grammar] ~178 (QB_NEW_EN): "...eq/s for 90s (baseline - maintains 1P1D) - Phase 2: 15 req/s for 120s (moderate l..."
- [grammar] ~179 (QB_NEW_EN): "...or 120s (moderate load - maintains 1P1D) - Phase 3: 25 req/s for 180s (scaling tr..."
- [grammar] ~180 (QB_NEW_EN): "...180s (scaling trigger - scales to 2P1D) - ISL/OSL: 4000/150 tokens (optimized fo..."
- [grammar] ~181 (QB_NEW_EN): "...okens (optimized for prefill bottleneck) - Transition delay: 30s between phases -..."
- [grammar] ~182 (QB_NEW_EN): "...Transition delay: 30s between phases - Total test duration: ~7 minutes + scal..."
- [grammar] ~183 (QB_NEW_EN): "...tion**: ~7 minutes + scaling observation - Smart cleanup: Only removes deployment..."
🪛 markdownlint-cli2 (0.17.2)

tests/planner/README.md
- 174-174: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
🔇 Additional comments (2)
tests/planner/utils/__init__.py (1)
1-2: LGTM: package marker and SPDX headers.

tests/planner/utils/load_generator.py (1)

117-119: Maintain host:port only for `--url`. genai-perf's `--url` flag expects a host:port (no `http://`); the existing `.replace("http://", "")` aligns with official examples (e.g. `--url localhost:8001`). (docs.nvidia.com) Likely an incorrect or invalid review comment.
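A more robust way to get the host:port form than a bare string replace is to parse the URL. This helper is a hypothetical sketch, not code from the PR; it assumes a default port of 8000 to match the frontend's port-forward:

```python
from urllib.parse import urlparse


def to_host_port(base_url: str, default_port: int = 8000) -> str:
    """Normalize a base URL to the host:port form genai-perf's --url expects."""
    # urlparse only fills netloc when a scheme separator is present,
    # so prefix bare host:port strings with "//".
    parsed = urlparse(base_url if "//" in base_url else f"//{base_url}")
    host = parsed.hostname or "localhost"
    port = parsed.port or default_port
    return f"{host}:{port}"
```

Unlike `.replace("http://", "")`, this also handles `https://` and URLs that omit the port.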
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
dep-301-test-dynamo-planner-scaling-up
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
dep-301-test-dynamo-planner-scaling-up
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: Michael Shin <michaelshin@users.noreply.github.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
This MR implements comprehensive testing infrastructure for SLA planner scaling behavior, enabling automated validation of prefill and decode worker scaling under varying load conditions. The test suite includes both unit tests for replica calculation logic and end-to-end Kubernetes deployment tests that validate scaling from 1P1D to 2P1D configurations.
Details:
What it looks like:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Documentation
Tests
Chores