
Conversation

@hhzhang16 hhzhang16 commented Aug 19, 2025

Overview:

This PR implements comprehensive testing infrastructure for SLA planner scaling behavior, enabling automated validation of prefill and decode worker scaling under varying load conditions. The test suite includes both unit tests for the replica calculation logic and end-to-end Kubernetes deployment tests that validate scaling from 1P1D to 2P1D configurations.

Details:

  • End-to-end scaling tests with automated Kubernetes deployment; graduated approach (8→15→25 req/s)
  • Unit tests for replica calculation mathematical formulas and logic
  • load_generator.py includes reusable components for traffic simulation and performance analysis
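
As a sketch of how the graduated phases might map onto genai-perf invocations: the flag set below mirrors the commands visible in the test log further down, but the helper name `build_genai_perf_cmd` and the `PHASES` list are illustrative assumptions, not the actual `load_generator.py` API.

```python
# Hedged sketch: assembling a genai-perf command per load phase.
from typing import List, Tuple

# (name, req_per_sec, duration_s) from the graduated scenario in the log
PHASES: List[Tuple[str, float, int]] = [
    ("phase1_baseline", 8.0, 90),
    ("phase2_moderate", 15.0, 120),
    ("phase3_prefill_scaling_trigger", 25.0, 180),
]

def build_genai_perf_cmd(model: str, rate: float, duration_s: int,
                         artifact_dir: str) -> List[str]:
    """Build a genai-perf invocation approximating `rate` req/s for `duration_s`."""
    request_count = int(rate * duration_s)  # e.g. 8 req/s * 60 s = 480 requests
    return [
        "genai-perf", "profile",
        "--model", model,
        "--tokenizer", model,
        "--endpoint-type", "chat",
        "--url", "localhost:8000",
        "--streaming",
        "--synthetic-input-tokens-mean", "4000",
        "--output-tokens-mean", "150",
        "--request-rate", str(rate),
        "--request-count", str(request_count),
        "--artifact-dir", artifact_dir,
    ]

cmd = build_genai_perf_cmd("nvidia/Llama-3.1-8B-Instruct-FP8", 8.0, 60, "/tmp/phase1")
```

Each phase would run such a command via `subprocess` and then wait out a transition delay before the next phase, as the log below shows.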

What it looks like:

> ./run_scaling_test.sh --namespace hannahz --save-results

[INFO] SLA Planner Scaling Test
[INFO] Namespace: hannahz
[INFO] Scenario: Graduated 8->15->25 req/s (1P1D -> 2P1D prefill scaling, ISL=4000/OSL=150)
[INFO] Checking prerequisites...
[SUCCESS] Prerequisites check passed
[INFO] Checking for existing deployment...
[INFO] DynamoGraphDeployment vllm-disagg-planner already exists - skipping redeployment
[WARNING] Existing deployment is not ready (status: pending), will redeploy
[INFO] Deploying SLA planner...
dynamographdeployment.nvidia.com/vllm-disagg-planner configured
[SUCCESS] Deployment applied successfully
[INFO] Waiting for DynamoGraphDeployment to be processed...
dynamographdeployment.nvidia.com/vllm-disagg-planner condition met
[SUCCESS] DynamoGraphDeployment is ready
[INFO] Waiting for pods to be running (this may take several minutes for image pulls)...
[INFO] Waiting for frontend pod...
pod/vllm-disagg-planner-frontend-76f6696765-zmpx9 condition met
pod/vllm-disagg-planner-prometheus-74fcd6f557-whnf5 condition met
[SUCCESS] Frontend pod is ready
[INFO] Waiting for all pods to be running...
[INFO] Setting up port forwarding...
[INFO] Port forwarding to service: vllm-disagg-planner-frontend
[INFO] Waiting for port forwarding to be established...
[SUCCESS] Port forwarding established and service is healthy
[INFO] Running scaling test (graduated 8->15->25 req/s)...
[INFO] Results will be saved to tests/planner/e2e_scaling_results
2025-08-28 07:10:06,795 - INFO - Checking service availability at http://localhost:8000...
2025-08-28 07:10:06,795 - INFO - Running scaling test...
2025-08-28 07:10:06,795 - INFO - Starting scaling integration test
2025-08-28 07:10:07,739 - INFO - Test starting with: P=1, D=1, Total=2
2025-08-28 07:10:07,739 - INFO - Running hardcoded scaling scenario (12 req/s -> 24 req/s)
2025-08-28 07:10:07,739 - INFO - Starting graduated prefill scaling test scenario (targeting 1P1D -> 2P1D)
2025-08-28 07:10:07,739 - INFO - Using conservative graduated approach with metric generation
2025-08-28 07:10:07,739 - INFO - Saving results to: /home/hannahz/dev/ai-dynamo/tests/planner/e2e_scaling_results/scaling_test_1756390207
2025-08-28 07:10:07,739 - INFO - Starting phase1_baseline: 8.0 req/s for 90s
2025-08-28 07:10:07,739 - INFO - Generating load: 8.0 req/s for 90s
2025-08-28 07:10:07,739 - INFO - Using request_rate=8.0 req/s
2025-08-28 07:10:07,739 - INFO - Adjusted parameters: target_duration=60s, request_count=480
2025-08-28 07:10:07,740 - INFO - Running command: genai-perf profile --model nvidia/Llama-3.1-8B-Instruct-FP8 --tokenizer nvidia/Llama-3.1-8B-Instruct-FP8 --endpoint-type chat --url localhost:8000 --streaming --synthetic-input-tokens-mean 4000 --output-tokens-mean 150 --request-rate 8.0 --request-count 480 --goodput time_to_first_token:200 inter_token_latency:10 --stability-percentage 50 --num-dataset-entries 80 --artifact-dir /home/hannahz/dev/ai-dynamo/tests/planner/e2e_scaling_results/scaling_test_1756390207/phase1_baseline -- -v -max-threads 64
2025-08-28 07:11:15,074 - INFO - Load generation completed successfully
2025-08-28 07:11:15,074 - INFO - Actual duration: 67.33s
2025-08-28 07:11:15,074 - WARNING - No JSON results found in artifact directory
2025-08-28 07:11:15,074 - INFO - Transition delay: 30s
2025-08-28 07:11:15,074 - INFO - Monitoring pod scaling for 480s (interval: 15s)
2025-08-28 07:11:16,024 - INFO - Pod counts: P=1, D=1, Total=2
2025-08-28 07:11:31,992 - INFO - Pod counts: P=1, D=1, Total=2
2025-08-28 07:11:45,088 - INFO - Starting phase2_moderate: 15.0 req/s for 120s
2025-08-28 07:11:45,088 - INFO - Generating load: 15.0 req/s for 120s
2025-08-28 07:11:45,088 - INFO - Using request_rate=15.0 req/s
2025-08-28 07:11:45,088 - INFO - Adjusted parameters: target_duration=60s, request_count=900
2025-08-28 07:11:45,088 - INFO - Running command: genai-perf profile --model nvidia/Llama-3.1-8B-Instruct-FP8 --tokenizer nvidia/Llama-3.1-8B-Instruct-FP8 --endpoint-type chat --url localhost:8000 --streaming --synthetic-input-tokens-mean 4000 --output-tokens-mean 150 --request-rate 15.0 --request-count 900 --goodput time_to_first_token:200 inter_token_latency:10 --stability-percentage 50 --num-dataset-entries 150 --artifact-dir /home/hannahz/dev/ai-dynamo/tests/planner/e2e_scaling_results/scaling_test_1756390207/phase2_moderate -- -v -max-threads 64
2025-08-28 07:12:55,082 - INFO - Load generation completed successfully
2025-08-28 07:12:55,082 - INFO - Actual duration: 69.99s
2025-08-28 07:12:55,082 - WARNING - No JSON results found in artifact directory
2025-08-28 07:12:55,083 - INFO - Transition delay: 30s
2025-08-28 07:12:56,037 - INFO - Pod counts: P=1, D=1, Total=2
2025-08-28 07:13:11,993 - INFO - Pod counts: P=1, D=1, Total=2
2025-08-28 07:13:25,096 - INFO - Starting phase3_prefill_scaling_trigger: 25.0 req/s for 180s
2025-08-28 07:13:25,096 - INFO - Generating load: 25.0 req/s for 180s
2025-08-28 07:13:25,096 - INFO - Using request_rate=25.0 req/s
2025-08-28 07:13:25,096 - INFO - Adjusted parameters: target_duration=30s, request_count=750
2025-08-28 07:13:25,097 - INFO - Running command: genai-perf profile --model nvidia/Llama-3.1-8B-Instruct-FP8 --tokenizer nvidia/Llama-3.1-8B-Instruct-FP8 --endpoint-type chat --url localhost:8000 --streaming --synthetic-input-tokens-mean 4000 --output-tokens-mean 150 --request-rate 25.0 --request-count 750 --goodput time_to_first_token:200 inter_token_latency:10 --stability-percentage 50 --num-dataset-entries 250 --artifact-dir /home/hannahz/dev/ai-dynamo/tests/planner/e2e_scaling_results/scaling_test_1756390207/phase3_prefill_scaling_trigger -- -v -max-threads 64
2025-08-28 07:14:20,468 - INFO - Load generation completed successfully
2025-08-28 07:14:20,468 - INFO - Actual duration: 55.37s
2025-08-28 07:14:20,468 - WARNING - No JSON results found in artifact directory
2025-08-28 07:14:20,468 - INFO - Graduated scaling test completed successfully
2025-08-28 07:14:20,468 - INFO - Test results saved to: /home/hannahz/dev/ai-dynamo/tests/planner/e2e_scaling_results/scaling_test_1756390207/scaling_test_results.json
2025-08-28 07:14:21,435 - INFO - Final pod counts: P=1, D=1, Total=2
2025-08-28 07:14:21,435 - INFO - Waiting for potential delayed scaling...
2025-08-28 07:14:22,434 - INFO - Pod counts: P=1, D=1, Total=2
2025-08-28 07:14:38,353 - INFO - Pod counts: P=2, D=1, Total=3
2025-08-28 07:14:54,314 - INFO - Pod counts: P=2, D=1, Total=3
2025-08-28 07:15:10,343 - INFO - Pod counts: P=2, D=1, Total=3
2025-08-28 07:15:22,347 - INFO - Final final pod counts: P=2, D=1, Total=3
2025-08-28 07:15:22,348 - INFO - ============================================================
2025-08-28 07:15:22,348 - INFO - TEST SUMMARY
2025-08-28 07:15:22,348 - INFO - ============================================================
2025-08-28 07:15:22,348 - INFO - ✅ Test PASSED: Successfully scaled from 1P1D to 2P1D
2025-08-28 07:15:22,348 - INFO - 
Detailed results saved to: /tmp/scaling_test_results_1756390522.json
2025-08-28 07:15:22,348 - INFO - ============================================================
[SUCCESS] Scaling test PASSED
[SUCCESS] Test completed successfully!
[INFO] Cleaning up port forwarding...

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Documentation

    • Reformatted a note for clarity.
    • Added planner dry-run and scaling test guides with usage, plots, scenarios, and prerequisites.
  • Tests

    • Added unit tests for replica calculation across edge cases and budgets.
    • Introduced an end-to-end scaling test with Kubernetes monitoring and health checks.
    • Provided a Kubernetes deployment manifest for the disaggregated planner.
    • Added a load generator utility and CLI for multi-phase scaling runs with results export.
    • Included a script to deploy, port-forward, run tests, and clean up.
    • Disabled verbose logging in planner tests.
  • Chores

    • Ignored test artifacts and temporary files.

@github-actions github-actions bot added the test label Aug 19, 2025
@hhzhang16 hhzhang16 self-assigned this Aug 21, 2025
@hhzhang16 hhzhang16 force-pushed the hannahz/dep-301-test-dynamo-planner-scaling-up branch from 7d3ebf9 to ccf304a Compare August 28, 2025 15:56
copy-pr-bot bot commented Aug 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
@tedzhouhk (Contributor) commented:

can we put the files related to this test in a subfolder?

@hhzhang16 hhzhang16 marked this pull request as ready for review August 28, 2025 18:12
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
coderabbitai bot commented Aug 28, 2025

Walkthrough

Adds planner test infrastructure and docs: a Kubernetes DynamoGraphDeployment manifest for a disaggregated vLLM planner, an end-to-end scaling test (script and Python), a unit test suite for replica calculation, a load generator utility, local test fixtures, a .gitignore, and a minor docs formatting fix.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Docs formatting | `docs/architecture/sla_planner.md` | Fixed blockquote formatting by replacing a stray `->` with `>`; content unchanged. |
| Planner tests gitignore | `tests/planner/.gitignore` | Ignored E2E artifacts, temp files, and Python caches. |
| Planner test fixture override | `tests/planner/conftest.py` | Added autouse logger fixture that no-ops to disable inherited logging in planner tests. |
| Disaggregated planner K8s manifest | `tests/planner/disagg_planner.yaml` | Added DynamoGraphDeployment with Frontend, Planner, Prometheus, VllmPrefillWorker, VllmDecodeWorker components, envs, probes, and commands. |
| E2E scaling test harness | `tests/planner/run_scaling_test.sh`, `tests/planner/test_scaling_e2e.py` | New shell script to deploy/check/port-forward and run the E2E test; Python test monitors K8s pod counts, drives two-phase load, validates 1P1D→2P1D scaling, and saves results. |
| Unit tests for replica calculation | `tests/planner/test_replica_calculation.py` | Added extensive pytest suite validating prefill/decode replica math under constraints, budgets, and multi-GPU configs using mocked planner internals. |
| Load generation utility | `tests/planner/utils/load_generator.py`, `tests/planner/utils/__init__.py` | Introduced LoadGenerator using genai-perf for multi-phase runs, result parsing, and optional persistence; added SPDX-only `__init__.py`. |
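
The replica-calculation suite above exercises ceiling-division math over offered load and per-replica throughput. A simplified, hedged sketch of that arithmetic follows; the throughput figures and the `required_replicas` helper are illustrative assumptions, not the planner's actual interpolation model.

```python
# Hedged sketch of the replica arithmetic the unit tests exercise.
import math
from typing import Tuple

def required_replicas(req_per_s: float, isl: int, osl: int,
                      prefill_thpt_tokens_per_s: float,
                      decode_thpt_tokens_per_s: float,
                      min_endpoints: int = 1) -> Tuple[int, int]:
    """Ceil-divide offered token load by per-replica throughput, floored at min_endpoints."""
    prefill = math.ceil(req_per_s * isl / prefill_thpt_tokens_per_s)
    decode = math.ceil(req_per_s * osl / decode_thpt_tokens_per_s)
    return (max(prefill, min_endpoints), max(decode, min_endpoints))

# With ISL=4000 and an assumed 40k prefill tokens/s per replica:
# 8 req/s offers 32k tokens/s (1 replica), 15 req/s offers 60k (2 replicas).
low = required_replicas(8, 4000, 150, 40000, 10000)    # (1, 1)
high = required_replicas(15, 4000, 150, 40000, 10000)  # (2, 1)
```

This matches the shape of the E2E expectation: prefill scales first because long inputs (ISL=4000) and short outputs (OSL=150) make prefill the bottleneck.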

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as User
  participant Sh as run_scaling_test.sh
  participant K8s as Kubernetes API
  participant Svc as Frontend Service
  participant E2E as test_scaling_e2e.py
  participant LG as LoadGenerator

  Dev->>Sh: Invoke script (namespace, YAML, save-results)
  Sh->>K8s: Check/apply DynamoGraphDeployment
  Sh->>K8s: Wait for Ready (deployment, pods)
  Sh->>Svc: Port-forward :8000 -> :8000
  Sh->>Svc: /health poll until OK
  Sh->>E2E: Run E2E test with namespace/base URL
  par Monitor
    E2E->>K8s: Start pod-count monitor (prefill/decode)
  and Load
    E2E->>LG: Run phased load
    LG->>Svc: Send requests (genai-perf)
  end
  E2E->>E2E: Analyze counts (1P1D→2P1D), validate
  E2E-->>Sh: Exit code and results
  Sh->>K8s: Optional cleanup (delete deployment)
  Sh-->>Dev: Summary and return code
  note over E2E,K8s: Scaling observed via pod counts timeline
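The pod-count monitor step in the diagram above can be sketched as a pure function over `kubectl get pods -o name` output; the worker-name substrings are assumptions based on the component names in the test log.

```python
# Hedged sketch: tally prefill/decode worker pods from kubectl output lines.
from typing import List, Tuple

def count_workers(pod_names: List[str]) -> Tuple[int, int]:
    """Count prefill and decode worker pods by name substring."""
    prefill = sum("prefillworker" in n.lower() for n in pod_names)
    decode = sum("decodeworker" in n.lower() for n in pod_names)
    return prefill, decode

pods = [
    "pod/vllm-disagg-planner-vllmprefillworker-abc",
    "pod/vllm-disagg-planner-vllmdecodeworker-def",
    "pod/vllm-disagg-planner-frontend-xyz",
]
p, d = count_workers(pods)  # 1P1D baseline
```

The E2E test would poll such counts on an interval (the log shows 15 s) and declare success once the timeline reaches 2P1D.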
sequenceDiagram
  autonumber
  participant Client as genai-perf
  participant FE as Frontend
  participant Planner as SLA Planner
  participant Pref as VllmPrefillWorker
  participant Dec as VllmDecodeWorker
  participant Prom as Prometheus

  Client->>FE: Generate requests (RPS phases)
  FE->>Prom: Expose /metrics (ISL, OSL, TTFT, ITL, reqs)
  Planner->>Prom: Scrape metrics
  Planner->>Planner: Compute desired replicas
  Planner->>K8s: Set replicas Pref/Dec
  K8s-->>Pref: Scale prefill pods
  K8s-->>Dec: Scale decode pods
  Client-->>FE: Observe capacity change over phases
  note over Planner: Adjustment interval (e.g., 60s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

A rabbit taps a cluster’s door,
Pods hop from one to two, then more—
Prefill, decode, in tidy rows,
The planner counts, the traffic flows.
With scripts and charts we watch them scale,
Carrots cached in /tmp—no fail! 🥕🐇

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 20

🧹 Nitpick comments (14)
tests/planner/.gitignore (1)

1-11: Broaden ignores to cover pytest/artifacts and local results.

Prevents accidental commits from local runs.

 # E2E test results - don't commit test artifacts to git
 e2e_scaling_results/
 
 # Temporary files
 *.tmp
 *.log
 
 # Python cache
 __pycache__/
 *.pyc
 *.pyo
+ 
+# Pytest artifacts
+.pytest_cache/
+.coverage
+.coverage.*
+
+# macOS cruft
+.DS_Store
+
+# Local load-generator result dirs
+scaling_test_*/
tests/planner/README.md (2)

169-173: Use a heading instead of emphasis (MD036).

-**E2E Test Deployment Management:**
+### E2E Test Deployment Management

183-184: Correct total duration estimate.

90 + 120 + 180 + 2×30 = 450s ≈ 7.5 minutes.

-- **Total test duration**: ~7 minutes + scaling observation
+- **Total test duration**: ~7.5 minutes + scaling observation
tests/planner/utils/load_generator.py (4)

21-25: Avoid configuring logging at import time.

Move basicConfig to CLI entrypoint to not override repo/test logging.

-logging.basicConfig(
-    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
-)
 logger = logging.getLogger(__name__)

Add in main() before using logger:

# configure logging only for CLI
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

44-58: Docstring lists nonexistent parameters.

Remove stray args and clarify return type.

     def _calculate_genai_perf_params(
         self,
         req_per_sec: float,
     ) -> Dict[str, Any]:
         """
-        Calculate genai-perf parameters to approximate desired request rate.
-
-        Args:
-            req_per_sec: Desired requests per second
-            duration_sec: Test duration in seconds
-            estimated_request_duration: Estimated average request duration in seconds
-
-        Returns:
-            Dictionary with concurrency and request_rate parameters
-        """
+        Calculate genai-perf parameters to approximate the desired request rate.
+
+        Args:
+            req_per_sec: Desired requests per second.
+
+        Returns:
+            Dict with keys: {"concurrency": int, "request_rate": float}.
+        """

246-268: Docstring and phase description mismatch with e2e expectations.

Docstring says 5/10/18 rps; code uses 8/15/25. Align docstring or phases; keeping 8/15/25 is fine, just update text.

-        Uses a conservative graduated approach:
-        - Phase 1: 5 req/s (baseline, should work)
-        - Phase 2: 10 req/s (moderate load)
-        - Phase 3: 18 req/s (should trigger prefill scaling to 2P1D)
+        Uses a conservative graduated approach:
+        - Phase 1: 8 req/s (baseline)
+        - Phase 2: 15 req/s (moderate load)
+        - Phase 3: 25 req/s (should trigger prefill scaling to 2P1D)

266-312: Interface mismatch with e2e runner.

tests/planner/test_scaling_e2e.py expects two phases (12/24 rps) and keys "phase1"/"phase2", while this generator runs three phases (8/15/25) and returns "phase_results". Either adapt the e2e test to consume "phase_results" or expose a configurable phase list here with defaults matching the e2e expectations.

Would you like me to add a --phases JSON CLI arg and plumb it through so both 12/24 and 8/15/25 scenarios are supported?

tests/planner/conftest.py (1)

12-15: Consider documenting the test isolation pattern.

While disabling the parent logger fixture is a valid approach for planner tests, consider adding more details about why logging is disabled and what parent fixture is being overridden to help future maintainers understand this test isolation pattern.

 @pytest.fixture(autouse=True)
 def logger(request):
-    """Dummy logger fixture that does nothing - overrides the parent one."""
+    """
+    Dummy logger fixture that does nothing - overrides the parent autouse logger fixture.
+    
+    This prevents automatic test logging from the parent conftest.py, allowing planner
+    tests to run with cleaner output and without interference from the parent fixture's
+    logging configuration.
+    """
     yield
tests/planner/test_replica_calculation.py (3)

139-141: Avoid importing asyncio inside test methods.

Importing asyncio within test methods is unconventional and reduces readability.

Move the import to the top of the file:

+import asyncio
 import argparse
 import math
 import os

 # ... in the test method:
-        import asyncio
-
         asyncio.run(planner.make_adjustments())

150-151: Remove debug print statements in tests.

Debug print statements should be replaced with proper assertions or logging.

-            print(f"Expected prefill replicas: {expected_prefill_replicas}")
-            print(f"Calculated prefill replicas: {calculated_prefill_replicas}")
+            # Assert with a descriptive message
+            assert (
+                max(expected_prefill_replicas, planner.args.min_endpoint)
+                == calculated_prefill_replicas
+            ), f"Expected {max(expected_prefill_replicas, planner.args.min_endpoint)}, got {calculated_prefill_replicas}"

223-229: Test parametrization could be more descriptive.

The parametrized test cases would benefit from using pytest.param with IDs for better test output.

     @pytest.mark.parametrize(
         "num_req,decode_thpt,expected_p,expected_d",
         [
-            (10, 10000, 1, 1),  # low_load_10_req_per_second
-            (500, 1000, 1, 2),  # high_load_500_req_per_second (lower decode throughput)
+            pytest.param(10, 10000, 1, 1, id="low_load_10_req_per_second"),
+            pytest.param(500, 1000, 1, 2, id="high_load_500_req_per_second"),
         ],
     )
tests/planner/run_scaling_test.sh (1)

182-182: Remove unused loop variable.

The variable i in the loop is never used.

-    for i in {1..30}; do
+    for _ in {1..30}; do
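
The readiness polling that loop performs can equally be sketched in Python; the URL, timeout, and function name below are assumptions, and any connection failure simply reports unhealthy rather than raising.

```python
# Hedged sketch: poll a frontend health endpoint until it answers 200.
import urllib.error
import urllib.request

def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on the health endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# An unreachable port reports unhealthy instead of raising.
ok = is_healthy("http://localhost:59999/health")
```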
tests/planner/disagg_planner.yaml (2)

13-13: Large inline configuration may be hard to maintain.

The DYNAMO_SERVICE_CONFIG environment variable contains a large inline JSON configuration that's difficult to read and maintain.

Consider using a ConfigMap for better maintainability:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dynamo-service-config
data:
  config.json: |
    {
      "Prometheus": {
        "global": {
          "scrape_interval": "5s"
        },
        "scrape_configs": [
          {
            "job_name": "prometheus",
            "static_configs": [{"targets": ["localhost:9090"]}]
          },
          {
            "job_name": "frontend",
            "static_configs": [{"targets": ["vllm-disagg-planner-frontend:8000"]}]
          }
        ]
      }
    }

Then reference it in the deployment.


26-27: Frontend container command uses deprecated format.

Using args with a shell command string is not the recommended approach.

       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidian/nim-llm-dev/vllm-runtime:dep-301.6
-          args:
-            - "python3 -m dynamo.frontend --http-port 8000"
+          command:
+            - python3
+          args:
+            - -m
+            - dynamo.frontend
+            - --http-port
+            - "8000"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between e3619ce and f284d77.

📒 Files selected for processing (10)
  • docs/architecture/sla_planner.md (1 hunks)
  • tests/planner/.gitignore (1 hunks)
  • tests/planner/README.md (2 hunks)
  • tests/planner/conftest.py (1 hunks)
  • tests/planner/disagg_planner.yaml (1 hunks)
  • tests/planner/run_scaling_test.sh (1 hunks)
  • tests/planner/test_replica_calculation.py (1 hunks)
  • tests/planner/test_scaling_e2e.py (1 hunks)
  • tests/planner/utils/__init__.py (1 hunks)
  • tests/planner/utils/load_generator.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/planner/run_scaling_test.sh (2)
tests/planner/test_scaling_e2e.py (1)
  • main (445-507)
tests/planner/utils/load_generator.py (1)
  • main (335-398)
tests/planner/utils/load_generator.py (2)
tests/planner/test_scaling_e2e.py (2)
  • run_scaling_test (267-342)
  • main (445-507)
tests/planner/run_scaling_test.sh (1)
  • main (235-302)
tests/planner/test_scaling_e2e.py (1)
tests/planner/utils/load_generator.py (3)
  • LoadGenerator (27-332)
  • run_scaling_test (246-332)
  • main (335-398)
tests/planner/test_replica_calculation.py (3)
components/planner/src/dynamo/planner/utils/planner_core.py (4)
  • Metrics (31-52)
  • Planner (55-525)
  • get_workers_info (129-178)
  • make_adjustments (331-388)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (3)
  • find_best_throughput_per_gpu (147-168)
  • interpolate_ttft (48-50)
  • interpolate_itl (137-139)
components/planner/src/dynamo/planner/kubernetes_connector.py (1)
  • set_component_replicas (116-139)
🪛 Shellcheck (0.10.0)
tests/planner/run_scaling_test.sh

[warning] 101-101: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 166-166: Quote this to prevent word splitting.

(SC2046)


[warning] 182-182: i appears unused. Verify use (or export if used externally).

(SC2034)

🪛 LanguageTool
docs/architecture/sla_planner.md

[grammar] ~118-~118: There might be a mistake here.
Context: ...rts metrics at /metrics HTTP endpoint with number of requests, ISL, OSL, TTFT, ITL...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ...nd provides these metrics automatically.

(QB_NEW_EN)

tests/planner/README.md

[grammar] ~145-~145: There might be a mistake here.
Context: ...## Quick Start #### Run Unit Tests Only Test the replica calculation logic witho...

(QB_NEW_EN)


[grammar] ~152-~152: There might be a mistake here.
Context: ...py -v ``` #### Run Full End-to-End Test Test complete scaling behavior including...

(QB_NEW_EN)


[grammar] ~169-~169: There might be a mistake here.
Context: ...s ``` E2E Test Deployment Management: - If no deployment exists: creates, tests,...

(QB_NEW_EN)


[grammar] ~170-~170: There might be a mistake here.
Context: ...ment exists: creates, tests, and cleans up deployment - If deployment exists: uses...

(QB_NEW_EN)


[grammar] ~178-~178: There might be a mistake here.
Context: ...eq/s for 90s (baseline - maintains 1P1D) - Phase 2: 15 req/s for 120s (moderate l...

(QB_NEW_EN)


[grammar] ~179-~179: There might be a mistake here.
Context: ...or 120s (moderate load - maintains 1P1D) - Phase 3: 25 req/s for 180s (scaling tr...

(QB_NEW_EN)


[grammar] ~180-~180: There might be a mistake here.
Context: ... 180s (scaling trigger - scales to 2P1D) - ISL/OSL: 4000/150 tokens (optimized fo...

(QB_NEW_EN)


[grammar] ~181-~181: There might be a mistake here.
Context: ...okens (optimized for prefill bottleneck) - Transition delay: 30s between phases -...

(QB_NEW_EN)


[grammar] ~182-~182: There might be a mistake here.
Context: ...Transition delay: 30s between phases - Total test duration: ~7 minutes + scal...

(QB_NEW_EN)


[grammar] ~183-~183: There might be a mistake here.
Context: ...tion**: ~7 minutes + scaling observation - Smart cleanup: Only removes deployment...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
tests/planner/README.md

174-174: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🔇 Additional comments (2)
tests/planner/utils/__init__.py (1)

1-2: LGTM: package marker and SPDX headers.

tests/planner/utils/load_generator.py (1)

117-119: Maintain host:port only for --url. genai-perf’s --url flag expects a host:port (no http://); the existing .replace("http://", "") aligns with official examples (e.g. --url localhost:8001). (docs.nvidia.com)

Likely an incorrect or invalid review comment.

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
@hhzhang16 hhzhang16 enabled auto-merge (squash) August 29, 2025 16:42
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
@hhzhang16 hhzhang16 merged commit 353ba5d into main Aug 29, 2025
10 checks passed
@hhzhang16 hhzhang16 deleted the hannahz/dep-301-test-dynamo-planner-scaling-up branch August 29, 2025 17:52

Pull Request Summary by devActivity

Metrics

Cycle Time: 1d 1h 56m · Coding Time: < 1 min · Pickup Time: 9d 3h 4m · Review Time: 23h 25m


ayushag-nv pushed a commit that referenced this pull request Aug 29, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
michaelshin pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Michael Shin <michaelshin@users.noreply.github.com>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>