
Conversation

@tedzhouhk tedzhouhk commented Aug 20, 2025

  • rename data_generator to prefix_data_generator (@PeaBrane tested CLI working)
  • move sin_synth.py to benchmark folder and add documentation
  • refactored sla planner argparse to reduce redundant code
  • fix pmdarima predictor predicting 0 with constant sequences (I think we need to look for a better sequence predictor)
  • add dryrun mode for sla planner
  • add doc on e2e planner dryrun with results (see tests/planner/README.md)

Summary by CodeRabbit

  • New Features
    • Added SLA planner dry-run mode with a CLI to simulate workloads, generate plots, and expose optional Prometheus metrics.
  • Bug Fixes
    • Improved load prediction robustness: handles constant inputs and safely falls back on errors.
    • Sinusoidal load generator adjusted to start at minimum values at t=0 for request rate and ISL/OSL ratio.
  • Refactor
    • Renamed data generator namespace across tooling for consistency.
  • Documentation
    • Added usage guide for the sinusoidal load generator.
    • Expanded planner docs with dry-run workflow and dataset generation examples.
  • Chores
    • Updated packaging and console script namespace; adjust installations accordingly.

@tedzhouhk tedzhouhk changed the title test/feat: add dryrun mode for sla planner test / feat: add dryrun mode for sla planner Aug 20, 2025
@tedzhouhk tedzhouhk changed the title test / feat: add dryrun mode for sla planner test: add dryrun mode for sla planner Aug 20, 2025
coderabbitai bot commented Aug 20, 2025

Walkthrough

Refactors benchmarks/data_generator to benchmarks/prefix_data_generator, updates packaging and CLI entry points, and adjusts imports accordingly. Adds a sinusoidal load generator with a phase-shifted rate and ratio model. Centralizes SLA planner CLI parsing, introduces planner dry-run mode and utilities, enhances load prediction robustness, and updates related documentation and tests.

Changes

Cohort / File(s) → Summary of changes

  • Prefix data generator namespace refactor
    (benchmarks/prefix_data_generator/__init__.py, .../cli.py, .../example.py, .../graph_utils.py, .../prefix_analyzer.py, .../synthesizer.py, .../tests/test_hasher.py, .../tests/test_sampler.py, .../tests/test_synthesizer.py)
    Switched imports and public bindings from data_generator.* to prefix_data_generator.*; no logic changes.
  • Packaging and entry points
    (benchmarks/pyproject.toml)
    Updated package name, package-data, and console script entry from data_generator to prefix_data_generator (datagen -> prefix_data_generator.cli:main).
  • Sinusoidal load generator
    (benchmarks/sin_load_generator/README.md, benchmarks/sin_load_generator/sin_synth.py)
    Added README and modified sine formulas to include a -π/2 phase shift for request rate and ISL/OSL ratio; comments updated.
  • Planner CLI parser centralization
    (components/planner/src/dynamo/planner/planner_sla.py, components/planner/src/dynamo/planner/utils/argparse.py)
    Replaced inline argparse with create_sla_planner_parser(); added a new parser builder exposing comprehensive CLI options.
  • Planner core dry-run and utilities
    (components/planner/src/dynamo/planner/utils/planner_core.py, components/planner/src/dynamo/planner/utils/load_predictor.py, components/planner/src/dynamo/planner/utils/trace_data_extractor.py, components/planner/test/planner_sla_dryrun.py)
    Introduced Planner dry-run mode, added ISL/OSL predictors, lazy worker resolution, load prediction and replica computation, Prometheus gauges, a Mooncake trace extraction utility, ARIMA robustness (constant-data guard and try/except fallback), and a dry-run CLI script.
  • Documentation updates
    (benchmarks/README.md, tests/planner/README.md)
    Fixed path reference to prefix_data_generator; added load dataset generation and planner dry-run documentation; adjusted example parameters and outputs.
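The phase-shifted sine model described in the walkthrough can be sketched as follows. This is a minimal illustration only; `rate_min`, `rate_max`, and `period` are placeholder names, not the script's actual CLI flags.

```python
import math

def request_rate(t, rate_min, rate_max, period):
    """Sinusoidal request rate with a -pi/2 phase shift so the
    rate starts at its minimum at t = 0."""
    mid = (rate_min + rate_max) / 2
    amp = (rate_max - rate_min) / 2
    return mid + amp * math.sin(2 * math.pi / period * t - math.pi / 2)

# At t = 0 the rate sits at rate_min; half a period later it peaks at rate_max.
```

The same shape is used for the ISL/OSL ratio, just with different min/max/period parameters.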

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant CLI as planner_sla.py
  participant Parser as create_sla_planner_parser()
  participant Planner as Planner
  participant Runtime as DistributedRuntime
  participant Prom as Prometheus

  User->>CLI: python -m planner_sla ...
  CLI->>Parser: build parser
  Parser-->>CLI: ArgumentParser
  CLI->>CLI: args = parse_args()
  alt dry-run disabled
    CLI->>Runtime: init runtime/connectors
    CLI->>Planner: __init__(runtime, args, dryrun=false)
    Planner->>Prom: start metrics server (if port)
  else dry-run enabled
    CLI->>Planner: __init__(None, args, dryrun=true)
    Note right of Planner: Skips runtime/connectors\nSkips Prometheus startup
  end
  CLI->>Planner: run loop
sequenceDiagram
  autonumber
  participant DryRun as planner_sla_dryrun.py
  participant Parser as create_sla_planner_parser()
  participant Planner
  participant Extract as extract_metrics_from_mooncake()
  participant Pred as Predictors (num_req/isl/osl)
  participant Plot as Matplotlib

  DryRun->>Parser: build + extend parser
  DryRun->>DryRun: parse args
  DryRun->>Planner: __init__(None, args, dryrun=true)
  DryRun->>Planner: dryrun_run()
  Planner->>Extract: load and bucket metrics
  Extract-->>Planner: interval metrics
  loop for each interval
    Planner->>Pred: update with observed metrics
    Planner->>Planner: predict_load()
    Planner->>Planner: _compute_replica_requirements()
  end
  Planner->>Plot: render 2x2 plots
  Plot-->>Planner: save figure
sequenceDiagram
  autonumber
  participant LP as load_predictor.py
  participant ARIMA as ARIMA(model)
  participant Log as Logger

  LP->>LP: is_constant(series)?
  alt constant
    LP-->>LP: return last value
  else non-constant
    LP->>ARIMA: fit & forecast (try)
    alt success
      ARIMA-->>LP: next value
    else exception
      ARIMA-->>LP: raise
      LP->>Log: warn and fallback
      LP-->>LP: return last value
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump my paws at shifting names,
From data to prefix, tidy games.
A sinus wave with phase awry,
Starts at low, then climbs the sky.
I dry-run plans with gentle grace—
Plot carrots, nodes, and token pace.
Hop-hop, release in place! 🥕✨


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (19)
benchmarks/README.md (1)

31-31: Grammar fix: “information are” → “information is”.

Minor copy edit to improve readability.

Apply this diff:

-Detailed information are provided in the `prefix_data_generator` directory.
+Detailed information is provided in the `prefix_data_generator` directory.
benchmarks/prefix_data_generator/tests/test_hasher.py (1)

48-51: Avoid CI flakiness from external model downloads (optional).

The deepseek_tokenizer fixture downloads a HF model at test time. Consider marking this test as slow and/or skipping when offline to keep CI stable.

You could adapt like:

import os
import pytest

@pytest.fixture(scope="module")
def deepseek_tokenizer():
    if os.environ.get("NO_NETWORK") == "1":
        pytest.skip("requires network to download tokenizer")
    return AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
components/planner/src/dynamo/planner/utils/load_predictor.py (3)

92-111: Robust ARIMA predictor: constant-series guard and safe fallback are good additions.

  • Returning the constant value for flat series avoids pmdarima’s degenerate behavior.
  • Wrapping fit/predict with a try/except and falling back to the last value prevents runtime failures from propagating.

Two optional improvements:

  • Clamp negative forecasts to zero to avoid passing negative loads downstream.
  • Add unit tests covering: (a) constant series, (b) ARIMA exception path, (c) normal varying series.

Example minimal clamp:

-            return forecast[0]
+            return max(0, float(forecast[0]))

26-34: Logger is bound to “cmdstanpy”, so your warnings won’t surface. Use a module logger and isolate cmdstanpy suppression.

Currently, logger = logging.getLogger("cmdstanpy") and propagate=False will sink your ARIMA warnings. Keep the cmdstanpy logger muted, but emit module logs via logging.getLogger(__name__).

Apply this diff:

-logger = logging.getLogger("cmdstanpy")
-logger.addHandler(logging.NullHandler())
-logger.propagate = False
-logger.setLevel(logging.CRITICAL)
+module_logger = logging.getLogger(__name__)
+cmdstanpy_logger = logging.getLogger("cmdstanpy")
+cmdstanpy_logger.addHandler(logging.NullHandler())
+cmdstanpy_logger.propagate = False
+cmdstanpy_logger.setLevel(logging.CRITICAL)

And update the warning site:

-            logger.warning(f"ARIMA prediction failed: {e}, using last value")
+            module_logger.warning(f"ARIMA prediction failed: {e}, using last value")

Also applies to: 109-111


31-34: Consider scoping warning suppression.

Global warnings.filterwarnings("ignore", ...) can hide useful signals across the process. Prefer scoping (context manager) around Prophet fit if that’s the primary source of noise.

Example:

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=FutureWarning)
    warnings.filterwarnings("ignore", category=UserWarning)
    model.fit(df)
benchmarks/sin_load_generator/sin_synth.py (2)

100-106: Typo: “poison distribution” → “Poisson distribution”; keep docs consistent with phase shift.

Small doc fix and clarity.

Apply this diff:

-    # for the process interval at [t, t + process_interval), the number of requests to generate is sampled
-    # from a poison distribution with the following parameters:
-    # request_rate(t) = (min + max) / 2 + (max - min) / 2 * sin(2 * pi / period * t - pi / 2)
-    # the phase shift is pi / 2 to make the request rate start from the minimum at t = 0
+    # For the process interval [t, t + process_interval), the number of requests is sampled
+    # from a Poisson distribution with:
+    #   request_rate(t) = (min + max) / 2 + (max - min) / 2 * sin(2 * pi / period * t - pi / 2)
+    # The phase shift of pi/2 makes the request rate start from the minimum at t = 0.
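The sampling step the docstring describes — a Poisson draw whose mean is the sinusoidal rate times the interval length — can be sketched with the standard library alone, here via Knuth's multiplication method (`rate` and `dt` are illustrative names, not the script's variables):

```python
import math
import random

def sample_poisson(lam, rng=random):
    """Draw one Poisson(lam) sample by multiplying uniforms
    until the running product falls below exp(-lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def requests_in_interval(rate, dt, rng=random):
    # Number of requests in an interval of length dt at the given rate.
    return sample_poisson(rate * dt, rng=rng)
```

Knuth's method is adequate for the small per-interval means used here; for large means a different sampler (e.g. numpy's) would be preferable.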

31-36: Optional: guard isl_osl_ratio into [0, 1].

If users set min/max outside [0,1], the ratio becomes an invalid probability. A light clamp keeps sampling sane.

For example:

-        isl_osl_ratio = (args.isl_osl_ratio_min + args.isl_osl_ratio_max) / 2 + (
-            args.isl_osl_ratio_max - args.isl_osl_ratio_min
-        ) / 2 * np.sin(2 * np.pi / args.isl_osl_ratio_period * t - np.pi / 2)
+        isl_osl_ratio = (args.isl_osl_ratio_min + args.isl_osl_ratio_max) / 2 + (
+            args.isl_osl_ratio_max - args.isl_osl_ratio_min
+        ) / 2 * np.sin(2 * np.pi / args.isl_osl_ratio_period * t - np.pi / 2)
+        isl_osl_ratio = float(np.clip(isl_osl_ratio, 0.0, 1.0))

Also applies to: 126-131

components/planner/src/dynamo/planner/utils/trace_data_extractor.py (1)

26-33: Stream processing to avoid loading entire file; improve resilience to bad lines

For large trace files, building an in-memory records list is unnecessary and can be memory-heavy. You can stream-accumulate per-interval stats and tolerate occasional malformed lines.

Apply this refactor:

-    # Read and parse JSONL data from file
-    records = []
-    with open(dataset, "r", encoding="utf-8") as f:
-        for line in f:
-            if line.strip():
-                records.append(json.loads(line))
-
-    # Group records by adjustment interval
-    interval_groups = defaultdict(list)
-
-    for record in records:
-        ts = record["timestamp"]
-        # Convert to seconds depending on input units and find the interval
-        timestamp_sec = (ts / 1000.0) if timestamps_in_ms else float(ts)
-        interval_start = int(timestamp_sec // adjustment_interval) * adjustment_interval
-        interval_groups[interval_start].append(record)
+    # Accumulate per-interval counters on the fly
+    interval_groups: dict[int, dict[str, float]] = defaultdict(lambda: {"cnt": 0, "isl": 0.0, "osl": 0.0})
+
+    with open(dataset, "r", encoding="utf-8") as f:
+        for line in f:
+            if not line.strip():
+                continue
+            try:
+                record = json.loads(line)
+                ts = record["timestamp"]
+                timestamp_sec = (ts / 1000.0) if timestamps_in_ms else float(ts)
+                interval_start = int(timestamp_sec // adjustment_interval) * adjustment_interval
+
+                isl = float(record["input_length"])
+                osl = float(record["output_length"])
+
+                agg = interval_groups[interval_start]
+                agg["cnt"] += 1
+                agg["isl"] += isl
+                agg["osl"] += osl
+            except (json.JSONDecodeError, KeyError, TypeError, ValueError):
+                # Skip malformed lines or missing fields
+                continue
@@
-    # Compute metrics for each interval
-    metrics = []
-
-    for interval_start in sorted(interval_groups.keys()):
-        records_in_interval = interval_groups[interval_start]
-
-        # Calculate metrics
-        request_count = len(records_in_interval)
-
-        # Calculate average ISL and OSL
-        total_isl = sum(record["input_length"] for record in records_in_interval)
-        total_osl = sum(record["output_length"] for record in records_in_interval)
-
-        avg_isl = total_isl / request_count if request_count > 0 else 0
-        avg_osl = total_osl / request_count if request_count > 0 else 0
-
-        metrics.append(
-            {
-                "interval_start": interval_start,
-                "request_count": request_count,
-                "avg_isl": avg_isl,
-                "avg_osl": avg_osl,
-            }
-        )
+    # Compute metrics for each interval
+    metrics: List[Dict[str, Any]] = []
+    for interval_start in sorted(interval_groups.keys()):
+        agg = interval_groups[interval_start]
+        cnt = int(agg["cnt"])
+        avg_isl = (agg["isl"] / cnt) if cnt else 0.0
+        avg_osl = (agg["osl"] / cnt) if cnt else 0.0
+        metrics.append(
+            {
+                "interval_start": int(interval_start),
+                "request_count": cnt,
+                "avg_isl": avg_isl,
+                "avg_osl": avg_osl,
+            }
+        )

Also applies to: 33-45, 46-68
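The bucketing logic in both variants reduces to flooring each timestamp to a multiple of the adjustment interval and aggregating per bucket. A minimal self-contained sketch (the record field names follow the Mooncake-style trace format shown in the diff, but are assumptions here):

```python
from collections import defaultdict

def bucket_records(records, adjustment_interval, timestamps_in_ms=True):
    """Group records into adjustment-interval buckets and compute
    per-interval request counts and average ISL/OSL."""
    groups = defaultdict(lambda: {"cnt": 0, "isl": 0.0, "osl": 0.0})
    for r in records:
        ts = r["timestamp"] / 1000.0 if timestamps_in_ms else float(r["timestamp"])
        # Floor the timestamp to the start of its interval.
        start = int(ts // adjustment_interval) * adjustment_interval
        g = groups[start]
        g["cnt"] += 1
        g["isl"] += r["input_length"]
        g["osl"] += r["output_length"]
    return [
        {
            "interval_start": start,
            "request_count": g["cnt"],
            "avg_isl": g["isl"] / g["cnt"],
            "avg_osl": g["osl"] / g["cnt"],
        }
        for start, g in sorted(groups.items())
    ]
```

Because buckets are created lazily, empty intervals produce no entry; the streaming refactor above keeps the same semantics while avoiding an in-memory record list.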

benchmarks/sin_load_generator/README.md (5)

39-41: Add language to fenced code block (markdownlint MD040)

Explicitly annotate fenced code blocks to satisfy markdownlint and improve readability.

Apply:

-```
+```text
 request_rate(t) = (min + max) / 2 + (max - min) / 2 * sin(2 * π / period * t - π / 2)

59-63: Add language to fenced code block (markdownlint MD040)

Same fix for the ISL/OSL ratio formula block.

Apply this diff:

-```
+```text
 isl-osl-ratio(t) = (min + max) / 2 + (max - min) / 2 * sin(2 * π / period * t - π / 2)

8-8: Consistent capitalization for JSONL

Use uppercase JSONL consistently.

Apply this diff:

-`sin_synth.py` is a simple script to generate synthetic load with sinusoidal request rate and isl/osl ratio. The output is in [mooncake-style](https://github.com/kvcache-ai/Mooncake) jsonl format, which can be directly used in [GenAI-Perf](https://github.com/triton-inference-server/perf_analyzer/tree/main/genai-perf/genai_perf).
+`sin_synth.py` is a simple script to generate synthetic load with sinusoidal request rate and isl/osl ratio. The output is in [mooncake-style](https://github.com/kvcache-ai/Mooncake) JSONL format, which can be directly used in [GenAI-Perf](https://github.com/triton-inference-server/perf_analyzer/tree/main/genai-perf/genai_perf).

56-58: Fix capitalization and phrasing for ISL/OSL ratio explanation

Minor clarity and capitalization fixes.

-The script will generate load with requests sampled from two preset ISL/OSL combinations.
-The ISL/OSL ratio defines how much of requests follow the first preset ISL/OSL pattern. ISl/OSL 0 means all requests follow the first preset ISL/OSL pattern, while ISL/OSL 1 means all requests follow the second preset ISL/OSL pattern.
+The script will generate load with requests sampled from two preset ISL/OSL combinations.
+The ISL/OSL ratio defines the fraction of requests that follow the first preset ISL/OSL pattern. ISL/OSL ratio 0 means all requests follow the first preset pattern, while ISL/OSL ratio 1 means all requests follow the second preset pattern.

123-123: Fix spacing/typo in closing example sentence

Adds a missing space for readability.

-This generates a 60-second dataset with request rate fixed at 5 requests/second, with ISL/OSL ratio varying between 0.2 and 0.8 between I3000O150 and I500O2000over a 20-second period.
+This generates a 60-second dataset with request rate fixed at 5 requests/second, with ISL/OSL ratio varying between 0.2 and 0.8 between I3000O150 and I500O2000 over a 20-second period.
benchmarks/prefix_data_generator/tests/test_synthesizer.py (1)

22-22: Import path update aligns with the namespace refactor

The new import from prefix_data_generator.synthesizer is consistent with the package rename. No issues.

Optional: seed randomness to keep tests deterministic across runs.

 import random
@@
+random.seed(0)
components/planner/test/planner_sla_dryrun.py (1)

21-22: Optional: configure basic logging or remove unused logger.

logger is declared but never used here. Either configure and use it (e.g., logging.basicConfig(level=logging.INFO)) or remove it for clarity.

benchmarks/pyproject.toml (1)

62-65: Ensure all subpackages are included in the wheel; consider using setuptools package discovery.

Specifying packages = ["prefix_data_generator"] may exclude subpackages unless you list them all. Prefer discovery to include prefix_data_generator.*. Also, including **/*.py as package data is unnecessary; Python files are already packaged.

Apply this diff:

-[tool.setuptools]
-packages = ["prefix_data_generator"]
-
-[tool.setuptools.package-data]
-prefix_data_generator = ["**/*.py"]
+[tool.setuptools]
+packages = { find = { include = ["prefix_data_generator*"] } }
+
+# Only include non-Python assets here if needed (examples):
+# [tool.setuptools.package-data]
+# prefix_data_generator = ["py.typed", "data/**/*"]

Additionally, consider renaming the distribution for clarity:

  • Current: [project].name = "data-generator"
  • Optional: "prefix-data-generator" to align with the new import namespace.

Would you like me to propagate this rename across docs and any packaging references?

components/planner/src/dynamo/planner/utils/planner_core.py (3)

433-446: Avoid shadowing the time module; rename the x-axis list.

time = [0] shadows the imported time module within this scope. Rename to avoid confusion.

Apply this diff:

-        time = [0]
+        t_axis = [0]
@@
-        est_rr = [metrics[0]["request_count"]]
+        est_rr = [metrics[0]["request_count"]]
@@
-        # update time
-            time.append(time[-1] + self.args.adjustment_interval)
+            # update time
+            t_axis.append(t_axis[-1] + self.args.adjustment_interval)
@@
-        ax1.plot(time, rr, "b-", label="Actual Request Rate", linewidth=2)
-        ax1.plot(time, est_rr, "r--", label="Predicted Request Rate", linewidth=2)
+        ax1.plot(t_axis, rr, "b-", label="Actual Request Rate", linewidth=2)
+        ax1.plot(t_axis, est_rr, "r--", label="Predicted Request Rate", linewidth=2)
@@
-        ax2.plot(time, isl, "g-", label="Actual ISL", linewidth=2)
-        ax2.plot(time, est_isl, "g--", label="Predicted ISL", linewidth=2)
-        ax2.plot(time, osl, "m-", label="Actual OSL", linewidth=2)
-        ax2.plot(time, est_osl, "m--", label="Predicted OSL", linewidth=2)
+        ax2.plot(t_axis, isl, "g-", label="Actual ISL", linewidth=2)
+        ax2.plot(t_axis, est_isl, "g--", label="Predicted ISL", linewidth=2)
+        ax2.plot(t_axis, osl, "m-", label="Actual OSL", linewidth=2)
+        ax2.plot(t_axis, est_osl, "m--", label="Predicted OSL", linewidth=2)
@@
-        ax3.plot(time, p_thpt, "b-", label="Actual Prefill Throughput", linewidth=2)
+        ax3.plot(t_axis, p_thpt, "b-", label="Actual Prefill Throughput", linewidth=2)
@@
-            time, safe_p_thpt, "b--", label="Safe Prefill Throughput Limit", linewidth=2
+            t_axis, safe_p_thpt, "b--", label="Safe Prefill Throughput Limit", linewidth=2
@@
-            time, num_p, "c-", label="Prefill Workers", linewidth=2, marker="o"
+            t_axis, num_p, "c-", label="Prefill Workers", linewidth=2, marker="o"
@@
-        ax4.plot(time, d_thpt, "r-", label="Actual Decode Throughput", linewidth=2)
+        ax4.plot(t_axis, d_thpt, "r-", label="Actual Decode Throughput", linewidth=2)
@@
-            time, safe_d_thpt, "r--", label="Safe Decode Throughput Limit", linewidth=2
+            t_axis, safe_d_thpt, "r--", label="Safe Decode Throughput Limit", linewidth=2
@@
-            time, num_d, "orange", label="Decode Workers", linewidth=2, marker="o"
+            t_axis, num_d, "orange", label="Decode Workers", linewidth=2, marker="o"

511-515: Clarify y-axis units for request plot.

You plot counts per adjustment interval; reflect that in the label for clarity.

Apply this diff:

-        ax1.set_ylabel("Request Rate")
+        ax1.set_ylabel("Requests per interval")

571-576: Create output directory before saving the figure.

Saving will fail if the parent directory doesn’t exist.

Apply this diff:

-        plt.tight_layout()
-        plt.savefig(self.args.output_plot, dpi=300, bbox_inches="tight")
+        plt.tight_layout()
+        from pathlib import Path
+        out_path = Path(self.args.output_plot)
+        out_path.parent.mkdir(parents=True, exist_ok=True)
+        plt.savefig(out_path, dpi=300, bbox_inches="tight")
         plt.close()
 
-        logger.info(f"Dryrun plot saved to {self.args.output_plot}")
+        logger.info(f"Dryrun plot saved to {out_path}")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5905335 and eee65af.

⛔ Files ignored due to path filters (1)
  • tests/planner/figures/dryrun_plot.png is excluded by !**/*.png
📒 Files selected for processing (20)
  • benchmarks/README.md (1 hunks)
  • benchmarks/prefix_data_generator/__init__.py (1 hunks)
  • benchmarks/prefix_data_generator/cli.py (1 hunks)
  • benchmarks/prefix_data_generator/example.py (1 hunks)
  • benchmarks/prefix_data_generator/graph_utils.py (1 hunks)
  • benchmarks/prefix_data_generator/prefix_analyzer.py (1 hunks)
  • benchmarks/prefix_data_generator/synthesizer.py (2 hunks)
  • benchmarks/prefix_data_generator/tests/test_hasher.py (1 hunks)
  • benchmarks/prefix_data_generator/tests/test_sampler.py (1 hunks)
  • benchmarks/prefix_data_generator/tests/test_synthesizer.py (1 hunks)
  • benchmarks/pyproject.toml (2 hunks)
  • benchmarks/sin_load_generator/README.md (1 hunks)
  • benchmarks/sin_load_generator/sin_synth.py (4 hunks)
  • components/planner/src/dynamo/planner/planner_sla.py (2 hunks)
  • components/planner/src/dynamo/planner/utils/argparse.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/load_predictor.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/planner_core.py (6 hunks)
  • components/planner/src/dynamo/planner/utils/trace_data_extractor.py (1 hunks)
  • components/planner/test/planner_sla_dryrun.py (1 hunks)
  • tests/planner/README.md (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (14)
benchmarks/prefix_data_generator/tests/test_sampler.py (1)
benchmarks/prefix_data_generator/sampler.py (1)
  • EmpiricalSampler (50-69)
benchmarks/prefix_data_generator/tests/test_synthesizer.py (1)
benchmarks/prefix_data_generator/synthesizer.py (1)
  • Synthesizer (34-330)
benchmarks/prefix_data_generator/tests/test_hasher.py (1)
benchmarks/prefix_data_generator/hasher.py (2)
  • hashes_to_texts (96-166)
  • texts_to_hashes (33-93)
components/planner/src/dynamo/planner/utils/argparse.py (1)
components/planner/src/dynamo/planner/defaults.py (1)
  • SLAPlannerDefaults (72-83)
benchmarks/prefix_data_generator/graph_utils.py (1)
benchmarks/prefix_data_generator/sampler.py (1)
  • get_cdf (26-28)
components/planner/test/planner_sla_dryrun.py (2)
components/planner/src/dynamo/planner/utils/argparse.py (1)
  • create_sla_planner_parser (21-122)
components/planner/src/dynamo/planner/utils/planner_core.py (2)
  • Planner (55-575)
  • dryrun_run (407-575)
benchmarks/prefix_data_generator/synthesizer.py (3)
benchmarks/prefix_data_generator/graph_utils.py (5)
  • _mark_visited (34-43)
  • _merge_chains (46-111)
  • _precompute_transition_cdfs (142-155)
  • _remove_leaves (114-139)
  • _verify_tree (22-31)
benchmarks/prefix_data_generator/sampler.py (2)
  • EmpiricalSampler (50-69)
  • sample_from_cdf (38-47)
benchmarks/prefix_data_generator/logging_utils.py (1)
  • calculate_and_print_statistics (23-55)
benchmarks/prefix_data_generator/prefix_analyzer.py (1)
benchmarks/prefix_data_generator/logging_utils.py (1)
  • calculate_and_print_statistics (23-55)
benchmarks/prefix_data_generator/example.py (2)
benchmarks/prefix_data_generator/hasher.py (1)
  • hashes_to_texts (96-166)
benchmarks/prefix_data_generator/synthesizer.py (1)
  • Synthesizer (34-330)
benchmarks/prefix_data_generator/__init__.py (1)
benchmarks/prefix_data_generator/cli.py (1)
  • main (20-48)
benchmarks/prefix_data_generator/cli.py (3)
benchmarks/prefix_data_generator/__init__.py (1)
  • main (19-20)
benchmarks/prefix_data_generator/prefix_analyzer.py (1)
  • main (156-183)
benchmarks/prefix_data_generator/synthesizer.py (1)
  • main (333-438)
components/planner/src/dynamo/planner/utils/load_predictor.py (1)
tests/conftest.py (1)
  • logger (124-135)
components/planner/src/dynamo/planner/planner_sla.py (1)
components/planner/src/dynamo/planner/utils/argparse.py (1)
  • create_sla_planner_parser (21-122)
components/planner/src/dynamo/planner/utils/planner_core.py (4)
components/planner/src/dynamo/planner/utils/trace_data_extractor.py (1)
  • extract_metrics_from_mooncake (9-68)
components/planner/src/dynamo/planner/defaults.py (1)
  • SLAPlannerDefaults (72-83)
components/planner/src/dynamo/planner/utils/load_predictor.py (4)
  • predict_next (57-59)
  • predict_next (70-71)
  • predict_next (87-111)
  • predict_next (142-162)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (4)
  • interpolate_thpt_per_gpu (52-54)
  • interpolate_thpt_per_gpu (141-145)
  • find_best_throughput_per_gpu (147-168)
  • interpolate_ttft (48-50)
🪛 LanguageTool
benchmarks/sin_load_generator/README.md

[grammar] ~19-~19: There might be a mistake here.
Context: ...ons - --block-size INT (default: 512) - Block size for hashing, since there is n...

(QB_NEW_EN)


[grammar] ~22-~22: There might be a mistake here.
Context: ... - --total-blocks INT (default: 10000) - ISL prompt blocks are randomly sampled f...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...tput-file STR` (default: auto-generated) - Output file name (in jsonl format) - I...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...) - Output file name (in jsonl format) - If not specified, the script will genera...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
🔇 Additional comments (23)
benchmarks/README.md (1)

28-31: CLI command datagen is correctly configured
Verified that in benchmarks/pyproject.toml under [project.scripts] the entry

datagen = "prefix_data_generator.cli:main"

remains intact. No updates to the README are needed.

benchmarks/prefix_data_generator/tests/test_hasher.py (1)

20-20: Import path update looks correct.

The switch to prefix_data_generator.hasher aligns with the module rename.

benchmarks/prefix_data_generator/tests/test_sampler.py (1)

19-19: Import path update LGTM.

Matches the new namespace; no API changes required.

benchmarks/sin_load_generator/sin_synth.py (2)

31-36: Phase shift for isl/osl ratio looks correct.

Shifting the phase with sin(... - pi/2) makes the ratio start at its minimum at t=0, as documented.


44-47: Phase shift for request_rate is consistent with docs.

Same offset applied; initial request rate starts at the configured minimum.
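As a standalone sanity check (a sketch, not the generator's actual code), a sinusoid offset by -π/2 evaluates to its minimum at t = 0 and its maximum at half the period:

```python
import math

def phase_shifted(t: float, v_min: float, v_max: float, period: float) -> float:
    """Sinusoid oscillating in [v_min, v_max] that starts at v_min at t=0."""
    mid = (v_min + v_max) / 2
    amp = (v_max - v_min) / 2
    return mid + amp * math.sin(2 * math.pi * t / period - math.pi / 2)

print(phase_shifted(0, 5, 10, 10))  # minimum: 5.0
print(phase_shifted(5, 5, 10, 10))  # maximum: 10.0
```

At t = period/4 the value crosses the midpoint, so the whole curve is simply the usual sinusoid delayed by a quarter period.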

benchmarks/prefix_data_generator/prefix_analyzer.py (1)

19-19: LGTM: import path updated to new module namespace

The switch to prefix_data_generator.logging_utils matches the refactor and keeps usage unchanged.

benchmarks/prefix_data_generator/__init__.py (1)

16-16: LGTM: public CLI alias now targets the renamed package

The cli_main import points to prefix_data_generator.cli.main. Matches CLI changes elsewhere.

benchmarks/prefix_data_generator/graph_utils.py (1)

18-19: Import Refactor Verified

  • No lingering data_generator imports found.
  • get_cdf is defined in benchmarks/prefix_data_generator/sampler.py.
  • CACHE_END, END_NODE, and SUPER_ROOT are defined in benchmarks/prefix_data_generator/protocols.py.

Imports are correctly updated under prefix_data_generator with no functional changes.

benchmarks/prefix_data_generator/example.py (1)

20-21: Updated imports to prefix_data_generator — LGTM.

The example now targets the renamed package; no behavior change.

benchmarks/prefix_data_generator/synthesizer.py (2)

23-31: Namespace refactor for internal imports — looks good.

All internal imports now reference prefix_data_generator.* consistently; matches the package rename.


337-337: Adjusted logging_utils import path — OK.

calculate_and_print_statistics is correctly sourced from prefix_data_generator.logging_utils.

components/planner/src/dynamo/planner/planner_sla.py (1)

22-22: Centralized SLA parser import — good consolidation.

Shifting to create_sla_planner_parser reduces duplication and keeps CLI options consistent.

benchmarks/prefix_data_generator/cli.py (2)

39-39: Refactored analyze import to new namespace — OK.

Dynamic import with argv forwarding remains unchanged; works for the renamed module.


45-45: Refactored synthesize import to new namespace — OK.

Consistent with the package rename; no functional differences.

components/planner/src/dynamo/planner/utils/argparse.py (1)

21-122: Centralizing SLA planner argparse is solid.

The option set and defaults look consistent with SLAPlannerDefaults. Clear separation from CLI scripts is good for reuse.
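The shape of that consolidation can be sketched as a parser factory (illustrative only: the option names and defaults below are examples, not the real SLAPlannerDefaults):

```python
import argparse

def create_sla_planner_parser() -> argparse.ArgumentParser:
    """Single source of truth for planner options, importable by every entry point."""
    parser = argparse.ArgumentParser(description="SLA planner")
    parser.add_argument("--ttft", type=float, default=0.5, help="TTFT SLA in seconds")
    parser.add_argument("--itl", type=float, default=0.05, help="ITL SLA in seconds")
    parser.add_argument("--adjustment-interval", type=int, default=60,
                        help="seconds between scaling decisions")
    return parser

# Each CLI script reuses the factory instead of redeclaring the options.
args = create_sla_planner_parser().parse_args(["--ttft", "0.3"])
print(args.ttft, args.adjustment_interval)
```

Because every script calls the same factory, adding or renaming an option happens in exactly one place.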

benchmarks/pyproject.toml (1)

52-52: Script entrypoint update looks correct.

datagen now points to prefix_data_generator.cli:main, matching the namespace rename.

tests/planner/README.md (2)

47-63: Interpolator example updates look coherent.

The OSL and computed values are consistent with the narrative and updated parameters.


98-105: The documented CLI invocation shows a --dry-run flag, but the script doesn't accept it.

Documentation is correct conceptually; the script needs to accept --dry-run to avoid argparse errors. See my suggested code change in components/planner/test/planner_sla_dryrun.py to add the flag and wire it through. Once merged, this section remains valid.

Do you want me to open a doc-task issue to track this coupling between CLI and docs?
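For reference, wiring such a flag through argparse is a one-liner (a minimal sketch, assuming the script builds a standard ArgumentParser):

```python
import argparse

parser = argparse.ArgumentParser(description="SLA planner dry run")
# store_true gives an opt-in boolean that defaults to False
parser.add_argument("--dry-run", action="store_true",
                    help="simulate scaling decisions without a real deployment")

args = parser.parse_args(["--dry-run"])
print(args.dry_run)  # True
```

The parsed value (`args.dry_run`) can then be threaded into the planner constructor so the documented invocation parses cleanly.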

components/planner/src/dynamo/planner/utils/planner_core.py (5)

22-22: Import for trace extractor is correct.


56-78: Dry-run-aware initialization looks good.

Skipping runtime/connector/Prometheus initialization in dry-run is appropriate; production path remains intact.
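The guard can be pictured as follows (a stdlib-only sketch; class and attribute names are illustrative, not the planner's actual API):

```python
class PlannerSketch:
    """Skips external connections entirely when constructed in dry-run mode."""

    def __init__(self, dry_run: bool = False):
        self.dry_run = dry_run
        self.runtime = None
        if not dry_run:
            # Production path only: connect runtime, connector, and metrics.
            self.runtime = self._connect_runtime()

    def _connect_runtime(self):
        return object()  # stand-in for the real connector setup

print(PlannerSketch(dry_run=True).runtime)  # None
```

Keeping the dry-run branch free of side effects is what lets the simulation run without a cluster.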


101-121: Prometheus metrics initialization only in non-dry-run is appropriate.

Gauges and server lifecycle are guarded by prom port; error handling is in place.


122-128: Correction-factor defaults in dry-run are sensible.

Disabling correction in dry-run avoids using unavailable SLA metrics.


256-270: Double-check the prefill correction-factor application.

You multiply the predicted prefill load per GPU by min(1, p_correction_factor). If p_correction_factor < 1 represents additional queuing slowing TTFT, this reduces the estimated load and thus reduces required replicas, which seems opposite to compensating for queue-induced TTFT inflation. If the intent is to scale replicas up when p_correction_factor < 1, consider dividing effective throughput by the factor or multiplying the load by 1 / p_correction_factor (with caps).

Would you like me to propose a concrete adjustment once the intended semantics of p_correction_factor are confirmed?
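To make the two semantics concrete, here is a hypothetical comparison (illustrative numbers and function names, not the planner's code):

```python
import math

def replicas_multiply(load_per_gpu, gpu_capacity, factor):
    # Current behavior: factor < 1 shrinks the estimated load,
    # so fewer replicas are requested.
    return math.ceil(load_per_gpu * min(1.0, factor) / gpu_capacity)

def replicas_divide(load_per_gpu, gpu_capacity, factor):
    # Alternative: factor < 1 inflates the load (capped at 10x),
    # scaling up to compensate for queue-induced TTFT inflation.
    return math.ceil(load_per_gpu / max(factor, 0.1) / gpu_capacity)

print(replicas_multiply(100, 10, 0.5))  # 5 replicas (scales down)
print(replicas_divide(100, 10, 0.5))    # 20 replicas (scales up)
```

With factor = 1 both variants agree (10 replicas for this load); they diverge in opposite directions as the factor drops, which is the crux of the question above.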

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
@tedzhouhk tedzhouhk merged commit 1b3a107 into main Aug 20, 2025
14 of 15 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/planner-datagen branch August 20, 2025 21:05
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
nv-anants pushed a commit that referenced this pull request Aug 28, 2025
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>