test: add dryrun mode for sla planner #2557
Conversation
Walkthrough

Refactors benchmarks/data_generator to benchmarks/prefix_data_generator, updates packaging and CLI entry points, and adjusts imports accordingly. Adds a sinusoidal load generator with a phase-shifted rate and ratio model. Centralizes SLA planner CLI parsing, introduces a planner dry-run mode and utilities, improves load-prediction robustness, and updates related documentation and tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant CLI as planner_sla.py
    participant Parser as create_sla_planner_parser()
    participant Planner as Planner
    participant Runtime as DistributedRuntime
    participant Prom as Prometheus
    User->>CLI: python -m planner_sla ...
    CLI->>Parser: build parser
    Parser-->>CLI: ArgumentParser
    CLI->>CLI: args = parse_args()
    alt dry-run disabled
        CLI->>Runtime: init runtime/connectors
        CLI->>Planner: __init__(runtime, args, dryrun=false)
        Planner->>Prom: start metrics server (if port)
    else dry-run enabled
        CLI->>Planner: __init__(None, args, dryrun=true)
        Note right of Planner: Skips runtime/connectors<br/>Skips Prometheus startup
    end
    CLI->>Planner: run loop
```
```mermaid
sequenceDiagram
    autonumber
    participant DryRun as planner_sla_dryrun.py
    participant Parser as create_sla_planner_parser()
    participant Planner
    participant Extract as extract_metrics_from_mooncake()
    participant Pred as Predictors (num_req/isl/osl)
    participant Plot as Matplotlib
    DryRun->>Parser: build + extend parser
    DryRun->>DryRun: parse args
    DryRun->>Planner: __init__(None, args, dryrun=true)
    DryRun->>Planner: dryrun_run()
    Planner->>Extract: load and bucket metrics
    Extract-->>Planner: interval metrics
    loop for each interval
        Planner->>Pred: update with observed metrics
        Planner->>Planner: predict_load()
        Planner->>Planner: _compute_replica_requirements()
    end
    Planner->>Plot: render 2x2 plots
    Plot-->>Planner: save figure
```
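The dry-run flow above can be sketched as an offline replay loop over bucketed interval metrics. Everything here (class and function names) is illustrative, not the planner's actual API:

```python
class LastValuePredictor:
    """Minimal stand-in predictor: forecasts the most recent observation."""

    def __init__(self):
        self.last = 0.0

    def add(self, x):
        self.last = float(x)

    def predict_next(self):
        return self.last


def dryrun(interval_metrics, predictor, compute_replicas):
    # Replay bucketed metrics offline: feed each observation to the
    # predictor, forecast the next interval, record the replica decision.
    decisions = []
    for m in interval_metrics:
        predictor.add(m["request_count"])
        predicted_load = predictor.predict_next()
        decisions.append(compute_replicas(predicted_load))
    return decisions
```

Because the loop is purely data-driven, it can exercise the scaling policy against a recorded trace without any runtime, connectors, or Prometheus.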
```mermaid
sequenceDiagram
    autonumber
    participant LP as load_predictor.py
    participant ARIMA as ARIMA(model)
    participant Log as Logger
    LP->>LP: is_constant(series)?
    alt constant
        LP-->>LP: return last value
    else non-constant
        LP->>ARIMA: fit & forecast (try)
        alt success
            ARIMA-->>LP: next value
        else exception
            ARIMA-->>LP: raise
            LP->>Log: warn and fallback
            LP-->>LP: return last value
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 11
🧹 Nitpick comments (19)
benchmarks/README.md (1)

31-31: Grammar fix: “information are” → “information is”. Minor copy edit to improve readability.

Apply this diff:

```diff
-Detailed information are provided in the `prefix_data_generator` directory.
+Detailed information is provided in the `prefix_data_generator` directory.
```

benchmarks/prefix_data_generator/tests/test_hasher.py (1)
48-51: Avoid CI flakiness from external model downloads (optional).

The `deepseek_tokenizer` fixture downloads a HF model at test time. Consider marking this test as slow and/or skipping when offline to keep CI stable. You could adapt like:

```python
import os

import pytest


@pytest.fixture(scope="module")
def deepseek_tokenizer():
    if os.environ.get("NO_NETWORK") == "1":
        pytest.skip("requires network to download tokenizer")
    return AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
```

components/planner/src/dynamo/planner/utils/load_predictor.py (3)
92-111: Robust ARIMA predictor: constant-series guard and safe fallback are good additions.
- Returning the constant value for flat series avoids pmdarima’s degenerate behavior.
- Wrapping fit/predict with a try/except and falling back to the last value prevents runtime failures from propagating.
Two optional improvements:
- Clamp negative forecasts to zero to avoid passing negative loads downstream.
- Add unit tests covering: (a) constant series, (b) ARIMA exception path, (c) normal varying series.
Example minimal clamp:
```diff
-return forecast[0]
+return max(0, float(forecast[0]))
```
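The guard-and-fallback pattern the review recommends can be sketched generically. `fit_forecast` here is a stand-in for the real ARIMA fit (the actual predictor class is not shown in this review), and the clamp mirrors the suggested `max(0, ...)`:

```python
import logging

module_logger = logging.getLogger(__name__)


def predict_next(history, fit_forecast):
    """Forecast the next value with a constant-series guard and a safe fallback.

    `fit_forecast` is any callable that fits a model on `history` and returns
    the one-step-ahead forecast; it may raise on degenerate input.
    """
    if not history:
        raise ValueError("history must be non-empty")
    last = history[-1]
    # Constant-series guard: skip model fitting entirely for flat input.
    if all(x == history[0] for x in history):
        return float(last)
    try:
        forecast = fit_forecast(history)
        # Clamp negative forecasts so downstream never sees a negative load.
        return max(0.0, float(forecast))
    except Exception as e:
        module_logger.warning("prediction failed: %s, using last value", e)
        return float(last)


def linear_trend(history):
    # Trivial stand-in model: extrapolate the last first difference.
    return history[-1] + (history[-1] - history[-2])
```

The three cases the review asks to unit-test map directly onto this shape: a constant series returns the last value without fitting, an exception falls back to the last value with a warning, and a varying series returns the (clamped) model forecast.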
26-34: Logger is bound to “cmdstanpy”, so your warnings won’t surface. Use a module logger and isolate cmdstanpy suppression.

Currently, `logger = logging.getLogger("cmdstanpy")` with `propagate=False` will sink your ARIMA warnings. Keep the cmdstanpy logger muted, but emit module logs via `logging.getLogger(__name__)`.

Apply this diff:

```diff
-logger = logging.getLogger("cmdstanpy")
-logger.addHandler(logging.NullHandler())
-logger.propagate = False
-logger.setLevel(logging.CRITICAL)
+module_logger = logging.getLogger(__name__)
+cmdstanpy_logger = logging.getLogger("cmdstanpy")
+cmdstanpy_logger.addHandler(logging.NullHandler())
+cmdstanpy_logger.propagate = False
+cmdstanpy_logger.setLevel(logging.CRITICAL)
```

And update the warning site:

```diff
-logger.warning(f"ARIMA prediction failed: {e}, using last value")
+module_logger.warning(f"ARIMA prediction failed: {e}, using last value")
```

Also applies to: 109-111
31-34: Consider scoping warning suppression.

Global `warnings.filterwarnings("ignore", ...)` can hide useful signals across the process. Prefer scoping (a context manager) around the Prophet fit if that’s the primary source of noise.

Example:

```python
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=FutureWarning)
    warnings.filterwarnings("ignore", category=UserWarning)
    model.fit(df)
```

benchmarks/sin_load_generator/sin_synth.py (2)
100-106: Typo: “poison distribution” → “Poisson distribution”; keep docs consistent with the phase shift.

Small doc fix for clarity. Apply this diff:

```diff
-    # for the process interval at [t, t + process_interval), the number of requests to generate is sampled
-    # from a poison distribution with the following parameters:
-    # request_rate(t) = (min + max) / 2 + (max - min) / 2 * sin(2 * pi / period * t - pi / 2)
-    # the phase shift is pi / 2 to make the request rate start from the minimum at t = 0
+    # For the process interval [t, t + process_interval), the number of requests is sampled
+    # from a Poisson distribution with:
+    # request_rate(t) = (min + max) / 2 + (max - min) / 2 * sin(2 * pi / period * t - pi / 2)
+    # The phase shift of pi/2 makes the request rate start from the minimum at t = 0.
```
31-36: Optional: guard isl_osl_ratio into [0, 1].

If users set min/max outside [0, 1], the ratio becomes an invalid probability. A light clamp keeps sampling sane. For example:

```diff
     isl_osl_ratio = (args.isl_osl_ratio_min + args.isl_osl_ratio_max) / 2 + (
         args.isl_osl_ratio_max - args.isl_osl_ratio_min
     ) / 2 * np.sin(2 * np.pi / args.isl_osl_ratio_period * t - np.pi / 2)
+    isl_osl_ratio = float(np.clip(isl_osl_ratio, 0.0, 1.0))
```

Also applies to: 126-131
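A minimal standalone sketch of the sinusoidal rate model plus Poisson sampling described above. The parameter names mirror the CLI flags but are assumptions; the stdlib `random` module has no Poisson sampler, so a simple inverse-transform draw stands in for `numpy.random.poisson`:

```python
import math
import random


def request_rate(t, rate_min, rate_max, period):
    # Phase shift of pi/2 so the rate starts at its minimum at t = 0.
    return (rate_min + rate_max) / 2 + (rate_max - rate_min) / 2 * math.sin(
        2 * math.pi / period * t - math.pi / 2
    )


def sample_interval_requests(t, rate_min, rate_max, period, process_interval, rng):
    # Requests in [t, t + process_interval) ~ Poisson(rate(t) * interval).
    lam = request_rate(t, rate_min, rate_max, period) * process_interval
    # Inverse-transform Poisson draw (stand-in for numpy.random.poisson).
    u, k, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf and k < 100_000:  # cap guards against float stalls
        k += 1
        p *= lam / k
        cdf += p
    return k
```

With the defaults from the README (min 5, max 10, period 10), the rate is 5 req/s at t = 0 and peaks at 10 req/s a half-period later, matching the documented phase shift.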
components/planner/src/dynamo/planner/utils/trace_data_extractor.py (1)
26-33: Stream processing to avoid loading the entire file; improve resilience to bad lines.

For large trace files, building an in-memory `records` list is unnecessary and can be memory-heavy. You can stream-accumulate per-interval stats and tolerate occasional malformed lines.

Apply this refactor:

```diff
-    # Read and parse JSONL data from file
-    records = []
-    with open(dataset, "r", encoding="utf-8") as f:
-        for line in f:
-            if line.strip():
-                records.append(json.loads(line))
-
-    # Group records by adjustment interval
-    interval_groups = defaultdict(list)
-
-    for record in records:
-        ts = record["timestamp"]
-        # Convert to seconds depending on input units and find the interval
-        timestamp_sec = (ts / 1000.0) if timestamps_in_ms else float(ts)
-        interval_start = int(timestamp_sec // adjustment_interval) * adjustment_interval
-        interval_groups[interval_start].append(record)
+    # Accumulate per-interval counters on the fly
+    interval_groups: dict[int, dict[str, float]] = defaultdict(
+        lambda: {"cnt": 0, "isl": 0.0, "osl": 0.0}
+    )
+
+    with open(dataset, "r", encoding="utf-8") as f:
+        for line in f:
+            if not line.strip():
+                continue
+            try:
+                record = json.loads(line)
+                ts = record["timestamp"]
+                timestamp_sec = (ts / 1000.0) if timestamps_in_ms else float(ts)
+                interval_start = int(timestamp_sec // adjustment_interval) * adjustment_interval
+
+                isl = float(record["input_length"])
+                osl = float(record["output_length"])
+
+                agg = interval_groups[interval_start]
+                agg["cnt"] += 1
+                agg["isl"] += isl
+                agg["osl"] += osl
+            except (json.JSONDecodeError, KeyError, TypeError, ValueError):
+                # Skip malformed lines or missing fields
+                continue
@@
-    # Compute metrics for each interval
-    metrics = []
-
-    for interval_start in sorted(interval_groups.keys()):
-        records_in_interval = interval_groups[interval_start]
-
-        # Calculate metrics
-        request_count = len(records_in_interval)
-
-        # Calculate average ISL and OSL
-        total_isl = sum(record["input_length"] for record in records_in_interval)
-        total_osl = sum(record["output_length"] for record in records_in_interval)
-
-        avg_isl = total_isl / request_count if request_count > 0 else 0
-        avg_osl = total_osl / request_count if request_count > 0 else 0
-
-        metrics.append(
-            {
-                "interval_start": interval_start,
-                "request_count": request_count,
-                "avg_isl": avg_isl,
-                "avg_osl": avg_osl,
-            }
-        )
+    # Compute metrics for each interval
+    metrics: List[Dict[str, Any]] = []
+    for interval_start in sorted(interval_groups.keys()):
+        agg = interval_groups[interval_start]
+        cnt = int(agg["cnt"])
+        avg_isl = (agg["isl"] / cnt) if cnt else 0.0
+        avg_osl = (agg["osl"] / cnt) if cnt else 0.0
+        metrics.append(
+            {
+                "interval_start": int(interval_start),
+                "request_count": cnt,
+                "avg_isl": avg_isl,
+                "avg_osl": avg_osl,
+            }
+        )
```

Also applies to: 33-45, 46-68
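The streaming approach above can be exercised in isolation with a small helper. The field names (`timestamp`, `input_length`, `output_length`) follow the mooncake-style trace described in this review; the sketch takes an iterable of lines so it works on a file handle or a list alike:

```python
import json
from collections import defaultdict


def bucket_mooncake_metrics(lines, adjustment_interval, timestamps_in_ms=True):
    """Stream JSONL records and aggregate request count / avg ISL / avg OSL
    per adjustment interval, skipping malformed lines."""
    groups = defaultdict(lambda: {"cnt": 0, "isl": 0.0, "osl": 0.0})
    for line in lines:
        if not line.strip():
            continue
        try:
            record = json.loads(line)
            ts = record["timestamp"]
            t_sec = ts / 1000.0 if timestamps_in_ms else float(ts)
            start = int(t_sec // adjustment_interval) * adjustment_interval
            agg = groups[start]
            agg["cnt"] += 1
            agg["isl"] += float(record["input_length"])
            agg["osl"] += float(record["output_length"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # skip malformed lines or missing fields
    return [
        {
            "interval_start": start,
            "request_count": g["cnt"],
            "avg_isl": g["isl"] / g["cnt"],
            "avg_osl": g["osl"] / g["cnt"],
        }
        for start, g in sorted(groups.items())
    ]
```

Because aggregation happens line by line, memory stays proportional to the number of intervals rather than the number of requests.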
benchmarks/sin_load_generator/README.md (5)

39-41: Add language to fenced code block (markdownlint MD040). Explicitly annotate fenced code blocks to satisfy markdownlint and improve readability.

Apply:

````diff
-```
+```text
 request_rate(t) = (min + max) / 2 + (max - min) / 2 * sin(2 * π / period * t - π / 2)
 ```
````

59-63: Add language to fenced code block (markdownlint MD040). Same fix for the ISL/OSL ratio formula block.

````diff
-```
+```text
 isl-osl-ratio(t) = (min + max) / 2 + (max - min) / 2 * sin(2 * π / period * t - π / 2)
 ```
````

8-8: Consistent capitalization for JSONL. Use uppercase JSONL consistently.

```diff
-`sin_synth.py` is a simple script to generate synthetic load with sinusoidal request rate and isl/osl ratio. The output is in [mooncake-style](https://github.com/kvcache-ai/Mooncake) jsonl format, which can be directly used in [GenAI-Perf](https://github.com/triton-inference-server/perf_analyzer/tree/main/genai-perf/genai_perf).
+`sin_synth.py` is a simple script to generate synthetic load with sinusoidal request rate and isl/osl ratio. The output is in [mooncake-style](https://github.com/kvcache-ai/Mooncake) JSONL format, which can be directly used in [GenAI-Perf](https://github.com/triton-inference-server/perf_analyzer/tree/main/genai-perf/genai_perf).
```
56-58: Fix capitalization and phrasing for the ISL/OSL ratio explanation.

Minor clarity and capitalization fixes:

```diff
-The script will generate load with requests sampled from two preset ISL/OSL combinations.
-The ISL/OSL ratio defines how much of requests follow the first preset ISL/OSL pattern. ISl/OSL 0 means all requests follow the first preset ISL/OSL pattern, while ISL/OSL 1 means all requests follow the second preset ISL/OSL pattern.
+The script will generate load with requests sampled from two preset ISL/OSL combinations.
+The ISL/OSL ratio defines the fraction of requests that follow the first preset ISL/OSL pattern. ISL/OSL ratio 0 means all requests follow the first preset pattern, while ISL/OSL ratio 1 means all requests follow the second preset pattern.
```
123-123: Fix spacing/typo in the closing example sentence.

Adds a missing space for readability:

```diff
-This generates a 60-second dataset with request rate fixed at 5 requests/second, with ISL/OSL ratio varying between 0.2 and 0.8 between I3000O150 and I500O2000over a 20-second period.
+This generates a 60-second dataset with request rate fixed at 5 requests/second, with ISL/OSL ratio varying between 0.2 and 0.8 between I3000O150 and I500O2000 over a 20-second period.
```

benchmarks/prefix_data_generator/tests/test_synthesizer.py (1)
22-22: Import path update aligns with the namespace refactor.

The new import from `prefix_data_generator.synthesizer` is consistent with the package rename. No issues. Optional: seed randomness to keep tests deterministic across runs:

```diff
 import random
 @@
+random.seed(0)
```

components/planner/test/planner_sla_dryrun.py (1)
21-22: Optional: configure basic logging or remove the unused logger.

`logger` is declared but never used here. Either configure and use it (e.g., `logging.basicConfig(level=logging.INFO)`) or remove it for clarity.

benchmarks/pyproject.toml (1)
62-65: Ensure all subpackages are included in the wheel; consider using setuptools package discovery.

Specifying `packages = ["prefix_data_generator"]` may exclude subpackages unless you list them all. Prefer discovery to include `prefix_data_generator.*`. Also, including `**/*.py` as package data is unnecessary; Python files are already packaged.

Apply this diff:

```diff
-[tool.setuptools]
-packages = ["prefix_data_generator"]
-
-[tool.setuptools.package-data]
-prefix_data_generator = ["**/*.py"]
+[tool.setuptools]
+packages = { find = { include = ["prefix_data_generator*"] } }
+
+# Only include non-Python assets here if needed (examples):
+# [tool.setuptools.package-data]
+# prefix_data_generator = ["py.typed", "data/**/*"]
```

Additionally, consider renaming the distribution for clarity:

- Current: `[project].name = "data-generator"`
- Optional: `"prefix-data-generator"` to align with the new import namespace.

Would you like me to propagate this rename across docs and any packaging references?

components/planner/src/dynamo/planner/utils/planner_core.py (3)
components/planner/src/dynamo/planner/utils/planner_core.py (3)
433-446: Avoid shadowing the `time` module; rename the x-axis list.

`time = [0]` shadows the imported `time` module within this scope. Rename to avoid confusion.

Apply this diff:

```diff
-    time = [0]
+    t_axis = [0]
@@
-    # update time
-    time.append(time[-1] + self.args.adjustment_interval)
+    # update time
+    t_axis.append(t_axis[-1] + self.args.adjustment_interval)
@@
-    ax1.plot(time, rr, "b-", label="Actual Request Rate", linewidth=2)
-    ax1.plot(time, est_rr, "r--", label="Predicted Request Rate", linewidth=2)
+    ax1.plot(t_axis, rr, "b-", label="Actual Request Rate", linewidth=2)
+    ax1.plot(t_axis, est_rr, "r--", label="Predicted Request Rate", linewidth=2)
@@
-    ax2.plot(time, isl, "g-", label="Actual ISL", linewidth=2)
-    ax2.plot(time, est_isl, "g--", label="Predicted ISL", linewidth=2)
-    ax2.plot(time, osl, "m-", label="Actual OSL", linewidth=2)
-    ax2.plot(time, est_osl, "m--", label="Predicted OSL", linewidth=2)
+    ax2.plot(t_axis, isl, "g-", label="Actual ISL", linewidth=2)
+    ax2.plot(t_axis, est_isl, "g--", label="Predicted ISL", linewidth=2)
+    ax2.plot(t_axis, osl, "m-", label="Actual OSL", linewidth=2)
+    ax2.plot(t_axis, est_osl, "m--", label="Predicted OSL", linewidth=2)
@@
-    ax3.plot(time, p_thpt, "b-", label="Actual Prefill Throughput", linewidth=2)
+    ax3.plot(t_axis, p_thpt, "b-", label="Actual Prefill Throughput", linewidth=2)
@@
-        time, safe_p_thpt, "b--", label="Safe Prefill Throughput Limit", linewidth=2
+        t_axis, safe_p_thpt, "b--", label="Safe Prefill Throughput Limit", linewidth=2
@@
-        time, num_p, "c-", label="Prefill Workers", linewidth=2, marker="o"
+        t_axis, num_p, "c-", label="Prefill Workers", linewidth=2, marker="o"
@@
-    ax4.plot(time, d_thpt, "r-", label="Actual Decode Throughput", linewidth=2)
+    ax4.plot(t_axis, d_thpt, "r-", label="Actual Decode Throughput", linewidth=2)
@@
-        time, safe_d_thpt, "r--", label="Safe Decode Throughput Limit", linewidth=2
+        t_axis, safe_d_thpt, "r--", label="Safe Decode Throughput Limit", linewidth=2
@@
-        time, num_d, "orange", label="Decode Workers", linewidth=2, marker="o"
+        t_axis, num_d, "orange", label="Decode Workers", linewidth=2, marker="o"
```
511-515: Clarify y-axis units for the request plot.

You plot counts per adjustment interval; reflect that in the label for clarity.

Apply this diff:

```diff
-    ax1.set_ylabel("Request Rate")
+    ax1.set_ylabel("Requests per interval")
```
571-576: Create the output directory before saving the figure.

Saving will fail if the parent directory doesn’t exist.

Apply this diff:

```diff
-    plt.tight_layout()
-    plt.savefig(self.args.output_plot, dpi=300, bbox_inches="tight")
+    plt.tight_layout()
+    from pathlib import Path
+    out_path = Path(self.args.output_plot)
+    out_path.parent.mkdir(parents=True, exist_ok=True)
+    plt.savefig(out_path, dpi=300, bbox_inches="tight")
     plt.close()
-    logger.info(f"Dryrun plot saved to {self.args.output_plot}")
+    logger.info(f"Dryrun plot saved to {out_path}")
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`tests/planner/figures/dryrun_plot.png` is excluded by `!**/*.png`
📒 Files selected for processing (20)
- `benchmarks/README.md` (1 hunks)
- `benchmarks/prefix_data_generator/__init__.py` (1 hunks)
- `benchmarks/prefix_data_generator/cli.py` (1 hunks)
- `benchmarks/prefix_data_generator/example.py` (1 hunks)
- `benchmarks/prefix_data_generator/graph_utils.py` (1 hunks)
- `benchmarks/prefix_data_generator/prefix_analyzer.py` (1 hunks)
- `benchmarks/prefix_data_generator/synthesizer.py` (2 hunks)
- `benchmarks/prefix_data_generator/tests/test_hasher.py` (1 hunks)
- `benchmarks/prefix_data_generator/tests/test_sampler.py` (1 hunks)
- `benchmarks/prefix_data_generator/tests/test_synthesizer.py` (1 hunks)
- `benchmarks/pyproject.toml` (2 hunks)
- `benchmarks/sin_load_generator/README.md` (1 hunks)
- `benchmarks/sin_load_generator/sin_synth.py` (4 hunks)
- `components/planner/src/dynamo/planner/planner_sla.py` (2 hunks)
- `components/planner/src/dynamo/planner/utils/argparse.py` (1 hunks)
- `components/planner/src/dynamo/planner/utils/load_predictor.py` (1 hunks)
- `components/planner/src/dynamo/planner/utils/planner_core.py` (6 hunks)
- `components/planner/src/dynamo/planner/utils/trace_data_extractor.py` (1 hunks)
- `components/planner/test/planner_sla_dryrun.py` (1 hunks)
- `tests/planner/README.md` (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (14)

- benchmarks/prefix_data_generator/tests/test_sampler.py (1)
  - benchmarks/prefix_data_generator/sampler.py (1): `EmpiricalSampler` (50-69)
- benchmarks/prefix_data_generator/tests/test_synthesizer.py (1)
  - benchmarks/prefix_data_generator/synthesizer.py (1): `Synthesizer` (34-330)
- benchmarks/prefix_data_generator/tests/test_hasher.py (1)
  - benchmarks/prefix_data_generator/hasher.py (2): `hashes_to_texts` (96-166), `texts_to_hashes` (33-93)
- components/planner/src/dynamo/planner/utils/argparse.py (1)
  - components/planner/src/dynamo/planner/defaults.py (1): `SLAPlannerDefaults` (72-83)
- benchmarks/prefix_data_generator/graph_utils.py (1)
  - benchmarks/prefix_data_generator/sampler.py (1): `get_cdf` (26-28)
- components/planner/test/planner_sla_dryrun.py (2)
  - components/planner/src/dynamo/planner/utils/argparse.py (1): `create_sla_planner_parser` (21-122)
  - components/planner/src/dynamo/planner/utils/planner_core.py (2): `Planner` (55-575), `dryrun_run` (407-575)
- benchmarks/prefix_data_generator/synthesizer.py (3)
  - benchmarks/prefix_data_generator/graph_utils.py (5): `_mark_visited` (34-43), `_merge_chains` (46-111), `_precompute_transition_cdfs` (142-155), `_remove_leaves` (114-139), `_verify_tree` (22-31)
  - benchmarks/prefix_data_generator/sampler.py (2): `EmpiricalSampler` (50-69), `sample_from_cdf` (38-47)
  - benchmarks/prefix_data_generator/logging_utils.py (1): `calculate_and_print_statistics` (23-55)
- benchmarks/prefix_data_generator/prefix_analyzer.py (1)
  - benchmarks/prefix_data_generator/logging_utils.py (1): `calculate_and_print_statistics` (23-55)
- benchmarks/prefix_data_generator/example.py (2)
  - benchmarks/prefix_data_generator/hasher.py (1): `hashes_to_texts` (96-166)
  - benchmarks/prefix_data_generator/synthesizer.py (1): `Synthesizer` (34-330)
- benchmarks/prefix_data_generator/__init__.py (1)
  - benchmarks/prefix_data_generator/cli.py (1): `main` (20-48)
- benchmarks/prefix_data_generator/cli.py (3)
  - benchmarks/prefix_data_generator/__init__.py (1): `main` (19-20)
  - benchmarks/prefix_data_generator/prefix_analyzer.py (1): `main` (156-183)
  - benchmarks/prefix_data_generator/synthesizer.py (1): `main` (333-438)
- components/planner/src/dynamo/planner/utils/load_predictor.py (1)
  - tests/conftest.py (1): `logger` (124-135)
- components/planner/src/dynamo/planner/planner_sla.py (1)
  - components/planner/src/dynamo/planner/utils/argparse.py (1): `create_sla_planner_parser` (21-122)
- components/planner/src/dynamo/planner/utils/planner_core.py (4)
  - components/planner/src/dynamo/planner/utils/trace_data_extractor.py (1): `extract_metrics_from_mooncake` (9-68)
  - components/planner/src/dynamo/planner/defaults.py (1): `SLAPlannerDefaults` (72-83)
  - components/planner/src/dynamo/planner/utils/load_predictor.py (4): `predict_next` (57-59, 70-71, 87-111, 142-162)
  - components/planner/src/dynamo/planner/utils/perf_interpolation.py (4): `interpolate_thpt_per_gpu` (52-54, 141-145), `find_best_throughput_per_gpu` (147-168), `interpolate_ttft` (48-50)
🪛 LanguageTool
benchmarks/sin_load_generator/README.md
[grammar] ~19-~19: There might be a mistake here.
Context: ...ons - --block-size INT (default: 512) - Block size for hashing, since there is n...
(QB_NEW_EN)
[grammar] ~22-~22: There might be a mistake here.
Context: ... - --total-blocks INT (default: 10000) - ISL prompt blocks are randomly sampled f...
(QB_NEW_EN)
[grammar] ~25-~25: There might be a mistake here.
Context: ...tput-file STR` (default: auto-generated) - Output file name (in jsonl format) - I...
(QB_NEW_EN)
[grammar] ~26-~26: There might be a mistake here.
Context: ...) - Output file name (in jsonl format) - If not specified, the script will genera...
(QB_NEW_EN)
[grammar] ~29-~29: There might be a mistake here.
Context: ... - --time-duration INT (default: 100) - Total time duration of the dataset in se...
(QB_NEW_EN)
[grammar] ~32-~32: There might be a mistake here.
Context: ... - --process-interval INT (default: 1) - Sampling interval used to generate the d...
(QB_NEW_EN)
[grammar] ~45-~45: There might be a mistake here.
Context: ... --request-rate-min FLOAT (default: 5) - Minimum request rate in requests per sec...
(QB_NEW_EN)
[grammar] ~48-~48: There might be a mistake here.
Context: ...--request-rate-max FLOAT (default: 10) - Maximum request rate in requests per sec...
(QB_NEW_EN)
[grammar] ~51-~51: There might be a mistake here.
Context: ...request-rate-period FLOAT` (default: 10) - Period of the sinusoidal request rate in...
(QB_NEW_EN)
[grammar] ~66-~66: There might be a mistake here.
Context: ... t = 0. - --isl1 INT (default: 100) - Minimum input sequence length - `--osl1...
(QB_NEW_EN)
[grammar] ~69-~69: There might be a mistake here.
Context: ...e length - --osl1 INT (default: 2000) - Minimum output sequence length - `--isl...
(QB_NEW_EN)
[grammar] ~72-~72: There might be a mistake here.
Context: ...e length - --isl2 INT (default: 5000) - Maximum input sequence length - `--osl2...
(QB_NEW_EN)
[grammar] ~75-~75: There might be a mistake here.
Context: ...ce length - --osl2 INT (default: 100) - Maximum output sequence length - `--isl...
(QB_NEW_EN)
[grammar] ~78-~78: There might be a mistake here.
Context: ...-isl-osl-ratio-min FLOAT` (default: 0.2) - Minimum ratio of input sequence length t...
(QB_NEW_EN)
[grammar] ~81-~81: There might be a mistake here.
Context: ...-isl-osl-ratio-max FLOAT` (default: 0.8) - Maximum ratio of input sequence length t...
(QB_NEW_EN)
[grammar] ~84-~84: There might be a mistake here.
Context: ...sl-osl-ratio-period FLOAT` (default: 10) - Period of the sinusoidal input/output se...
(QB_NEW_EN)
[grammar] ~123-~123: There might be a mistake here.
Context: ...``` This generates a 60-second dataset with request rate fixed at 5 requests/second...
(QB_NEW_EN)
tests/planner/README.md
[grammar] ~69-~69: There might be a mistake here.
Context: ...quest/s/gpu for both prefill and decode. To test planner's performance for differ...
(QB_NEW_EN)
[grammar] ~70-~70: There might be a mistake here.
Context: ...t/s/gpu for both prefill and decode. To test planner's performance for different req...
(QB_NEW_EN)
[grammar] ~70-~70: There might be a mistake here.
Context: ...rate varying between 20 to 80 request/s. For TP1 H200 engine, planner should scal...
(QB_NEW_EN)
[grammar] ~71-~71: There might be a mistake here.
Context: ... 20 to 80 request/s. For TP1 H200 engine, planner should scale between 1P1D and 3...
(QB_NEW_EN)
[grammar] ~86-~86: There might be a mistake here.
Context: ...k to 20 request/s at t=600s, and repeat. The total duration is 30 minutes or 1800...
(QB_NEW_EN)
[grammar] ~91-~91: There might be a mistake here.
Context: ...00 seconds. ## Planner Dry Run Before testing SLA planner on real deployments, we pro...
(QB_NEW_EN)
[grammar] ~93-~93: There might be a mistake here.
Context: ...equests can be processed within the SLA. - The correction factor will be disabled b...
(QB_NEW_EN)
[grammar] ~107-~107: There might be a mistake here.
Context: ...o_output_plot> ``` For example, to dry run SLA planner for the previous FP8 8B on ...
(QB_NEW_EN)
[grammar] ~121-~121: Ensure spelling is correct
Context: ...-plot dryrun_plot.png ``` Below is the dryrun result: 
[grammar] ~129-~129: There might be a mistake here.
Context: ...plot shows the actual prefill throughput, number of prefill workers that planner ...
(QB_NEW_EN)
[grammar] ~129-~129: There might be a mistake here.
Context: ...l throughput, number of prefill workers that planner scales, and the safe throughput...
(QB_NEW_EN)
[grammar] ~131-~131: There might be a mistake here.
Context: ...plot, shows the actual decode throughput, number of decode workers that planner s...
(QB_NEW_EN)
[grammar] ~131-~131: There might be a mistake here.
Context: ...de throughput, number of decode workers that planner scales, and the safe throughput...
(QB_NEW_EN)
[grammar] ~131-~131: There might be a mistake here.
Context: ...ctual deployment can adhere the ITL SLA.
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
benchmarks/sin_load_generator/README.md
39-39: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
60-60: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (23)
benchmarks/README.md (1)
28-31: CLI command `datagen` is correctly configured.

Verified that in `benchmarks/pyproject.toml` under `[project.scripts]` the entry `datagen = "prefix_data_generator.cli:main"` remains intact. No updates to the README are needed.
benchmarks/prefix_data_generator/tests/test_hasher.py (1)

20-20: Import path update looks correct. The switch to `prefix_data_generator.hasher` aligns with the module rename.

benchmarks/prefix_data_generator/tests/test_sampler.py (1)

19-19: Import path update LGTM. Matches the new namespace; no API changes required.
benchmarks/sin_load_generator/sin_synth.py (2)

31-36: Phase shift for isl/osl ratio looks correct. Using `sin(... - pi/2)` starts at the minimum at t=0 as documented.

44-47: Phase shift for request_rate is consistent with docs. The same offset is applied; the initial request rate starts at the configured minimum.
benchmarks/prefix_data_generator/prefix_analyzer.py (1)

19-19: LGTM: import path updated to the new module namespace. The switch to `prefix_data_generator.logging_utils` matches the refactor and keeps usage unchanged.

benchmarks/prefix_data_generator/__init__.py (1)

16-16: LGTM: public CLI alias now targets the renamed package. The `cli_main` import points to `prefix_data_generator.cli.main`. Matches CLI changes elsewhere.

benchmarks/prefix_data_generator/graph_utils.py (1)
18-19: Import refactor verified.

- No lingering `data_generator` imports found.
- `get_cdf` is defined in `benchmarks/prefix_data_generator/sampler.py`.
- `CACHE_END`, `END_NODE`, and `SUPER_ROOT` are defined in `benchmarks/prefix_data_generator/protocols.py`.

Imports are correctly updated under `prefix_data_generator` with no functional changes.

benchmarks/prefix_data_generator/example.py (1)
20-21: Updated imports to prefix_data_generator — LGTM. The example now targets the renamed package; no behavior change.

benchmarks/prefix_data_generator/synthesizer.py (2)

23-31: Namespace refactor for internal imports — looks good. All internal imports now reference `prefix_data_generator.*` consistently; matches the package rename.

337-337: Adjusted logging_utils import path — OK. `calculate_and_print_statistics` is correctly sourced from `prefix_data_generator.logging_utils`.

components/planner/src/dynamo/planner/planner_sla.py (1)

22-22: Centralized SLA parser import — good consolidation. Shifting to `create_sla_planner_parser` reduces duplication and keeps CLI options consistent.

benchmarks/prefix_data_generator/cli.py (2)

39-39: Refactored analyze import to the new namespace — OK. Dynamic import with argv forwarding remains unchanged; works for the renamed module.

45-45: Refactored synthesize import to the new namespace — OK. Consistent with the package rename; no functional differences.

components/planner/src/dynamo/planner/utils/argparse.py (1)

21-122: Centralizing SLA planner argparse is solid. The option set and defaults look consistent with `SLAPlannerDefaults`. Clear separation from CLI scripts is good for reuse.
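The centralized-parser pattern praised above can be sketched minimally: one factory builds the shared options, and the dry-run script extends the same parser. The flags shown here are illustrative, not the actual SLA planner option set:

```python
import argparse


def create_planner_parser():
    # Illustrative subset of options; names here are assumptions, not the
    # real SLA planner flag set.
    parser = argparse.ArgumentParser(description="SLA planner")
    parser.add_argument("--adjustment-interval", type=int, default=60)
    parser.add_argument("--prometheus-port", type=int, default=0)
    return parser


def create_dryrun_parser():
    # Reuse the shared parser and extend it with dry-run-only options,
    # so production and dry-run CLIs can never drift apart.
    parser = create_planner_parser()
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--output-plot", default="dryrun_plot.png")
    return parser
```

Because both entry points call the same factory, adding a planner option in one place makes it available everywhere, which is exactly the duplication the refactor removes.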
benchmarks/pyproject.toml (1)

52-52: Script entrypoint update looks correct. `datagen` now points to `prefix_data_generator.cli:main`, matching the namespace rename.

tests/planner/README.md (2)
47-63: Interpolator example updates look coherent. The OSL and computed values are consistent with the narrative and updated parameters.

98-105: CLI shows the --dry-run flag, but the script didn’t accept it. The documentation is correct conceptually; the script needs to accept `--dry-run` to avoid argparse errors. See my suggested code change in components/planner/test/planner_sla_dryrun.py to add the flag and wire it through. Once merged, this section remains valid.

Do you want me to open a doc-task issue to track this coupling between CLI and docs?
components/planner/src/dynamo/planner/utils/planner_core.py (5)
22-22: Import for trace extractor is correct.

56-78: Dry-run-aware initialization looks good. Skipping runtime/connector/Prometheus initialization in dry-run is appropriate; the production path remains intact.

101-121: Prometheus metrics initialization only in non-dry-run is appropriate. Gauges and server lifecycle are guarded by the prom port; error handling is in place.

122-128: Correction-factor defaults in dry-run are sensible. Disabling correction in dry-run avoids using unavailable SLA metrics.

256-270: Double-check the prefill correction-factor application. You multiply the predicted prefill load per GPU by `min(1, p_correction_factor)`. If `p_correction_factor < 1` represents additional queuing slowing TTFT, this reduces the estimated load and thus the required replicas, which seems opposite to compensating for queue-induced TTFT inflation. If the intent is to scale replicas up when `p_correction_factor < 1`, consider dividing effective throughput by the factor or multiplying the load by `1 / p_correction_factor` (with caps).

Would you like me to propose a concrete adjustment once the intended semantics of `p_correction_factor` are confirmed?
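The concern can be made concrete with toy numbers (illustrative values, not the planner's actual ones): with a correction factor below 1, multiplying the load by `min(1, f)` shrinks the replica estimate, while multiplying by `1/f` grows it:

```python
import math


def replicas_needed(load, per_gpu_throughput):
    # Smallest replica count whose capacity covers the load.
    return math.ceil(load / per_gpu_throughput)


load, thpt, f = 100.0, 30.0, 0.5  # f < 1: observed TTFT worse than interpolated

# Current behavior: scaling the load down by the factor lowers the estimate.
down = replicas_needed(load * min(1.0, f), thpt)  # ceil(50 / 30)

# Compensating behavior: scaling the load up by 1/f raises the estimate.
up = replicas_needed(load / f, thpt)  # ceil(200 / 30)
```

If `f < 1` means the deployment is underperforming the interpolated TTFT, only the second variant provisions extra capacity; the first actively reduces it, which is the inversion the comment flags.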
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>