Commit 1e55e88

fix: correct planner test example after tokenizer fix
1 parent 05913af commit 1e55e88

7 files changed: 21 additions and 20 deletions


components/planner/src/dynamo/planner/planner_sla.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 from pydantic import BaseModel

 from dynamo.planner.defaults import SLAPlannerDefaults
-from dynamo.planner.utils.argparse import create_sla_planner_parser
+from dynamo.planner.utils.planner_argparse import create_sla_planner_parser
 from dynamo.planner.utils.planner_core import start_sla_planner
 from dynamo.runtime import DistributedRuntime, dynamo_worker

components/planner/test/planner_sla_dryrun.py

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@

 import logging

-from dynamo.planner.utils.argparse import create_sla_planner_parser
+from dynamo.planner.utils.planner_argparse import create_sla_planner_parser
 from dynamo.planner.utils.planner_core import Planner

 logger = logging.getLogger(__name__)

tests/planner/README.md

Lines changed: 19 additions & 18 deletions
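Review note: the figures this README change updates can be cross-checked with simple arithmetic. The sketch below is not the planner's or the load generator's code. It assumes that the saturation rate is per-GPU throughput divided by tokens per request (ISL for prefill, OSL for decode), that the replica count is the ceiling of load over per-GPU capacity, and that the load follows a raised-cosine curve matching the README's description (minimum at t=0, maximum at half a period). The helper names `saturation_rate`, `replicas_needed`, and `load_at` are hypothetical.

```python
import math

ISL, OSL = 3000, 300  # tokens per request, from the README example

# Per-GPU throughput figures copied from the updated interpolator output.
PREFILL_TOKENS_PER_S = 49481.09
DECODE_TOKENS_PER_S = 4555.68


def saturation_rate(tokens_per_s: float, tokens_per_request: int) -> float:
    """Request rate (requests/s) at which one GPU is fully utilized."""
    return tokens_per_s / tokens_per_request


def replicas_needed(request_rate: float, per_gpu_rate: float) -> int:
    """Smallest number of single-GPU workers that can absorb request_rate."""
    return max(1, math.ceil(request_rate / per_gpu_rate))


def load_at(t: float, rate_min: float = 12, rate_max: float = 36,
            period: float = 600) -> float:
    """Assumed load shape: rate_min at t=0, rate_max at t=period/2, repeating."""
    amplitude = (rate_max - rate_min) / 2
    return rate_min + amplitude * (1 - math.cos(2 * math.pi * t / period))


prefill_rate = saturation_rate(PREFILL_TOKENS_PER_S, ISL)
decode_rate = saturation_rate(DECODE_TOKENS_PER_S, OSL)
print(f"prefill saturates at {prefill_rate:.2f} requests/s")
print(f"decode saturates at {decode_rate:.2f} requests/s")

# Estimate the worker counts at the trough, peak, and next trough of the load.
for t in (0, 300, 600):
    rate = load_at(t)
    p = replicas_needed(rate, prefill_rate)
    d = replicas_needed(rate, decode_rate)
    print(f"t={t:>3}s rate={rate:.0f} requests/s -> {p}P{d}D")
```

Under these assumptions the rates come out at 16.49 and 15.19 requests/s, and the estimate is 1P1D at 12 requests/s and 3P3D at 36 requests/s. Both agree with the updated README text. The real planner may apply correction factors that this estimate ignores.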
@@ -48,42 +48,43 @@ python components/planner/src/dynamo/planner/utils/perf_interpolation.py \
 --ttft 0.1 \
 --itl 0.01

-> ISL=3000, OSL=300
-> TTFT=0.1s, ITL=0.01s
-> Using profile results from tests/planner/profiling_results/H200_TP1P_TP1D/
->
-> Interpolating prefill performance ...
-> Estimated TTFT=0.027s <= target TTFT=0.100s. Requests can queue 0.073s maximally while meeting TTFT SLA.
-> Estimated throughput: 110893.48 tokens/s/gpu. Request rate at 36.96 requests/s will saturate one GPU.
+# output:
+ISL=3000, OSL=300
+TTFT=0.1s, ITL=0.01s
+Using profile results from tests/planner/profiling_results/H200_TP1P_TP1D/
+
+Interpolating prefill performance ...
+Estimated TTFT=0.060s <= target TTFT=0.100s. Requests can queue 0.040s maximally while meeting TTFT SLA.
+Estimated throughput: 49481.09 tokens/s/gpu. Request rate at 16.49 requests/s will saturate one GPU.

 Interpolating decode performance ...
-> Average context length: isl + osl/2 = 3150.
-> Estimated ITL=0.0098s <= target ITL=0.0100s at 36.36% active kv usage.
-> Estimated throughput: 10009.88 token/s/gpu. Request rate at 33.37 requests/s will saturate one GPU.
+Average context length: isl + osl/2 = 3150.
+Estimated ITL=0.0097s <= target ITL=0.0100s at 16.16% active kv usage.
+Estimated throughput: 4555.68 token/s/gpu. Request rate at 15.19 requests/s will saturate one GPU.
 ```

 ## Generating Load Dataset

 We provide a tool to generate load dataset with varying request rate. More details can be found in [sin_load_generator](../../benchmarks/sin_load_generator/README.md).

-From previous interpolator testing, ISL 3000 and OSL 300 can handle ~30 request/s/gpu for both prefill and decode.
-To test planner's performance for different request rates, we can generate a load dataset with request rate varying between 20 to 80 request/s.
+From previous interpolator testing, ISL 3000 and OSL 300 can handle ~15 request/s/gpu for both prefill and decode.
+To test planner's performance for different request rates, we can generate a load dataset with request rate varying between 12 to 36 request/s.
 For TP1 H200 engine, planner should scale between 1P1D and 3P3D.

 ```bash
 python benchmarks/sin_load_generator/sin_synth.py \
 --time-duration 1800 \
---request-rate-min 20 \
---request-rate-max 80 \
+--request-rate-min 12 \
+--request-rate-max 36 \
 --request-rate-period 600 \
 --isl1 3000 \
 --osl1 300 \
 --isl2 3000 \
 --osl2 300 \
---output-file rr-20-80_i3000o300.jsonl
+--output-file rr-12-36_i3000o300.jsonl
 ```

-The dataset starts at 20 requests/s, increases to 80 requests/s at t=300s, decreases back to 20 requests/s at t=600s, and repeats.
+The dataset starts at 12 requests/s, increases to 36 requests/s at t=300s, decreases back to 12 requests/s at t=600s, and repeats.
 The total duration is 30 minutes or 1800 seconds.
 ## Planner Dry Run

@@ -103,15 +104,15 @@ python components/planner/test/planner_sla_dryrun.py \
 --output-plot <path_to_output_plot>
 ```

-For example, to dry run SLA planner for the previous FP8 8B on H200 using the generated `rr-20-80_i3000o300.jsonl` dataset,
+For example, to dry run SLA planner for the previous FP8 8B on H200 using the generated `rr-12-36_i3000o300.jsonl` dataset,

 ```bash
 python components/planner/test/planner_sla_dryrun.py \
 --ttft 0.1 \
 --itl 0.01 \
 --adjustment-interval 60 \
 --profile-results-dir tests/planner/profiling_results/H200_TP1P_TP1D/ \
---dataset rr-20-80_i3000o300.jsonl \
+--dataset rr-12-36_i3000o300.jsonl \
 --start-num-p 1 \
 --start-num-d 1 \
 --output-plot dryrun_plot.png
-36.7 KB
Binary file not shown.
Binary file not shown.
