> The SLA planner requires a frontend that exposes metrics at the `/metrics` HTTP endpoint, including the number of requests, ISL, OSL, TTFT, and ITL, in the correct format. The Dynamo frontend provides these metrics automatically.

The dataset starts at 12 requests/s, increases to 36 requests/s at t=300s, decreases back to 12 requests/s at t=600s, and repeats.
The total duration is 30 minutes or 1800 seconds.
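
The load pattern described above can be sketched as a step function. This is a minimal illustration, assuming the rate switches instantaneously at each 300 s boundary and that the cycle repeats until the 1800 s mark; the function and constant names are hypothetical and not part of the planner code.

```python
LOW_RATE = 12         # requests/s during the low phase
HIGH_RATE = 36        # requests/s during the high phase
PHASE_SECONDS = 300   # the rate changes every 300 s
TOTAL_SECONDS = 1800  # total dataset duration (30 minutes)


def request_rate(t: float) -> int:
    """Return the target request rate (requests/s) at time t in seconds."""
    if not 0 <= t < TOTAL_SECONDS:
        raise ValueError(f"t must be in [0, {TOTAL_SECONDS}), got {t}")
    # Even-numbered phases (0, 2, 4) are low; odd-numbered phases are high.
    phase_index = int(t // PHASE_SECONDS)
    return LOW_RATE if phase_index % 2 == 0 else HIGH_RATE


def total_requests() -> int:
    """Sum the per-second rate over the whole dataset."""
    return sum(request_rate(t) for t in range(TOTAL_SECONDS))
```

Under these assumptions the dataset alternates three low and three high phases, which averages out to 24 requests/s over the full run.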
## Planner Dry Run
Before testing SLA planner on real deployments, we provide a dry run feature to test the autoscaling behavior on a given dataset. Specifically, in dry run mode,
@@ -129,3 +130,64 @@ The second plot shows the actual ISL/OSL and the predicted ISL/OSL. The first tw
The third plot shows the actual prefill throughput, the number of prefill workers the planner scales to, and the safe throughput limit for that number of prefill workers. If the actual throughput is below the safe throughput limit, the deployment has the capacity to meet the TTFT SLA. Note that in a real deployment, other factors such as queueing, load balancing, KV cache transfer latency, and ISL variance mean the deployment is not guaranteed to meet the TTFT SLA.

The fourth plot, similar to the third, shows the actual decode throughput, the number of decode workers the planner scales to, and the safe throughput limit for that number of decode workers. If the actual throughput is below the safe throughput limit, the deployment has the capacity to meet the ITL SLA. Note that in a real deployment, other factors such as load balancing and OSL variance mean the deployment is not guaranteed to meet the ITL SLA.
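
The capacity check that the third and fourth plots visualize can be sketched as follows. This is a minimal sketch, assuming the safe throughput limit grows linearly with the number of workers; `per_worker_safe_throughput` is a hypothetical per-worker value, and none of these names come from the planner's code.

```python
import math


def safe_throughput_limit(num_workers: int, per_worker_safe_throughput: float) -> float:
    """Assumed linear model: the limit scales with the worker count."""
    return num_workers * per_worker_safe_throughput


def has_capacity(actual_throughput: float, num_workers: int,
                 per_worker_safe_throughput: float) -> bool:
    """True when the actual throughput does not exceed the safe limit."""
    limit = safe_throughput_limit(num_workers, per_worker_safe_throughput)
    return actual_throughput <= limit


def min_workers_for(actual_throughput: float, per_worker_safe_throughput: float) -> int:
    """Smallest worker count whose safe limit covers the throughput (at least 1)."""
    if per_worker_safe_throughput <= 0:
        raise ValueError("per_worker_safe_throughput must be positive")
    return max(1, math.ceil(actual_throughput / per_worker_safe_throughput))
```

Passing this check only indicates available capacity. As noted above, queueing, load balancing, and sequence-length variance can still cause SLA violations in a real deployment.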
## Scaling Tests
This directory contains tests that validate the SLA planner's scaling behavior, covering both the replica calculation logic and end-to-end scaling. The scaling test uses a graduated load approach rather than dataset files because it proved more reliable for metric generation and scaling triggers.
### Test Types
1. **Unit Tests** (`test_replica_calculation.py`) - Test the mathematical formulas for calculating prefill and decode replicas in isolation
2. **End-to-End Tests** (`run_scaling_test.sh`) - Test the complete workflow, including Kubernetes deployment, load generation, and pod scaling validation
### Quick Start
#### Run Unit Tests Only
Test the replica calculation logic without requiring Kubernetes:
```bash
python -m pytest test_replica_calculation.py -v
```
#### Run Full End-to-End Test
Test complete scaling behavior including Kubernetes deployment and load generation:
```bash
./run_scaling_test.sh
```
With custom namespace:
```bash
./run_scaling_test.sh --namespace production
```
To save results to `tests/planner/e2e_scaling_results` instead of `/tmp`:
```bash
./run_scaling_test.sh --save-results
```
**E2E Test Deployment Management:**
- If no deployment exists: creates, tests, and cleans up deployment
- If deployment exists: uses existing deployment and preserves it
- Perfect for development workflows where you want to keep deployments running between tests
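
The deployment handling described in the list above can be sketched like this. It is an illustration only, assuming four hypothetical callables (`deployment_exists`, `create_deployment`, `run_tests`, `delete_deployment`); the actual logic lives in `run_scaling_test.sh` and may differ.

```python
from typing import Callable


def run_with_deployment(
    deployment_exists: Callable[[], bool],
    create_deployment: Callable[[], None],
    run_tests: Callable[[], bool],
    delete_deployment: Callable[[], None],
) -> bool:
    """Run tests, cleaning up only a deployment that this run created."""
    created_here = not deployment_exists()
    if created_here:
        create_deployment()
    try:
        return run_tests()
    finally:
        # A pre-existing deployment is preserved; one created here is removed.
        if created_here:
            delete_deployment()
```

Tracking whether the run created the deployment keeps cleanup from touching a deployment that a developer intentionally left running.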

**Test Scenario**

The main test scenario validates prefill scaling for H200 with 1P1D → 2P1D configuration:
Prometheus: # NOTE: this is set on Prometheus to ensure a service is created for the Prometheus component. This is a workaround and should be managed differently.