
Commit 8bfc61a

chore(sglang): readme and instruction fixes (#1761)
1 parent 6901c7c

7 files changed (+14, -26 lines)

examples/sglang/README.md

Lines changed: 4 additions & 5 deletions
````diff
@@ -95,7 +95,7 @@ that get spawned depend upon the chosen graph.
 #### Aggregated
 
 ```bash
-cd /workspace/examples/sglang
+cd $DYNAMO_ROOT/examples/sglang
 ./launch/agg.sh
 ```
 
@@ -108,8 +108,7 @@ cd /workspace/examples/sglang
 > After these are in, the TODOs in `worker.py` will be resolved and the placeholder logic removed.
 
 ```bash
-cd /workspace/examples/sglang
-export PYTHONPATH=$PYTHONPATH:/workspace/examples/sglang/utils
+cd $DYNAMO_ROOT/examples/sglang
 ./launch/agg_router.sh
 ```
 
@@ -133,7 +132,7 @@ Because Dynamo has a discovery mechanism, we do not use a load balancer. Instead
 > Disaggregated serving in SGLang currently requires each worker to have the same tensor parallel size [unless you are using an MLA based model](https://github.com/sgl-project/sglang/pull/5922)
 
 ```bash
-cd /workspace/examples/sglang
+cd $DYNAMO_ROOT/examples/sglang
 ./launch/disagg.sh
 ```
 
@@ -143,7 +142,7 @@ SGLang also supports DP attention for MoE models. We provide an example config f
 
 ```bash
 # note this will require 4 GPUs
-cd /workspace/examples/sglang
+cd $DYNAMO_ROOT/examples/sglang
 ./launch/disagg_dp_attn.sh
 ```
 
````
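
The README hunks above swap the hard-coded `/workspace` prefix for `$DYNAMO_ROOT`, so the launch scripts work from any checkout location. A minimal sketch of a session under that change, assuming `DYNAMO_ROOT` points at a local clone of the repo (the example path is illustrative, not from the diff):

```bash
# Illustrative: point DYNAMO_ROOT at your local checkout of the repo.
export DYNAMO_ROOT="$HOME/dynamo"

# The updated README commands then work as written.
cd "$DYNAMO_ROOT/examples/sglang"
./launch/agg.sh
```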

examples/sglang/dsr1-wideep.md

Lines changed: 2 additions & 10 deletions
````diff
@@ -73,7 +73,7 @@ In each container, you should be in the `/sgl-workspace/dynamo/examples/sglang`
 # run ingress
 dynamo run in=http out=dyn &
 # run prefill worker
-python3 components/worker_inc.py \
+python3 components/worker.py \
 --model-path /model/ \
 --served-model-name deepseek-ai/DeepSeek-R1 \
 --skip-tokenizer-init \
@@ -108,7 +108,7 @@ On the other prefill node (since this example has 4 total prefill nodes), run th
 7. Run the decode worker on the head decode node
 
 ```bash
-python3 components/decode_worker_inc.py \
+python3 components/decode_worker.py \
 --model-path /model/ \
 --served-model-name deepseek-ai/DeepSeek-R1 \
 --skip-tokenizer-init \
@@ -138,14 +138,6 @@ python3 components/decode_worker_inc.py \
 
 On the other decode nodes (this example has 9 total decode nodes), run the same command but change `--node-rank` to 1, 2, 3, 4, 5, 6, 7, and 8
 
-8. Run the warmup script to warm up the model
-
-DeepGEMM kernels can sometimes take a while to warm up. Here we provide a small helper script that should help. You can run this as many times as you want before starting inference/benchmarking. You can exec into the head node and run this script standalone - it does not need a container.
-
-```bash
-./warmup.sh HEAD_PREFILL_NODE_IP
-```
-
 ## Benchmarking
 
 In the official [blog post repro instructions](https://github.com/sgl-project/sglang/issues/6017), SGL uses batch inference to benchmark their prefill and decode workers. They do this by pretokenizing the ShareGPT dataset and then creating a batch of 8192 requests with ISL 4096 and OSL 5 (for prefill stress test) and a batch of 40000 with ISL 2000 and OSL 100 (for decode stress test). If you want to repro these benchmarks, you will need to add the following flags to the prefill and decode commands:
````
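
The last hunk keeps the instruction that each non-head decode node run the same `decode_worker.py` command with only `--node-rank` changed (1 through 8). A minimal per-node sketch of that step, assuming every other flag stays exactly as in the head-node command (the `NODE_RANK` variable is illustrative, not part of the example):

```bash
# Illustrative: set NODE_RANK to 1..8 on each non-head decode node;
# every other flag matches the head decode node's command.
NODE_RANK=${NODE_RANK:?set to this node's rank, 1-8}

python3 components/decode_worker.py \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --skip-tokenizer-init \
  --node-rank "$NODE_RANK"
  # ...remaining flags unchanged from the head decode node command
```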

examples/sglang/launch/agg.sh

Lines changed: 1 addition & 4 deletions
```diff
@@ -15,12 +15,9 @@ trap cleanup EXIT INT TERM
 python3 utils/clear_namespace.py --namespace dynamo
 
 # run ingress
-dynamo run in=http out=dyn &
+dynamo run in=http out=dyn --http-port=8000 &
 DYNAMO_PID=$!
 
-# run ingress
-dynamo run in=http out=dyn &
-
 # run worker
 python3 components/worker.py \
 --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
```
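
With the duplicated ingress line removed and the port pinned by `--http-port=8000`, a quick smoke test of the frontend is straightforward. This is a sketch only: it assumes the HTTP ingress exposes an OpenAI-compatible `/v1/chat/completions` route, which is not shown in this diff:

```bash
# Illustrative smoke test against the port pinned by --http-port=8000.
# Assumes an OpenAI-compatible /v1/chat/completions route on the ingress.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```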

examples/sglang/launch/agg_router.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@ trap cleanup EXIT INT TERM
 python3 utils/clear_namespace.py --namespace dynamo
 
 # run ingress
-dynamo run in=http out=dyn --router-mode kv &
+dynamo run in=http out=dyn --router-mode kv --http-port=8000 &
 DYNAMO_PID=$!
 
 # run worker
```
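
The router variant gets the same port change but keeps `--router-mode kv` on the ingress. One hedged way to exercise that mode once `agg_router.sh` is up is to repeat an identical request so the shared prefix gives the KV-aware router something to route on; the route, payload, and model name are the same assumptions as the smoke test above:

```bash
# Illustrative: repeat an identical request; with --router-mode kv the
# router can favor workers that already hold the matching KV cache.
for _ in 1 2 3; do
  curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
          "messages": [{"role": "user", "content": "Hello!"}],
          "max_tokens": 16
        }'
  echo
done
```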

examples/sglang/launch/disagg.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@ trap cleanup EXIT INT TERM
 python3 utils/clear_namespace.py --namespace dynamo
 
 # run ingress
-dynamo run in=http out=dyn &
+dynamo run in=http out=dyn --http-port=8000 &
 DYNAMO_PID=$!
 
 # run prefill worker
```

examples/sglang/launch/disagg_dp_attn.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@ trap cleanup EXIT INT TERM
 python3 utils/clear_namespace.py --namespace dynamo
 
 # run ingress
-dynamo run in=http out=dyn &
+dynamo run in=http out=dyn --http-port=8000 &
 DYNAMO_PID=$!
 
 # run prefill worker
```

examples/sglang/multinode-examples.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -17,7 +17,7 @@ Node 1: Run HTTP ingress, processor, and 8 shards of the prefill worker
 # run ingress
 dynamo run in=http out=dyn &
 # run prefill worker
-python3 components/worker_inc.py \
+python3 components/worker.py \
 --model-path /model/ \
 --served-model-name deepseek-ai/DeepSeek-R1 \
 --tp 16 \
@@ -40,7 +40,7 @@ export NATS_SERVER="nats://<node-1-ip>"
 export ETCD_ENDPOINTS="<node-1-ip>:2379"
 
 # worker
-python3 components/worker_inc.py \
+python3 components/worker.py \
 --model-path /model/ \
 --served-model-name deepseek-ai/DeepSeek-R1 \
 --tp 16 \
@@ -63,7 +63,7 @@ export NATS_SERVER="nats://<node-1-ip>"
 export ETCD_ENDPOINTS="<node-1-ip>:2379"
 
 # worker
-python3 components/decode_worker_inc.py \
+python3 components/decode_worker.py \
 --model-path /model/ \
 --served-model-name deepseek-ai/DeepSeek-R1 \
 --tp 16 \
@@ -86,7 +86,7 @@ export NATS_SERVER="nats://<node-1-ip>"
 export ETCD_ENDPOINTS="<node-1-ip>:2379"
 
 # worker
-python3 components/decode_worker_inc.py \
+python3 components/decode_worker.py \
 --model-path /model/ \
 --served-model-name deepseek-ai/DeepSeek-R1 \
 --tp 16 \
```
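
Across all four hunks the only change is the worker module rename, while the surrounding context shows each non-head node exporting `NATS_SERVER` and `ETCD_ENDPOINTS` toward node 1 before launching. A condensed sketch of that per-node setup with the renamed scripts; flags beyond those visible in the hunks are elided, and `<node-1-ip>` is the doc's placeholder, not a real address:

```bash
# Illustrative: on nodes other than node 1, point discovery at node 1
# before starting the workers.
export NATS_SERVER="nats://<node-1-ip>"
export ETCD_ENDPOINTS="<node-1-ip>:2379"

# Prefill shards use components/worker.py; decode shards use
# components/decode_worker.py. Flags shown here match the hunks above.
python3 components/worker.py \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16
  # ...plus the remaining flags from the multinode example
```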
