Commit 23e2d8f

ishandhanani authored and pvijayakrish committed
chore: bump sglang version (#1219)
1 parent 76ad0c7 commit 23e2d8f

4 files changed: +67 -7 lines changed

container/Dockerfile.sglang

Lines changed: 4 additions & 6 deletions
@@ -136,16 +136,14 @@ RUN if [ "$ARCH" = "arm64" ]; then \
     fi
 
 # Install sglang
-# TODO: NIXL transfer is currently broken as of https://github.com/sgl-project/sglang/commit/7513558074adc4c4015b68e2ae7cf719d3401d5d
-# Once this is fixed we will have to install from that commit until a new post is released
-ARG SGLANG_COMMIT="4d643f6c7a291c86de64a9e52eca526b2d99775d"
+# Once either 0.4.6post6 or 0.4.7 is released, we can switch back to using the published version
+# This commit references a fix for DP attention and NIXL https://github.com/sgl-project/sglang/pull/6473
+ARG SGLANG_COMMIT="e806f708c954020bda7d1cc98035a44fd6a4eb96"
 RUN --mount=type=cache,target=/root/.cache/uv \
     git clone https://github.com/sgl-project/sglang.git && \
     cd sglang && \
     git checkout ${SGLANG_COMMIT} && \
-    uv pip install -e "python[all]" && \
-    cd .. && \
-    rm -rf sglang
+    uv pip install -e "python[all]"
 
 # Common dependencies
 RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \

examples/sglang/README.md

Lines changed: 10 additions & 0 deletions
@@ -77,3 +77,13 @@ Because Dynamo has a discovery mechanism, we do not use a load balancer. Instead
 cd /workspace/examples/sglang
 dynamo serve graphs.disagg:Frontend -f ./configs/disagg.yaml
 ```
+
+##### Disaggregated with MoE and DP attention
+
+SGLang also supports DP attention for MoE models. We provide an example config for this in `configs/disagg-dp-attention.yaml` which is based on the [DeepSeek-R1-Small-2layers](https://huggingface.co/silence09/DeepSeek-R1-Small-2layers) model. You can use this configuration to test out disaggregated serving on a single node before scaling to the full DeepSeek-R1 model across multiple nodes.
+
+```bash
+# note this will require 4 GPUs
+cd /workspace/examples/sglang
+dynamo serve graphs.disagg:Frontend -f ./configs/disagg-dp-attention.yaml
+```
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+Frontend:
+  served_model_name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+  endpoint: dynamo.SGLangWorker.generate
+  port: 8000
+
+SGLangWorker:
+  model-path: silence09/DeepSeek-R1-Small-2layers
+  served-model-name: silence09/DeepSeek-R1-Small-2layers
+  tp: 2
+  dp-size: 2
+  enable-dp-attention: true
+  trust-remote-code: true
+  skip-tokenizer-init: true
+  disaggregation-mode: prefill
+  disaggregation-transfer-backend: nixl
+  port: 30000
+  ServiceArgs:
+    workers: 1
+    resources:
+      gpu: 2
+
+SGLangDecodeWorker:
+  model-path: silence09/DeepSeek-R1-Small-2layers
+  served-model-name: silence09/DeepSeek-R1-Small-2layers
+  tp: 2
+  dp-size: 2
+  enable-dp-attention: true
+  trust-remote-code: true
+  skip-tokenizer-init: true
+  disaggregation-mode: decode
+  disaggregation-transfer-backend: nixl
+  # SGLang requires a port delta between prefill and decode workers when using enable-dp-attention
+  port: 31000
+  ServiceArgs:
+    workers: 1
+    resources:
+      gpu: 2
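
The README note that this example needs 4 GPUs appears to follow from the two worker sections in the config above, each of which requests `workers: 1` and `gpu: 2`. The sketch below works through that arithmetic. It assumes `gpu` is a per-worker request; the dict is copied by hand from the config, and the helper name `total_gpus` is hypothetical rather than anything provided by Dynamo or SGLang.

```python
# Relevant fields copied by hand from the config above. The dict layout and
# the helper are illustrative assumptions, not part of Dynamo or SGLang.
CONFIG = {
    "Frontend": {"port": 8000},
    "SGLangWorker": {
        "ServiceArgs": {"workers": 1, "resources": {"gpu": 2}},
    },
    "SGLangDecodeWorker": {
        "ServiceArgs": {"workers": 1, "resources": {"gpu": 2}},
    },
}


def total_gpus(config: dict) -> int:
    """Sum workers * GPUs-per-worker over every service with ServiceArgs."""
    total = 0
    for service in config.values():
        args = service.get("ServiceArgs", {})
        workers = int(args.get("workers", 1))
        gpus_per_worker = int(args.get("resources", {}).get("gpu", 0))
        total += workers * gpus_per_worker
    return total


# 1 prefill worker * 2 GPUs + 1 decode worker * 2 GPUs
print(total_gpus(CONFIG))  # 4
```

Services without a `ServiceArgs` block, such as `Frontend` here, contribute zero GPUs under this assumption.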

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ vllm = [
 ]
 
 sglang = [
-    "sglang[all]@git+https://github.com/sgl-project/sglang@4d643f6c7a291c86de64a9e52eca526b2d99775d#subdirectory=python"
+    "sglang[all]@git+https://github.com/sgl-project/sglang@e806f708c954020bda7d1cc98035a44fd6a4eb96#subdirectory=python"
 ]
 
 [project.scripts]
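
This commit updates the same SGLang commit hash in two places: the `SGLANG_COMMIT` build argument in `container/Dockerfile.sglang` and the git dependency in `pyproject.toml`. A small check along the following lines could help keep the two pins in sync. It is only a sketch: the function names are hypothetical, the regular expressions assume the exact line shapes shown in this diff, and the sample strings are copied from the hunks above instead of being read from disk.

```python
import re
from typing import Optional

# Lines copied from the two hunks in this commit.
DOCKERFILE_LINE = 'ARG SGLANG_COMMIT="e806f708c954020bda7d1cc98035a44fd6a4eb96"'
PYPROJECT_LINE = (
    '"sglang[all]@git+https://github.com/sgl-project/sglang'
    '@e806f708c954020bda7d1cc98035a44fd6a4eb96#subdirectory=python"'
)

# A full-length git SHA-1: 40 lowercase hex characters.
SHA = r"[0-9a-f]{40}"


def dockerfile_pin(text: str) -> Optional[str]:
    """Return the commit hash from an `ARG SGLANG_COMMIT="..."` line, if any."""
    match = re.search(rf'ARG SGLANG_COMMIT="({SHA})"', text)
    return match.group(1) if match else None


def pyproject_pin(text: str) -> Optional[str]:
    """Return the commit hash from a `...sglang@<sha>#subdirectory` reference."""
    match = re.search(rf"sgl-project/sglang@({SHA})#", text)
    return match.group(1) if match else None


def pins_match(dockerfile_text: str, pyproject_text: str) -> bool:
    """True only when both pins are present and identical."""
    docker_sha = dockerfile_pin(dockerfile_text)
    project_sha = pyproject_pin(pyproject_text)
    return docker_sha is not None and docker_sha == project_sha


print(pins_match(DOCKERFILE_LINE, PYPROJECT_LINE))  # True
```

In practice the two strings would be the contents of the respective files; a missing pin in either file is treated as a mismatch.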
