Commit 6243bcb

tedzhouhk and hhzhang16 authored
feat: support MoE model in SLA Planner Sglang (#3185)
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
1 parent 8f338a6 commit 6243bcb

19 files changed, +685 -267 lines changed

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: profile-sla
+  namespace: ${NAMESPACE}
+spec:
+  template:
+    spec:
+      serviceAccountName: dynamo-sa
+      containers:
+        - name: profile-sla
+          image: ${DOCKER_IMAGE}
+          resources:
+            requests:
+              cpu: "32"
+              memory: "50Gi"
+          env:
+            - name: HUGGING_FACE_HUB_TOKEN
+              valueFrom:
+                secretKeyRef:
+                  name: hf-token-secret
+                  key: HF_TOKEN
+            - name: NATS_SERVER
+              value: nats://${NAMESPACE}-nats:4222
+            - name: ETCD_ENDPOINTS
+              value: ${NAMESPACE}-etcd:2379
+          workingDir: /sgl-workspace/dynamo
+          command: ["python", "-m", "benchmarks.profiler.profile_sla"]
+          args:
+            - --config
+            - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
+            - --output-dir
+            - /data/profiling_results
+            - --namespace
+            - ${NAMESPACE}
+            - --backend
+            - sglang
+            - --is-moe-model
+            - --min-num-gpus-per-engine
+            - "8"
+            - --max-num-gpus-per-engine
+            - "16"
+            - --isl
+            - "3000"
+            - --osl
+            - "150"
+            - --ttft
+            - "200"
+            - --itl
+            - "20"
+          volumeMounts:
+            - name: output-volume
+              mountPath: /data
+      restartPolicy: Never
+      volumes:
+        - name: output-volume
+          persistentVolumeClaim:
+            claimName: dynamo-pvc
+  backoffLimit: 0
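
The manifest above leaves `${NAMESPACE}` and `${DOCKER_IMAGE}` as placeholders, so they have to be filled in before the Job is submitted. This commit does not show how the repository does that. A minimal sketch, assuming the substitution is done in Python with the standard-library `string.Template` (the `render_manifest` helper and the sample values are hypothetical, not part of this change):

```python
from string import Template

# A short excerpt of the manifest; the full file is handled the same way.
MANIFEST_EXCERPT = """\
namespace: ${NAMESPACE}
image: ${DOCKER_IMAGE}
value: nats://${NAMESPACE}-nats:4222
"""


def render_manifest(text: str, values: dict[str, str]) -> str:
    """Fill ${VAR} placeholders; raises KeyError if a placeholder has no value."""
    return Template(text).substitute(values)


# Hypothetical values, chosen only for illustration.
rendered = render_manifest(
    MANIFEST_EXCERPT,
    {"NAMESPACE": "my-namespace", "DOCKER_IMAGE": "example.com/dynamo:latest"},
)
print(rendered)
```

Using `substitute` rather than `safe_substitute` makes a forgotten variable fail loudly instead of submitting a Job that still contains a literal `${...}`.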

benchmarks/profiler/profile_endpoint.py

Lines changed: 8 additions & 0 deletions
@@ -22,6 +22,7 @@
 parser = argparse.ArgumentParser(
     description="profile a given endpoint's performance for prefill or decode"
 )
+# TODO: use kebab case
 parser.add_argument(
     "--mode",
     type=str,
@@ -79,6 +80,12 @@
     default=8,
     help="interpolation granularity for the results",
 )
+parser.add_argument(
+    "--attention_dp_size",
+    type=int,
+    default=1,
+    help="attention dp size of the endpoint for MoE models",
+)
 args = parser.parse_args()
 
 os.makedirs(args.work_dir, exist_ok=True)
@@ -105,6 +112,7 @@
         args.max_kv_tokens,
         args.max_context_length,
         args.interpolation_granularity,
+        args.attention_dp_size,
     )
 else:
     raise ValueError(f"Invalid mode: {args.mode}")
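
The `# TODO: use kebab case` note is cheap to act on, because `argparse` derives the attribute name from the first long option string and replaces `-` with `_`, so call sites that read `args.attention_dp_size` would not need to change. A minimal sketch, assuming the flag were renamed to `--attention-dp-size` (a hypothetical spelling; this commit keeps `--attention_dp_size`):

```python
import argparse

parser = argparse.ArgumentParser(
    description="profile a given endpoint's performance for prefill or decode"
)
# Hypothetical kebab-case spelling. The snake_case spelling from this commit
# is kept as an alias so existing invocations would still parse.
parser.add_argument(
    "--attention-dp-size",
    "--attention_dp_size",
    type=int,
    default=1,
    help="attention dp size of the endpoint for MoE models",
)

default_args = parser.parse_args([])
kebab_args = parser.parse_args(["--attention-dp-size", "8"])
snake_args = parser.parse_args(["--attention_dp_size", "16"])

# All three expose the value under the same attribute name.
print(default_args.attention_dp_size)
print(kebab_args.attention_dp_size)
print(snake_args.attention_dp_size)
```

Keeping the old spelling as an alias is one possible migration path; dropping it outright would be simpler but would break any script that still passes the snake_case flag.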
