Commit c9e5c45

docs(deploy/README.md): fix formatting inconsistencies and enhance deployment instructions
1 parent 883e80f commit c9e5c45

1 file changed: +6 -6 lines changed

components/backends/sglang/deploy/README.md

Lines changed: 6 additions & 6 deletions
@@ -11,10 +11,10 @@ Basic deployment pattern with frontend and a single decode worker.
 - `Frontend`: OpenAI-compatible API server
 - `SGLangDecodeWorker`: Single worker handling both prefill and decode

-### 2. **Aggregated Router Deployment** (`agg_router.yaml`)
+### 2. **Aggregated Router Deployment** (`agg_router.yaml`)
 Enhanced aggregated deployment with KV cache routing capabilities.

-**Architecture:**
+**Architecture:**
 - `Frontend`: OpenAI-compatible API server with router mode enabled (`--router-mode kv`)
 - `SGLangDecodeWorker`: Single worker handling both prefill and decode

@@ -52,7 +52,7 @@ resources:
     memory: "20Gi"
     gpu: "1"
   limits:
-    cpu: "10"
+    cpu: "10"
     memory: "20Gi"
     gpu: "1"
 ```
@@ -96,7 +96,7 @@ Before using these templates, ensure you have:
 ### 1. Choose Your Template
 Select the deployment pattern that matches your requirements:
 - Use `agg.yaml` for development/testing
-- Use `agg_router.yaml` for production with load balancing
+- Use `agg_router.yaml` for production with load balancing
 - Use `disagg.yaml` for maximum performance

 ### 2. Customize Configuration
@@ -110,7 +110,7 @@ image: your-registry/sglang-runtime:your-tag
 args:
 - "--model-path"
 - "your-org/your-model"
-- "--served-model-name"
+- "--served-model-name"
 - "your-org/your-model"
 ```

@@ -134,7 +134,7 @@ kubectl apply -f <your-template>.yaml
 All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. Key parameters:

 - `--page-size 16`: KV cache page size
-- `--tp 1`: Tensor parallelism degree
+- `--tp 1`: Tensor parallelism degree
 - `--trust-remote-code`: Enable custom model code
 - `--skip-tokenizer-init`: Optimize startup time

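Each removed/added pair in the hunks above is visually identical, which suggests the commit changes only whitespace, most likely trailing spaces. The following is a minimal sketch under that assumption, showing how such lines could be located and cleaned. `strip_trailing_whitespace` is a hypothetical helper written for illustration; it is not part of the repository or the commit.

```python
def strip_trailing_whitespace(text: str) -> tuple[str, list[int]]:
    """Return the cleaned text and the 1-based numbers of the lines that changed."""
    cleaned, changed = [], []
    for number, line in enumerate(text.split("\n"), start=1):
        stripped = line.rstrip(" \t")  # drop trailing spaces and tabs only
        if stripped != line:
            changed.append(number)
        cleaned.append(stripped)
    return "\n".join(cleaned), changed


# Assumed sample: the second line carries two trailing spaces.
sample = (
    "- `--page-size 16`: KV cache page size\n"
    "- `--tp 1`: Tensor parallelism degree  \n"
)
cleaned, changed = strip_trailing_whitespace(sample)
print(changed)
```

In Markdown, two or more trailing spaces inside a paragraph produce a hard line break, so a cleanup like this can change rendering in running text. At the end of a block, such as a heading or a single-line list item, trailing spaces are ignored by the renderer.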