@@ -11,10 +11,10 @@ Basic deployment pattern with frontend and a single decode worker.
 - `Frontend`: OpenAI-compatible API server
 - `SGLangDecodeWorker`: Single worker handling both prefill and decode
 
-### 2. **Aggregated Router Deployment** (`agg_router.yaml`) 
+### 2. **Aggregated Router Deployment** (`agg_router.yaml`)
 Enhanced aggregated deployment with KV cache routing capabilities.
 
-**Architecture:** 
+**Architecture:**
 - `Frontend`: OpenAI-compatible API server with router mode enabled (`--router-mode kv`)
 - `SGLangDecodeWorker`: Single worker handling both prefill and decode
 
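The only functional difference from the plain aggregated template is the router flag on the frontend. A minimal sketch of that portion of the config, assuming the same `args`-list layout the later snippets in this file use (the service and field names here are illustrative, not copied verbatim from `agg_router.yaml`):

```yaml
# Illustrative excerpt; the exact service/field names are assumptions.
Frontend:
  args:
    - "--router-mode"
    - "kv"              # route each request to the worker with the best KV-cache hit
SGLangDecodeWorker:
  args:
    - "--model-path"
    - "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
```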
@@ -52,7 +52,7 @@ resources:
         memory: "20Gi"
         gpu: "1"
       limits:
-        cpu: "10" 
+        cpu: "10"
         memory: "20Gi"
         gpu: "1"
 ```
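A practical note on these values: the `gpu` figure in both `requests` and `limits` should match the worker's tensor-parallel degree. A sketch for a 2-GPU worker (the CPU and memory figures are illustrative, carried over from the defaults above):

```yaml
resources:
  requests:
    cpu: "10"
    memory: "20Gi"
    gpu: "2"   # matches --tp 2 on the worker
  limits:
    cpu: "10"
    memory: "20Gi"
    gpu: "2"
```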
@@ -96,7 +96,7 @@ Before using these templates, ensure you have:
 ### 1. Choose Your Template
 Select the deployment pattern that matches your requirements:
 - Use `agg.yaml` for development/testing
-- Use `agg_router.yaml` for production with load balancing 
+- Use `agg_router.yaml` for production with load balancing
 - Use `disagg.yaml` for maximum performance
 
 ### 2. Customize Configuration
@@ -110,7 +110,7 @@ image: your-registry/sglang-runtime:your-tag
 args:
   - "--model-path"
   - "your-org/your-model"
-  - "--served-model-name" 
+  - "--served-model-name"
   - "your-org/your-model"
 ```
 
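As a concrete, purely hypothetical example, serving Llama-3.1-8B-Instruct from a private registry would look like this (the image tag and model ID are placeholders for your own values):

```yaml
image: registry.example.com/sglang-runtime:v0.4.0   # hypothetical registry and tag
args:
  - "--model-path"
  - "meta-llama/Llama-3.1-8B-Instruct"
  - "--served-model-name"
  - "meta-llama/Llama-3.1-8B-Instruct"
```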
@@ -134,7 +134,7 @@ kubectl apply -f <your-template>.yaml
 All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. Key parameters:
 
 - `--page-size 16`: KV cache page size
-- `--tp 1`: Tensor parallelism degree 
+- `--tp 1`: Tensor parallelism degree
 - `--trust-remote-code`: Enable custom model code
 - `--skip-tokenizer-init`: Optimize startup time
 
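Taken together, these defaults would appear in a worker's `args` list roughly as below; the exact ordering and the Hugging Face model ID (`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`) are assumptions based on the snippets above, not a verbatim copy of the templates:

```yaml
args:
  - "--model-path"
  - "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
  - "--page-size"
  - "16"                      # KV cache page size
  - "--tp"
  - "1"                       # tensor parallelism degree
  - "--trust-remote-code"     # allow custom model code from the repo
  - "--skip-tokenizer-init"   # skip tokenizer init in the worker for faster startup
```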