
Commit f1151c0

windsonsea authored and Mu Huai committed

[Doc] Fix a 404 link in installation/cpu.md (vllm-project#16773)

Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>

1 parent 7afd75d · commit f1151c0

File tree

1 file changed: +1 addition, -1 deletion

  • docs/source/getting_started/installation/cpu.md


docs/source/getting_started/installation/cpu.md — 1 addition, 1 deletion

```diff
@@ -272,7 +272,7 @@ $ python examples/offline_inference/basic/basic.py
 
 - Decouple the HTTP serving components from the inference components. In a GPU backend configuration, the HTTP serving and tokenization tasks operate on the CPU, while inference runs on the GPU, which typically does not pose a problem. However, in a CPU-based setup, the HTTP serving and tokenization can cause significant context switching and reduced cache efficiency. Therefore, it is strongly recommended to segregate these two components for improved performance.
 
-- On CPU based setup with NUMA enabled, the memory access performance may be largely impacted by the [topology](https://github.com/intel/intel-extension-for-pytorch/blob/main/docs/tutorials/performance_tuning/tuning_guide.inc.md#non-uniform-memory-access-numa). For NUMA architecture, Tensor Parallel is a option for better performance.
+- On CPU based setup with NUMA enabled, the memory access performance may be largely impacted by the [topology](https://github.com/intel/intel-extension-for-pytorch/blob/main/docs/tutorials/performance_tuning/tuning_guide.md#non-uniform-memory-access-numa). For NUMA architecture, Tensor Parallel is a option for better performance.
 
 - Tensor Parallel is supported for serving and offline inferencing. In general each NUMA node is treated as one GPU card. Below is the example script to enable Tensor Parallel = 2 for serving:
```
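The example script referenced by the last context line above is truncated in this diff view. As a rough sketch of what a CPU serving launch with Tensor Parallel = 2 could look like — the model name and the `VLLM_CPU_KVCACHE_SPACE` value here are illustrative assumptions, not taken from this commit:

```shell
# Hypothetical invocation; model name and env-var values are placeholders.
# VLLM_CPU_KVCACHE_SPACE reserves CPU memory (in GiB) for the KV cache.
# With --tensor-parallel-size 2, each of the two ranks is intended to map
# onto one NUMA node, per the guidance in the doc text above.
VLLM_CPU_KVCACHE_SPACE=40 \
  vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 \
    --distributed-executor-backend mp
```

Treat this as a sketch under those assumptions; the authoritative command is the one in the full cpu.md page.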

0 commit comments
