Commit c430018

windsonsea authored and yangw-dev committed
[Doc] Fix a 404 link in installation/cpu.md (vllm-project#16773)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Signed-off-by: Yang Wang <elainewy@meta.com>
1 parent 8cfafcd commit c430018

File tree

  • docs/source/getting_started/installation/cpu.md

1 file changed: 1 addition, 1 deletion

docs/source/getting_started/installation/cpu.md

Lines changed: 1 addition & 1 deletion
@@ -272,7 +272,7 @@ $ python examples/offline_inference/basic/basic.py
 
 - Decouple the HTTP serving components from the inference components. In a GPU backend configuration, the HTTP serving and tokenization tasks operate on the CPU, while inference runs on the GPU, which typically does not pose a problem. However, in a CPU-based setup, the HTTP serving and tokenization can cause significant context switching and reduced cache efficiency. Therefore, it is strongly recommended to segregate these two components for improved performance.
 
-- On CPU based setup with NUMA enabled, the memory access performance may be largely impacted by the [topology](https://github.com/intel/intel-extension-for-pytorch/blob/main/docs/tutorials/performance_tuning/tuning_guide.inc.md#non-uniform-memory-access-numa). For NUMA architecture, Tensor Parallel is a option for better performance.
+- On CPU based setup with NUMA enabled, the memory access performance may be largely impacted by the [topology](https://github.com/intel/intel-extension-for-pytorch/blob/main/docs/tutorials/performance_tuning/tuning_guide.md#non-uniform-memory-access-numa). For NUMA architecture, Tensor Parallel is a option for better performance.
 
 - Tensor Parallel is supported for serving and offline inferencing. In general each NUMA node is treated as one GPU card. Below is the example script to enable Tensor Parallel = 2 for serving:
