vllm speed tweaks (#43)
anton-l authored Jan 26, 2025
1 parent d98862a commit 15df4fb
Showing 3 changed files with 5 additions and 1 deletion.
3 changes: 3 additions & 0 deletions README.md
@@ -236,7 +236,10 @@ Take a look at the sample dataset at [HuggingFaceH4/numina-deepseek-r1-qwen-7b](

To run the bigger DeepSeek-R1, we used 2 nodes of 8xH100 each, with the Slurm file available in this repo at `slurm/generate.slurm`. First, install the dependencies:

(For now we need to install a vLLM dev wheel that [fixes the R1 CUDA graph capture](https://github.com/vllm-project/vllm/commits/221d388cc5a836fa189305785ed7e887cea8b510/csrc/moe/moe_align_sum_kernels.cu).)
```shell
pip install https://wheels.vllm.ai/221d388cc5a836fa189305785ed7e887cea8b510/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu121

pip install "distilabel[vllm,ray,openai]>=1.5.2"
```
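If you want to confirm that the dev wheel (rather than a release build) ended up in the environment, a minimal check is sketched below; the expected version string is an assumption based on the wheel filename above.

```python
# Minimal sanity check (sketch): the wheel above ships a dev build of vLLM.
import vllm

# Expect a development version such as "1.0.0.dev..." rather than a regular release.
print(vllm.__version__)
```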

1 change: 1 addition & 0 deletions slurm/generate.slurm
@@ -149,6 +149,7 @@ RAY_ADDRESS="http://$head_node_ip:8265" ray job submit \
-- vllm serve $MODEL \
--tensor-parallel-size 8 \
--pipeline-parallel-size 4 \
--gpu-memory-utilization=0.85 \
--max-model-len 16384 \
--enable-chunked-prefill \
--trust-remote-code \
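The Slurm script launches `vllm serve`, which exposes an OpenAI-compatible API on the head node (port 8000 by default). A minimal sketch of how a client can talk to that endpoint follows; the IP address, model id, and sampling parameters are illustrative assumptions, not values taken from the repo:

```python
from openai import OpenAI

# Assumed address: replace with the head node IP reported by the Slurm job.
client = OpenAI(base_url="http://10.0.0.1:8000/v1", api_key="EMPTY")  # vLLM ignores the key

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # should match the $MODEL passed to `vllm serve`
    messages=[{"role": "user", "content": "Solve x^2 - 5x + 6 = 0. Think step by step."}],
    max_tokens=4096,
    temperature=0.6,
)
print(response.choices[0].message.content)
```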
2 changes: 1 addition & 1 deletion src/open_r1/generate.py
@@ -47,7 +47,7 @@ def build_distilabel_pipeline(
generation_kwargs=generation_kwargs,
),
input_mappings={"instruction": prompt_column} if prompt_column is not None else {},
input_batch_size=10,
input_batch_size=64, # on 4 nodes bs ~60+ leads to preemption due to KV cache exhaustion
num_generations=num_generations,
)

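For context on where `generation_kwargs` and `input_batch_size` sit, here is a hedged reconstruction of a minimal distilabel pipeline in the spirit of `build_distilabel_pipeline`; the function name, defaults, and pipeline name are assumptions, and only the keyword arguments visible in the diff come from the repo:

```python
from typing import Optional

from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGeneration


def build_pipeline_sketch(
    base_url: str,
    model: str,
    prompt_column: Optional[str] = None,
    num_generations: int = 1,
) -> Pipeline:
    """Hypothetical sketch of a distilabel pipeline driving a vLLM OpenAI-compatible endpoint."""
    generation_kwargs = {"temperature": 0.6, "max_new_tokens": 8192}  # illustrative values
    with Pipeline(name="r1-generation-sketch") as pipeline:
        TextGeneration(
            llm=OpenAILLM(
                base_url=base_url,   # e.g. the server started by slurm/generate.slurm
                api_key="EMPTY",     # vLLM does not validate the key
                model=model,
                generation_kwargs=generation_kwargs,
            ),
            input_mappings={"instruction": prompt_column} if prompt_column is not None else {},
            input_batch_size=64,     # the value bumped in this commit
            num_generations=num_generations,
        )
    return pipeline
```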
