@vijethk-intel vijethk-intel commented Mar 27, 2025

vLLM Llama3.1-70B FP8 2K/2K throughput measurements show a good improvement with block size 256, so this PR adds 256 as an allowed value for the block-size argument.

@michalkuligowski

/run-gaudi-tests

@michalkuligowski michalkuligowski merged commit 87fdcc7 into habana_main Mar 28, 2025
41 checks passed
@michalkuligowski michalkuligowski deleted the vijethk/block_size_256 branch March 28, 2025 08:21
madamczyk-intel pushed a commit that referenced this pull request May 21, 2025
Bring back 256 as an allowed value for the block-size argument, which was
lost in one of the recent rebases.

It was first added in
#971

The allowed argument values are now defined by unpacking predefined type
hints.
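
As a rough illustration of what "unpacking predefined type hints" can look like, here is a minimal sketch that derives argparse choices from a `Literal` type with `typing.get_args`. The name `BlockSize`, the helper `make_parser`, and every listed value other than 256 are assumptions made for this example; it is not the code from `arg_utils.py`.

```python
import argparse
from typing import Literal, get_args

# Hypothetical type hint: only the value 256 is confirmed by this PR,
# the remaining values are illustrative.
BlockSize = Literal[1, 8, 16, 32, 64, 128, 256]


def make_parser() -> argparse.ArgumentParser:
    """Build a parser whose --block-size choices come from the type hint."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--block-size",
        type=int,
        # get_args unpacks the Literal into a tuple of its allowed values,
        # so adding a value to BlockSize automatically extends the choices.
        choices=get_args(BlockSize),
        default=None,
    )
    return parser


args = make_parser().parse_args(["--block-size", "256"])
print(args.block_size)
```

With this pattern the type hint is the single source of truth, so a value dropped from the hint during a rebase also disappears from the command-line options.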

![image](https://github.com/user-attachments/assets/f1c0429b-6449-44a5-b0e8-326f465590ab)

https://github.com/HabanaAI/vllm-fork/blob/habana_main/vllm/engine/arg_utils.py#L611