[deepseek_r1] General PP enabling #1240
Conversation
scripts/benchmark_server_param.sh (outdated)
if [ "$KV_CACHE_DTYPE" = "fp8_inc" ]; then
    export VLLM_USE_FP8_MATMUL="true"
    export VLLM_USE_SINGLE_TENSOR_CACHE="1"
Why do we need to set VLLM_USE_FP8_MATMUL from the application? Shouldn't INC handle it?
And what is VLLM_USE_SINGLE_TENSOR_CACHE for? I can't find it in the vLLM code...
Good catch. This was a leftover from Liu, Yi's FP8 matmul implementation on the PRC-local version of this branch. I see that the implementation provided on the deepseek_r1 branch is different, though, and no longer uses VLLM_USE_SINGLE_TENSOR_CACHE.
We still need VLLM_USE_FP8_MATMUL, though, right? According to @yiliu30, this environment variable should be set for improved performance.
Yes, the VLLM_USE_SINGLE_TENSOR_CACHE flag should be removed, as we have been using a single tensor for the KV cache since #977.
We still need VLLM_USE_FP8_MATMUL for FP8 Q@K and FP8 A@V, since the current path does not use INC to replace the matmul with PatchedMatmul. Instead, it manually patches Q@K and A@V and uses 1.0 as the scaling. Please refer to #977 for more details.
cc @xuechendi
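To make the manual patching described above more concrete, here is a minimal sketch of the idea: FP8 Q@K and A@V with a fixed scale of 1.0, gated by VLLM_USE_FP8_MATMUL. This is only an emulation in plain PyTorch (round-tripping through torch.float8_e4m3fn), not the actual HPU kernel path from #977; the function names and the emulation approach are illustrative assumptions.

import os
import torch

# Gate the FP8 path on the same flag the benchmark script exports.
USE_FP8_MATMUL = os.environ.get("VLLM_USE_FP8_MATMUL", "false").lower() in ("1", "true")

def fp8_matmul(a: torch.Tensor, b: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    # Emulate FP8 by round-tripping through float8_e4m3fn, then matmul with a
    # fixed scale of 1.0 (the real path in #977 runs native FP8 kernels on Gaudi).
    a_q = a.to(torch.float8_e4m3fn).to(a.dtype)
    b_q = b.to(torch.float8_e4m3fn).to(b.dtype)
    return scale * (a_q @ b_q)

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    matmul = fp8_matmul if USE_FP8_MATMUL else torch.matmul
    scores = matmul(q, k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)  # Q@K on the FP8 path
    probs = torch.softmax(scores, dim=-1)
    return matmul(probs, v)                                         # A@V on the FP8 path

if __name__ == "__main__":
    q, k, v = (torch.randn(2, 8, 16) for _ in range(3))
    print(attention(q, k, v).shape)  # torch.Size([2, 8, 16])

Using a static scale of 1.0, as the comment above describes, avoids per-tensor scale bookkeeping for these two matmuls at the cost of relying on the FP8 range being sufficient for attention scores and probabilities.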
Force-pushed from 7cd244f to 2f9b8b1
Force-pushed from d8ef146 to 05e298d
Co-authored-by: Hu, Yabai <yabai.hu@intel.com>
Co-authored-by: Ji, Kunshang <kunshang.ji@intel.com>
Co-authored-by: Sheng, Yi <yi.sheng@intel.com>
Co-authored-by: Chen, Xinyu <xinyu1.chen@intel.com>
Co-authored-by: Voas, Tanner <tanner.voas@intel.com>
Signed-off-by: Voas, Tanner <tanner.voas@intel.com>
Force-pushed from 05e298d to ba9bf97
Additional validation is being done by yabai.hu@intel.com.
@czhu15, youlei.yang@intel.com: please help start the review in the meantime.