Refine configuration for attention_xpu.cpp #59

Draft
wants to merge 1 commit into
base: 0.6.2

xiangyuT
Collaborator

@xiangyuT xiangyuT commented Nov 28, 2024

vLLM      2xARC770  TP2  Qwen2.5-14B-Instruct  FP8  1   30.52826  423.0261  31.9921   0.983465
2.2.0-b7                                            2   56.4111   778.8015  33.99859  0.9812
                                                    4   101.9503  1153.626  37.0522   0.968432
                                                    6   136.486   1535.058  41.04005  0.963608
                                                    8   162.062   1937.625  45.66564  0.968929
                                                    10  206.3586  2326.798  43.99728  0.964007
                                                    12  232.5044  2720.045  46.38537  0.967773
                                                    14  254.09    3107.151  49.12045  0.962014
                                                    16  274.5366  3492.905  51.55335  0.963794
                                                    18  243.6606  3880.514  66.41701  0.972004
                                                    20  259.8757  4263.002  68.76075  0.974224
                                                    22  273.293   4652.212  71.54498  0.973799
                                                    24  287.8988  5033.827  73.66682  0.975461

@xiangyuT xiangyuT marked this pull request as draft December 2, 2024 01:49