ping @hzhwcmhf
https://qwenlm.github.io/blog/qwen2.5-turbo/
I noticed in the blog that Qwen2.5-7B's TTFT on an A100 is reported to be around 200 seconds in the full-attention setting. However, when I deployed Qwen2.5-7B-1M with vLLM 0.6.4.post1, the TTFT was approximately 10 minutes. Is the Qwen2.5-7B mentioned in the blog the same model as the open-source Qwen2.5-7B? Which inference framework was used for the blog's measurements? Could the vLLM version I am using be the cause of the discrepancy?
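For anyone comparing TTFT numbers across setups, it helps to measure the same thing: the wall-clock time from sending the request to receiving the first generated chunk. A minimal sketch (the helper name `measure_ttft` is ours, not part of vLLM; any iterator of streamed chunks works, such as a streaming response from an OpenAI-compatible client pointed at a vLLM server):

```python
import time

def measure_ttft(stream):
    """Return seconds until the first chunk arrives from a token stream.

    `stream` is any iterator yielding generated chunks, e.g. the
    streaming response from an OpenAI-compatible chat client.
    Returns None if the stream yields nothing.
    """
    start = time.perf_counter()
    for _ in stream:  # the first yielded chunk marks time-to-first-token
        return time.perf_counter() - start
    return None
```

With the `openai` client against vLLM's server, you would pass the iterator returned by `client.chat.completions.create(..., stream=True)`; timing the first streamed chunk avoids conflating TTFT with total generation time.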