[Bug]: Error when using tensor_parallel in v0.6.1 #8397
Comments
Same here with Docker when setting
Can you check whether #8390 fixes the problem on your end?
Yes, it absolutely fixes what was going wrong, thanks a lot!
Same for me.
I see this not just with AMD; it also happens on 4× Nvidia H100.
I think this requires a 0.6.2 or 0.6.1.post release; without the fix, any sharding seems broken on any device.
Yes, we will likely release a patch since this issue breaks vLLM for many users. Stay tuned!
I know it's not much help, but the fixes worked for me. I installed from source and everything was up and running again.
I still hit this problem with 0.6.1.post or 0.6.2 when setting tp=2 on a single node with 8 GPUs (pp is 1).
Can you open a new issue and describe your environment and the error in more detail?
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
When using the vllm library version v0.6.1, I encounter an error with tensor_parallel. Rolling back to v0.6.0 resolves the issue. With tensor_parallel_size = 1 the bug does not appear at all. The test code is the same for both versions:
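The reporter's actual test script is not included in this extract. As an illustrative sketch only (the model name and the use of vLLM's offline `LLM` API are assumptions, not taken from the report), a script of roughly this shape hits the failure when `tensor_parallel_size > 1`:

```python
# Hypothetical reproduction sketch: the only setting that changes between
# the working run (tp=1) and the failing run (tp=2) is tensor_parallel_size.
engine_kwargs = {
    "model": "facebook/opt-125m",   # hypothetical model choice
    "tensor_parallel_size": 2,      # tp=2 triggers the bug; tp=1 does not
}

# Running the engine requires vLLM and multiple GPUs, so it is only
# illustrated here:
# from vllm import LLM
# llm = LLM(**engine_kwargs)        # errors on v0.6.1, works on v0.6.0
# print(llm.generate(["Hello, my name is"]))

print(engine_kwargs["tensor_parallel_size"])
```

Setting `tensor_parallel_size=1` in the same script sidesteps the bug, which is consistent with the thread's conclusion that sharding itself is what broke in v0.6.1.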
The output for v0.6.0 is as follows:
The output for v0.6.1 is as follows: