[Minor fix] Include flash_attn in docker image #3254
Conversation
Do we really need this fix? It seems our CI successfully builds the image and runs vLLM with the main branch.
@WoosukKwon Just looking at the final stage of the Dockerfile, I can't see how the `flash_attn` package installed into `thirdparty_files` during the build stage would end up in the final image, since that directory is never copied across.
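For context, here is a minimal sketch of the multi-stage layout I am referring to (stage names, base images, and install steps are illustrative, not the actual vLLM Dockerfile):

```Dockerfile
# --- build stage: installs flash-attn into a separate directory ---
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y python3 python3-pip
WORKDIR /workspace
# illustrative: flash-attn goes into thirdparty_files rather than site-packages
RUN pip install torch && \
    pip install flash-attn --no-build-isolation --target /workspace/thirdparty_files

# --- final stage: thirdparty_files is never copied across ---
FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS vllm
RUN apt-get update && apt-get install -y python3 python3-pip
WORKDIR /workspace
RUN pip install vllm  # illustrative stand-in for the real install steps
# note: there is no `COPY --from=build /workspace/thirdparty_files ...` here,
# so the flash_attn package is absent from the final image
```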
To double-check, I have re-built the image with no caching:
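Roughly like this (the tag is my own choice, not necessarily what CI uses):

```bash
# rebuild the image from scratch, ignoring any cached layers
docker build --no-cache -t vllm-test -f Dockerfile .
```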
Then I try to import the flash attention backend inside the container, and confirm that the `flash_attn` package is indeed missing from the image.
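Something along these lines; importing `flash_attn` directly (rather than the vLLM backend module that wraps it) is enough to show the package is not installed:

```bash
# run a Python one-liner in the freshly built image
docker run --rm --entrypoint python3 vllm-test -c "import flash_attn"
# -> ModuleNotFoundError: No module named 'flash_attn'
```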
It's also possible the CI wouldn't catch this if it is running on older GPUs (e.g., V100), since the import that fails only happens when a newer GPU (e.g., Ampere) is detected.
I am closing this, since it is no longer relevant now that #3269 has removed the flash attention dependency.
Supports #3255.
To resolve it, we just need to make sure we copy the contents of `thirdparty_files` into the image.
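A minimal sketch of the kind of change needed in the final stage, assuming the build stage is named `build` and installs into `/workspace/thirdparty_files` (the actual paths and stage names may differ):

```Dockerfile
# copy the third-party packages built earlier into the final image
COPY --from=build /workspace/thirdparty_files /workspace/thirdparty_files
# make sure Python can find them at runtime
ENV PYTHONPATH=/workspace/thirdparty_files
```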