[Minor fix] Include flash_attn in docker image #3254
Conversation
Do we really need this fix? It seems our CI successfully builds the image and runs vLLM with the main branch.
@WoosukKwon Just looking at the final stage of the Dockerfile, I can't see how the `flash_attn` package installed into `thirdparty_files` during the build stage would end up in the final image, since that directory is never copied across.
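For context, here is a minimal sketch of the multi-stage layout I am referring to (stage names, base images, and install steps are illustrative, not the actual vLLM Dockerfile):

```Dockerfile
# --- build stage: installs flash-attn into a separate directory ---
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y python3 python3-pip
WORKDIR /workspace
# illustrative: flash-attn goes into thirdparty_files rather than site-packages
RUN pip install torch && \
    pip install flash-attn --no-build-isolation --target /workspace/thirdparty_files

# --- final stage: thirdparty_files is never copied across ---
FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS vllm
RUN apt-get update && apt-get install -y python3 python3-pip
WORKDIR /workspace
RUN pip install vllm  # illustrative stand-in for the real install steps
# note: there is no `COPY --from=build /workspace/thirdparty_files ...` here,
# so the flash_attn package is absent from the final image
```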
To double-check, I have re-built the image with no caching:
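Roughly like this (the tag is my own choice, not necessarily what CI uses):

```bash
# rebuild the image from scratch, ignoring any cached layers
docker build --no-cache -t vllm-test -f Dockerfile .
```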
Then I try to import the flash attention backend inside the container, and confirm that the `flash_attn` package is indeed missing from the image.
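Something along these lines; importing `flash_attn` directly (rather than the vLLM backend module that wraps it) is enough to show the package is not installed:

```bash
# run a Python one-liner in the freshly built image
docker run --rm --entrypoint python3 vllm-test -c "import flash_attn"
# -> ModuleNotFoundError: No module named 'flash_attn'
```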
It's also possible the CI wouldn't catch this if it is running on older GPUs (e.g., V100), since the import that fails only happens when a newer GPU (e.g., Ampere) is detected.
I am closing this, since it is no longer relevant now that #3269 has removed the flash attention dependency.
Supports #3255.
To resolve it, we just need to make sure we copy the contents of `thirdparty_files` into the image.
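A minimal sketch of the kind of change needed in the final stage, assuming the build stage is named `build` and installs into `/workspace/thirdparty_files` (the actual paths and stage names may differ):

```Dockerfile
# copy the third-party packages built earlier into the final image
COPY --from=build /workspace/thirdparty_files /workspace/thirdparty_files
# make sure Python can find them at runtime
ENV PYTHONPATH=/workspace/thirdparty_files
```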