
fix the issue missing dependencies in the Dockerfile and pip #2240

Merged
merged 31 commits
Aug 19, 2024

Conversation

ColorfulDick
Contributor

Motivation

Fix the Dockerfile missing the flash-attn, timm, and nccl dependencies.

@@ -1,3 +1,6 @@
gradio
protobuf
tritonclient[grpc]
timm
Collaborator

Please don't add them to serve.txt.
They are dependencies of VLM models, not LLMs. What's more, they are not even common dependencies across VLM models.

@@ -1,3 +1,6 @@
gradio
protobuf
tritonclient[grpc]
timm
flash-attn
openai
Collaborator

We can accept openai

Collaborator

Let's put openai to runtime.txt

Collaborator

@ColorfulDick THIS comment is not resolved.

services:
lmdeploy:
container_name: lmdeploy
image: openmmlab/lmdeploy-builder:cuda12.2
Collaborator

openmmlab/lmdeploy-builder is ONLY for building whl packages. It is not an image to deploy a model

Contributor Author

I will change "openmmlab/lmdeploy-builder:cuda12.2" to "openmmlab/lmdeploy:latest".

@lvhan028 lvhan028 requested a review from RunningLeon August 6, 2024 06:32
@@ -18,3 +18,4 @@ torchvision<=0.18.1,>=0.15.0
transformers
triton>=2.1.0,<=2.3.1; sys_platform == "linux"
uvicorn
flash-attn
Collaborator

@lvhan028 lvhan028 Aug 6, 2024

Not acceptable. Please remove it.
runtime.txt is restricted to the inference of LLM models.

…he minimum supported CUDA version is 12.2; if the NVIDIA driver version on the host machine is newer than the image's, torch inside the image will not work
@ColorfulDick ColorfulDick requested a review from lvhan028 August 8, 2024 09:08
@@ -0,0 +1,64 @@
ARG CUDA_VERSION=cu12

FROM nvidia/cuda:12.2.0-devel-ubuntu22.04 AS cu12
Collaborator

@lvhan028 lvhan028 Aug 8, 2024

Please use openmmlab/lmdeploy:latest-cu11 and openmmlab/lmdeploy:latest-cu12, respectively, so that we don't need to build lmdeploy and its dependencies any more.

Contributor Author

Please use openmmlab/lmdeploy:latest-cu11 and openmmlab/lmdeploy:latest-cu12, respectively, so that we don't need to build lmdeploy and its dependencies any more.

I have changed the base image in InternVL_Dockerfile.

@ColorfulDick ColorfulDick requested a review from lvhan028 August 8, 2024 14:54
@lvhan028 lvhan028 added the Bug:P1 label Aug 8, 2024
@@ -46,7 +54,8 @@ RUN cd /opt/lmdeploy &&\
ninja -j$(nproc) && ninja install &&\
cd .. &&\
python3 -m pip install -e . &&\
rm -rf build
rm -rf build &&\
rm -rf ~/.cache
Collaborator

Why remove "~/.cache" again?

Contributor Author

Why remove "~/.cache" again?

To remove the pip cache and reduce the image size.

Collaborator

Can't we keep only one "rm -rf ~/.cache"?

Contributor Author

Can't we keep only one "rm -rf ~/.cache"?

Because of Docker's union file system, every image layer (each RUN command or file change in the Dockerfile) is independent; deleting files in a later layer hides them but does not actually remove them from the earlier layer. So if I only execute "rm -rf ~/.cache" in the fifth RUN, it will not delete the cache files created by the second RUN.
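The layering behavior described above can be sketched in a minimal, hypothetical Dockerfile (not the PR's actual file; `some-package` is a placeholder): cleanup only reclaims space when it happens in the same RUN that created the files.

```dockerfile
# Illustrative sketch only: how layer independence affects image size.
FROM ubuntu:22.04

# BAD: the pip cache produced here is committed into this layer...
RUN pip install some-package

# ...so deleting it in a later layer hides the files but does not shrink
# the image; the earlier layer still physically contains them.
RUN rm -rf ~/.cache

# GOOD: create and delete the cache within a single RUN, so the cache
# files never land in any committed layer.
RUN pip install some-package && rm -rf ~/.cache
```

This is why the PR adds `rm -rf ~/.cache` at the end of each RUN that invokes pip, rather than once at the end of the Dockerfile.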

@ColorfulDick ColorfulDick requested a review from lvhan028 August 14, 2024 08:02
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb && \
dpkg -i cuda-keyring_1.0-1_all.deb && \
apt update -y && \
apt install libnccl2 libnccl-dev
Collaborator

nccl is already installed as you can see

docker run -it --rm nvidia/cuda:12.4.1-devel-ubuntu22.04
root@b40146dae1b8:/# apt list --installed |  grep nccl

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnccl-dev/now 2.21.5-1+cuda12.4 amd64 [installed,local]
libnccl2/now 2.21.5-1+cuda12.4 amd64 [installed,local]

Contributor Author

nccl is already installed as you can see

docker run -it --rm nvidia/cuda:12.4.1-devel-ubuntu22.04
root@b40146dae1b8:/# apt list --installed |  grep nccl

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnccl-dev/now 2.21.5-1+cuda12.4 amd64 [installed,local]
libnccl2/now 2.21.5-1+cuda12.4 amd64 [installed,local]

If we don't reinstall nccl, some bugs will happen. I have added these environment variables:

RUN export PATH=/usr/local/cuda/bin${PATH:+:${PATH}} && \
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

but it doesn't help:

#0 491.3 ../generate.sh: 6: [: unexpected operator
#0 492.7 -- The CXX compiler identification is GNU 11.4.0
#0 496.0 -- The CUDA compiler identification is NVIDIA 12.2.140
#0 496.2 -- Detecting CXX compiler ABI info
#0 496.4 -- Detecting CXX compiler ABI info - done
#0 496.5 -- Check for working CXX compiler: /usr/bin/c++ - skipped
#0 496.5 -- Detecting CXX compile features
#0 496.5 -- Detecting CXX compile features - done
#0 496.5 -- Detecting CUDA compiler ABI info
#0 497.9 -- Detecting CUDA compiler ABI info - done
#0 498.1 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
#0 498.1 -- Detecting CUDA compile features
#0 498.1 -- Detecting CUDA compile features - done
#0 498.1 CMake Warning (dev) at CMakeLists.txt:18 (find_package):
#0 498.1   Policy CMP0146 is not set: The FindCUDA module is removed.  Run "cmake
#0 498.1   --help-policy CMP0146" for policy details.  Use the cmake_policy command to
#0 498.1   set the policy and suppress this warning.
#0 498.1 
#0 498.1 This warning is for project developers.  Use -Wno-dev to suppress it.
#0 498.1 
#0 498.1 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
#0 498.3 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
#0 498.3 -- Found Threads: TRUE
#0 498.3 -- Found CUDA: /usr/local/cuda (found suitable version "12.2", minimum required is "10.2")
#0 498.4 CUDA_VERSION 12.2 is greater or equal than 11.0, enable -DENABLE_BF16 flag
#0 498.4 -- Add DBUILD_MULTI_GPU, requires MPI and NCCL
#0 499.0 -- Found MPI_CXX: /usr/local/openmpi/lib/libmpi.so (found version "3.1")
#0 499.0 -- Found MPI: TRUE (found version "3.1")
#0 499.0 CMake Error at /opt/py3/lib/python3.10/site-packages/cmake/data/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
#0 499.0   Could NOT find NCCL (missing: NCCL_INCLUDE_DIRS NCCL_LIBRARIES)
#0 499.0 Call Stack (most recent call first):
#0 499.0   /opt/py3/lib/python3.10/site-packages/cmake/data/share/cmake-3.30/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
#0 499.0   cmake/Modules/FindNCCL.cmake:126 (find_package_handle_standard_args)
#0 499.0   CMakeLists.txt:87 (find_package)
#0 499.0 
#0 499.0 
#0 499.0 -- Configuring incomplete, errors occurred!

Collaborator

@ColorfulDick
Maybe you can cp the headers and libs into the CUDA directory as a workaround. Anyway, we cannot reproduce this, and we may need to remove the libnccl installation from the Dockerfile.

cp /usr/include/nccl.h /usr/local/cuda/include/nccl.h
cp /usr/lib/x86_64-linux-gnu/libnccl* /usr/local/cuda/lib64/

Collaborator

Following @RunningLeon's instructions, nccl is indeed already installed:

/usr/lib/x86_64-linux-gnu/libnccl.so
/usr/lib/x86_64-linux-gnu/libnccl_static.a
/usr/lib/x86_64-linux-gnu/libnccl.so.2
/usr/lib/x86_64-linux-gnu/libnccl.so.2.21.5

And find_package(NCCL) succeeds.

I also pulled docker.io/nvidia/cuda:12.2.2-devel-ubuntu22.04, which also contains NCCL

/usr/lib/x86_64-linux-gnu/libnccl.so
/usr/lib/x86_64-linux-gnu/libnccl_static.a
/usr/lib/x86_64-linux-gnu/libnccl.so.2
/usr/lib/x86_64-linux-gnu/libnccl.so.2.19.3

Contributor Author

@ColorfulDick maybe you can cp the headers and libs into the CUDA directory as a workaround. Anyway, we cannot reproduce this, and we may need to remove the libnccl installation from the Dockerfile.

cp /usr/include/nccl.h /usr/local/cuda/include/nccl.h
cp /usr/lib/x86_64-linux-gnu/libnccl* /usr/local/cuda/lib64/

Now I know what happened: some tags of the nvidia/cuda image never included nccl. For example:

docker run nvidia/cuda:12.2.0-devel-ubuntu22.04 find / -name "nccl.h"

finds nothing,
but when I run the 12.4.1-devel-ubuntu22.04 tag:

docker run nvidia/cuda:12.4.1-devel-ubuntu22.04 find / -name "nccl.h"

it returns:

/usr/include/nccl.h

So I suggest keeping the nccl installation in the Dockerfile, because we don't know whether future versions of the nvidia/cuda image will include nccl.

Collaborator

I don't think that's a good idea.
On one hand, it might cause a version mismatch for the docker images that have already pre-installed NCCL.
On the other, it burdens our maintenance.

If users' cuda version is lower than 12.4, we suggest they use the cu11 docker image.
If users decide to change the base image, like cu12.2.0, I believe they can maintain it on their own.

Contributor Author

I don't think that's a good idea. On one hand, it might cause a version mismatch for the docker images that have already pre-installed NCCL. On the other, it burdens our maintenance.

If users' cuda version is lower than 12.4, we suggest they use the cu11 docker image. If users decide to change the base image, like cu12.2.0, I believe they can maintain it on their own

I have removed the nccl installation from the Dockerfile.

But I suggest changing the base image tag to nvidia/cuda:12.0.0-devel-ubuntu22.04, which would support all CUDA 12 subversions on the host machine.

Some pip packages, such as flash-attn, depend on a specific version of the CUDA runtime. If a flash-attn wheel is compiled in a cu11 image, it links against libcudart.so.11.0; if the host machine runs CUDA 12 and therefore only has libcudart.so.12.0, flash-attn will not work.

All in all, if the image's CUDA toolkit version is 11, some packages will not work on a host machine that has CUDA 12 installed.
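The soname mismatch described above can be inspected with ldd, which lists the exact shared-library names a compiled binary requires. For flash-attn you would point ldd at its compiled extension under site-packages and look for a libcudart entry (that path varies per install, so this sketch demonstrates on /bin/ls, which exists on every Linux system):

```shell
# Sketch: ldd prints the sonames a binary was linked against. A flash-attn
# extension built in a cu11 image would list libcudart.so.11.0 here, which a
# CUDA-12-only host cannot provide, so the import fails at load time.
ldd /bin/ls | grep -o 'libc\.so\.[0-9]*' | head -n1
```

For a real check, replace /bin/ls with the extension module, e.g. `ldd .../site-packages/flash_attn_2_cuda*.so | grep libcudart` (path illustrative, not from this PR).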

ColorfulDick and others added 2 commits August 16, 2024 18:00
remove nccl installation since it is already in the docker image
Collaborator

@RunningLeon RunningLeon left a comment

LGTM

@lvhan028 lvhan028 merged commit 26c00ab into InternLM:main Aug 19, 2024
5 checks passed
3 participants