Q: How to run the CLI via CUDA in a Docker container #229
Comments
I'm trying to figure out what you are actually looking for. Are you trying to build a model or run one? Those are two different requests. The screenshot indicates that you are trying to run a prebuilt model with Vulkan, but as far as I can infer from your question ("a way to use cuda directly without exchanging driver inside"), there is no Vulkan or CUDA driver inside your Docker image, and it's impossible to run CUDA/Vulkan code without the driver, right? If instead you are trying to build the model, that is definitely possible in MLC LLM: as long as you have the compiler toolchain (e.g. nvcc), you can build a model that can be run elsewhere.
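To make the build/run distinction concrete, here is a quick way to check which half a container actually has; these are standard CUDA tools, not anything specific to MLC LLM:

```bash
# nvcc ships with the CUDA toolkit: it is what you need to *build*
# (compile GPU kernels), and it works without any GPU or driver present.
nvcc --version

# nvidia-smi talks to the driver: it is what you need to *run* CUDA code.
# Inside Docker it only works if the host driver is exposed to the container.
nvidia-smi
```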
@junrushao yeah, I found the mistake in what I described: I didn't specify the target for the prebuilt model, and I didn't set USE_CUDA=ON when compiling mlc_llm.
For all CUDA users, below is how I compiled mlc_llm (a fuller sketch follows this list):

1. Compile TVM.
2. `git clone https://github.com/mlc-ai/mlc-llm.git --recursive`
3. Compile mlc-llm.
4. Run mlc-chat-cli; setting `--device-name cuda` will work.
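For anyone reconstructing those steps, here is a rough sketch. It assumes the standard TVM config.cmake flow and a plain CMake build of mlc-llm; repo layout, flag names, and the CLI binary name may differ between versions, so treat this as a guide rather than exact commands:

```bash
# 1. Compile TVM with CUDA enabled. USE_CUDA is the standard TVM
#    config.cmake switch; it requires nvcc in the toolchain.
cd tvm && mkdir -p build && cd build
cp ../cmake/config.cmake .
echo "set(USE_CUDA ON)" >> config.cmake
cmake .. && make -j"$(nproc)"
cd ../..

# 2. Compile mlc-llm (which provides the chat CLI) against that TVM build.
#    You may need to point the build at your TVM checkout, e.g. via TVM_HOME.
git clone https://github.com/mlc-ai/mlc-llm.git --recursive
cd mlc-llm && mkdir -p build && cd build
cmake .. && make -j"$(nproc)"

# 3. Run the chat CLI on the GPU (binary name assumed to be mlc_chat_cli).
./mlc_chat_cli --device-name cuda
```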
I'm using a Docker container (ubuntu 22.04, CUDA 11.8, without Vulkan). Graphics card: A100, driver: 470.42, CUDA: 11.4.

I ran:

```bash
python3 build.py --model path/to/vicuna-v1-7b --quantization q3f16_0 --max-seq-len 768
```

but it only outputs vicuna-v1-7b-q3f16_0-cpu.so.
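As the follow-up comments indicate, the missing piece was the build target. A minimal sketch of the corrected invocation, assuming build.py accepts a `--target` flag (the flag name is inferred from the follow-up comment, not verified against this exact version; check `python3 build.py --help`):

```bash
# --target cuda is assumed from the "didn't specify the target" follow-up;
# with it, the build should emit a -cuda.so instead of the -cpu.so above.
python3 build.py --model path/to/vicuna-v1-7b --quantization q3f16_0 \
    --max-seq-len 768 --target cuda
```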
I tried installing vicuna-v1-7b-q3f16_0-vulkan.so from https://github.com/mlc-ai/binary-mlc-llm-libs.
The error log is:
Is there a way to use CUDA directly without changing the driver inside the container, or could you provide a Dockerfile?
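For what it's worth, the usual answer to this question is the NVIDIA Container Toolkit: it mounts the host driver into the container at run time, so the image never needs its own driver, only the CUDA toolkit. A minimal sketch, assuming the toolkit is installed on the host (the image tag is just an example):

```bash
# --gpus requires the NVIDIA Container Toolkit on the host. It mounts the
# host driver libraries into the container, so the image only needs the
# CUDA *toolkit* (nvcc, for building), never a driver of its own.
docker run --gpus all -it nvidia/cuda:11.8.0-devel-ubuntu22.04 bash

# Inside the container, the host driver should now be visible:
nvidia-smi
```

Since CUDA 11, minor-version compatibility generally allows 11.x binaries to run on a 470-series (CUDA 11.4) driver, which matches the setup described above.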