Q: How to run the CLI via CUDA in a Docker container #229
Comments
I'm trying to figure out what you are actually looking for. Are you trying to build a model or run one? Those are two different requests. The screenshot indicates that you are trying to run a prebuilt model with Vulkan, but as far as I can infer from your question ("a way to use cuda directly without exchanging driver inside"), there is no Vulkan or CUDA driver inside your Docker image, and it's impossible to run CUDA/Vulkan code without the driver, right? If instead you are trying to build the model, that is definitely possible in MLC LLM: as long as you have the compiler toolchain (e.g. nvcc), you can build a model that can be run elsewhere.
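To make the build/run distinction concrete, here is a quick way to check which half a container actually has; these are standard CUDA tools, not anything specific to MLC LLM:

```bash
# nvcc ships with the CUDA toolkit: it is what you need to *build*
# (compile GPU kernels), and it works without any GPU or driver present.
nvcc --version

# nvidia-smi talks to the driver: it is what you need to *run* CUDA code.
# Inside Docker it only works if the host driver is exposed to the container.
nvidia-smi
```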
@junrushao yeah, I found the mistake in what I described: I didn't specify the target for the prebuilt model, and I didn't set USE_CUDA=ON when compiling mlc_llm.
For all CUDA users, below is how I compiled mlc_llm (a fuller sketch follows this list):

1. Compile TVM.
2. `git clone https://github.com/mlc-ai/mlc-llm.git --recursive`
3. Compile mlc-llm.
4. Run mlc-chat-cli; setting `--device-name cuda` will work.
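For anyone reconstructing those steps, here is a rough sketch. It assumes the standard TVM config.cmake flow and a plain CMake build of mlc-llm; repo layout, flag names, and the CLI binary name may differ between versions, so treat this as a guide rather than exact commands:

```bash
# 1. Compile TVM with CUDA enabled. USE_CUDA is the standard TVM
#    config.cmake switch; it requires nvcc in the toolchain.
cd tvm && mkdir -p build && cd build
cp ../cmake/config.cmake .
echo "set(USE_CUDA ON)" >> config.cmake
cmake .. && make -j"$(nproc)"
cd ../..

# 2. Compile mlc-llm (which provides the chat CLI) against that TVM build.
#    You may need to point the build at your TVM checkout, e.g. via TVM_HOME.
git clone https://github.com/mlc-ai/mlc-llm.git --recursive
cd mlc-llm && mkdir -p build && cd build
cmake .. && make -j"$(nproc)"

# 3. Run the chat CLI on the GPU (binary name assumed to be mlc_chat_cli).
./mlc_chat_cli --device-name cuda
```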
I'm using a Docker container (ubuntu 22.04, CUDA 11.8, without Vulkan). Graphics card: A100, driver: 470.42, CUDA: 11.4.

I ran:

```bash
python3 build.py --model path/to/vicuna-v1-7b --quantization q3f16_0 --max-seq-len 768
```

but it only outputs vicuna-v1-7b-q3f16_0-cpu.so.
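As the follow-up comments indicate, the missing piece was the build target. A minimal sketch of the corrected invocation, assuming build.py accepts a `--target` flag (the flag name is inferred from the follow-up comment, not verified against this exact version; check `python3 build.py --help`):

```bash
# --target cuda is assumed from the "didn't specify the target" follow-up;
# with it, the build should emit a -cuda.so instead of the -cpu.so above.
python3 build.py --model path/to/vicuna-v1-7b --quantization q3f16_0 \
    --max-seq-len 768 --target cuda
```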
I tried installing vicuna-v1-7b-q3f16_0-vulkan.so from https://github.com/mlc-ai/binary-mlc-llm-libs.
The error log is:
Is there a way to use CUDA directly without changing the driver inside the container, or could you provide a Dockerfile?
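For what it's worth, the usual answer to this question is the NVIDIA Container Toolkit: it mounts the host driver into the container at run time, so the image never needs its own driver, only the CUDA toolkit. A minimal sketch, assuming the toolkit is installed on the host (the image tag is just an example):

```bash
# --gpus requires the NVIDIA Container Toolkit on the host. It mounts the
# host driver libraries into the container, so the image only needs the
# CUDA *toolkit* (nvcc, for building), never a driver of its own.
docker run --gpus all -it nvidia/cuda:11.8.0-devel-ubuntu22.04 bash

# Inside the container, the host driver should now be visible:
nvidia-smi
```

Since CUDA 11, minor-version compatibility generally allows 11.x binaries to run on a 470-series (CUDA 11.4) driver, which matches the setup described above.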