Error running ghcr.io/evshiron/rocm_lab:rocm5.5-text-gen-webui 7dea7110f293 #13
Comments
Oops, I posted before adding the details.
I built a new Docker image using `dockerfile/rocm5.5-text-gen-webui` and, after some minor modifications, ended up with a container that works.
Compared to the CUDA version, the ROCm version does not show how many layers are offloaded to the GPU (i.e. `n-gpu-layers` does not seem to do anything). Does this version always use the GPU?

ROCm output:

CUDA output:
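For anyone comparing the two backends, here is a minimal sketch of launching the webui with an explicit offload count, assuming the llama.cpp loader and its `--n-gpu-layers` flag; the model filename and layer count below are placeholders, not values from this thread:

```bash
# Launch text-generation-webui with an explicit number of layers offloaded to the GPU.
# Model filename and layer count are placeholders; adjust for the model actually in models/.
python server.py --model llama-2-7b-chat.q4_K_M.bin --n-gpu-layers 35

# In the loader's startup log, watch for a line reporting how many layers were
# offloaded to the GPU; if no such line appears (or it reports 0), the layers
# are still running on the CPU.
```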
I was so happy to see the ROCm version running so much faster, but it turned out that I was not offloading anything to the GPU in my NVidia setup. :( I know these aren't rocm_lab-specific questions, so if anyone can point me to a forum where I can ask general text-generation-webui questions, I'd appreciate it.
Greetings. Here is a tutorial which was updated recently: It seems that ROCm support for llama.cpp was merged last week, but the tutorial covers HuggingFace and GPTQ usage. GPTQ is a decent quantization solution for LLMs, and you can obtain many quantized models from: If you definitely want to use GGML at the moment, you might have to do it yourself, as I am too busy to update the tutorial currently.
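As an illustrative aside (not part of the original comment), one common way to fetch a prequantized GPTQ model is text-generation-webui's bundled `download-model.py` script; the repository name below is only an example:

```bash
# Download a GPTQ-quantized model from the Hugging Face Hub into the webui's models/ directory.
# The repo name is an example; substitute whichever quantized model you actually want.
python download-model.py TheBloke/Llama-2-7B-Chat-GPTQ
```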
I'm new to generative AI and my ultimate goal is to find out how AMD compares to NVidia in all things AI, so I'm willing to try anything. Thanks for the pointer; I'll go through your tutorial. Hopefully I can get up to speed quickly and send you a pull request once I've updated your material for ROCm 5.6.1!
Thank you for your kindness. As far as I know, ROCm 5.6.1 is a minor release with some bug fixes, and the articles written for ROCm 5.6 should work too; as a result, those articles haven't been updated for ROCm 5.6.1. For the best LLM performance on AMD GPUs, here is an unconventional solution: it achieves about 80% of the performance of an RTX 4090 on an RX 7900 XTX. The potential of Navi 3x is there, but most HIP code ported from CUDA cannot fully unleash the performance of AMD GPUs.
@evshiron what performance do you get, in tokens/s, with a 70B q4 model?
A 70B q4 model should be around 35 GB in size. I haven't tried it, as I only have one RX 7900 XTX. But if you're referring to offloading to the GPU using llama.cpp, I am also interested in it and might try it out later.

UPDATE:

Build:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_HIPBLAS=1
```

Command:

```bash
./main -t 8 -m llama-2-70b-chat.q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\nWrite a story about llamas[/INST]"
```

Hardware: AMD 7800X3D + AMD RX 7900 XTX

Model: https://huggingface.co/TheBloke/Llama-2-70B-chat-GGUF/blob/main/llama-2-70b-chat.Q5_K_M.gguf

Without:

With:

For comparison, with a 7B q4 model, the performance with and without
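The "Without"/"With" outputs were not captured in this extract. Assuming they refer to GPU offloading, a sketch of the same invocation with llama.cpp's `-ngl`/`--n-gpu-layers` option would look like the following; the layer count and the shortened prompt are placeholders:

```bash
# Same invocation, but with a number of layers offloaded to the GPU via hipBLAS.
# The layer count is a placeholder; offload as many layers as fit in VRAM.
./main -t 8 -m llama-2-70b-chat.q4_K_M.gguf -ngl 40 --color -c 2048 --temp 0.7 \
  --repeat_penalty 1.1 -n -1 -p "Write a story about llamas"
```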
@evshiron So, I'm now starting over again. My setup is an Ubuntu 22.04 VM running on a Proxmox server with the 7900 XTX in PCIe passthrough mode. Since I was able to run TensorFlow properly, I'm pretty sure the hardware and host OS are set up correctly. I did all my testing using various Docker containers and TF seems to run correctly; PyTorch, however, is giving me a lot of trouble. So, to start from scratch, I ran your Docker image using the following command:

Inside the container, I did:

Then I ran mnist.py from https://gist.github.com/AlkindiX/9c54d1155ba72415f3b585e26c9df6b3 and got this result: https://gist.github.com/briansp2020/bbde07808cc360992721ccc16692047a

Not sure what is wrong.
Please don't use the Docker images in this repo; the PyTorch they include is outdated and missing functionality. Set up a new venv, install PyTorch using the command at the very beginning of the repo's README, then try again. You may just follow the instructions in the Gist you linked.
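A minimal sanity check after installing a ROCm build of PyTorch might look like this; the wheel index URL is only an example and may differ from the command in the repo's README:

```bash
# Create a clean venv and install a ROCm build of PyTorch.
# The index URL is an example; use the command from the repo's README for the matching ROCm version.
python3 -m venv venv
source venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/rocm5.6

# PyTorch's ROCm backend reuses the torch.cuda API, so this should print True
# and the GPU name if the card is visible to the runtime.
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU visible')"
```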
I moved my 7900 XTX to my main machine and PyTorch now works. I don't know whether it was the fact that I was running it in a VM under Proxmox or the different motherboard. I'm now running on bare-metal Ubuntu 22.04. It's weird, since TensorFlow seemed fine in that setup. Anyway, thanks for your help. I'll try the text-gen stuff and report back soon.
As far as I know, ROCm requires PCIe Atomics, so I suspect Proxmox's passthrough either doesn't support that or requires additional configuration. To be honest, I haven't tried GPU virtualization, nor have I attempted to run ROCm in a virtualized environment.
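As a side note (not from the original thread), one way to check whether the PCIe link advertises atomic-operation support is to inspect the device capabilities with `lspci`; filtering by AMD's vendor ID 0x1002 is just a convenience here:

```bash
# List AMD PCI devices, then dump the verbose capabilities and look for the
# AtomicOps fields under DevCap2/DevCtl2. Root is needed for the -vvv details.
lspci -d 1002:
sudo lspci -vvv -d 1002: | grep -iE "atomicops|devcap2|devctl2"
```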
I'm closing this since I'm no longer working with this Docker image and have used your guide to run models.