Support AVX-only CPU please #2266
Comments
In the ongoing release of 0.12, we have split the llama.cpp binary distribution. For more details, please refer to the change log at https://github.com/TabbyML/tabby/blob/main/.changes/unreleased/Fixed%20and%20Improvements-20240527-191452.yaml. This will provide users with the flexibility to build the llama.cpp server binary in a configuration that suits their preferences, whether with or without AVX/AVX2 support.
Oh well! Is this already testable with the main branch? (Without the webserver option, as the UI is not working yet)
It's testable with https://github.com/TabbyML/tabby/releases/tag/v0.12.0-rc.1 (and the corresponding Docker image tag). Some documentation is still lacking, but here is a configuration example to glance at.
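A minimal sketch of what such a `~/.tabby/config.toml` entry can look like, assuming Tabby's HTTP model backend for an external llama.cpp server; the endpoint, port, and prompt template below are placeholders to adjust for your setup:

```toml
# Sketch only: point Tabby's completion model at an externally run llama.cpp server.
# Replace api_endpoint and prompt_template with values matching your server and model.
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>"
```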
I have some problems with the 0.12.0-rc.1 image at this time...
Maybe I didn't understand how to start the application.
You actually helped identify a bug introduced in 0.12, thank you :) Fixing in #2279. As a workaround, you can create the `repositories` directory manually.
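As a rough sketch, assuming the default `~/.tabby` data directory (adjust the path if you mount it elsewhere, e.g. inside the container):

```sh
# Pre-create the directory the 0.12.0-rc.1 build expects to exist.
mkdir -p ~/.tabby/repositories
```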
Well, I created the "repositories" directory. The container starts but there are no logs... I can see that llama-server is started, but nothing happens. Port :8080 doesn't respond yet. The directory is currently empty.
(Yes, I use Podman, but trust me: the CUDA device is available and the model is downloaded; I've already done plenty of things with it.) I don't know where you want me to put the given configuration content. Excuse me, TabbyML is a bit complex to configure from an external view.
That's what to expect when jumping into the rabbit hole of an RC :) Not much is documented atm. Will update the thread once we finish the release and revamp the docs.
Yep. Anyway, 29 minutes later, the Tabby container is still starting without responding. I will wait for your updates. A pity, because I need it to propose Tabby to the company where I work. I wanted to make a CUDA-enabled demo.
Oh! OK, I see the problem... In the container, llama-server is not compatible with my AVX-only CPU. If I go inside and launch it manually, it says "Illegal instruction" (just like when I hadn't compiled my own). So... I need to build my own llama.cpp server with AVX support, run it outside the container, and point Tabby at it, right?
Right - you should be able to do exactly that: build llama.cpp for your CPU, start its server, and point Tabby's config at it.
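As a rough sketch of those steps, assuming a 2024-era llama.cpp checkout (CMake flag names and the server binary name have changed between versions, so check your tree; the model path and port are placeholders):

```sh
# Build llama.cpp's HTTP server for an AVX-only CPU (AVX on, AVX2/FMA off).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DLLAMA_NATIVE=OFF -DLLAMA_AVX=ON -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF
cmake --build build --config Release -j
# Older trees name the binary `server`, newer ones `llama-server`.
./build/bin/server -m ./models/your-model.Q8_0.gguf --port 8888
```

Then point the `api_endpoint` in the config example above at that port.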
OK, it's nearly working. Tabby connects to the server, and the Swagger UI shows no errors when I try. Now... the problem is that it doesn't give any completions 😄 I will investigate; maybe it's the prompt template. Thanks for all!
Note the prompt template is model-specific - please refer to https://github.com/TabbyML/registry-tabby/blob/main/models.json for the corresponding FIM template.
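For illustration only (copy the exact string for your model from models.json), two commonly listed FIM templates look roughly like this:

```toml
# StarCoder-family models typically use:
prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# CodeLlama-family models typically use:
# prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>"
```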
I can confirm:
Using a 3B model is OK; the completion takes 1s, sometimes less. I cannot use a bigger one, but it works. I haven't tried all models: StarCoder2-3B seems to fail, while StarCoder-3B is OK. Still, it's a nice approach to be able to start my own llama.cpp server with the driver I want (OpenCL here, with AVX support). Thanks a lot for everything: for creating Tabby, for your help, and for making it possible to run on "old" computers 👍🏻
PS: It could be a good idea to provide an image without the llama server inside, to keep the image lightweight for those like me who start the llama.cpp server outside.
I can close the issue. It's now a problem with the server output. |
It would be cool if there were three compiled builds of the llama server: AVX/AVX2, AVX only, and no AVX/AVX2. Not all CPUs support these instructions.
Please describe the feature you want
On my "old" computer (having core i7 + RTX 3070 FE + 32Go RAM) I cannot launch any of the docker container or the binary.
My CPU doesn't support AVX2; it only supports AVX.
I think that's the problem, as I already had this error (`invalid instruction`) when I tried LM Studio. Using llama.cpp with CLBlast for some other tests (outside of TabbyML) worked. For example, the llama.cpp server (Python binding) is OK with Mistral and Llama 2 models.
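For context, a hedged sketch of that kind of external check, assuming the llama-cpp-python server package (the model path and port are placeholders):

```sh
# Build llama-cpp-python with CLBlast support and serve a GGUF model over HTTP.
CMAKE_ARGS="-DLLAMA_CLBLAST=ON" pip install 'llama-cpp-python[server]'
python -m llama_cpp.server --model ./models/mistral-7b-instruct.Q4_K_M.gguf --port 8000
```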
On my other computer, with the same Fedora version, it works (laptop with RTX 3060). This other computer has AVX2 support.
Additional context
I tried to build the Docker image myself; the problem remains. I'm not sure AVX is correctly set.
EDIT:
Let me rephrase.
LM Studio doesn't offer a binary that supports AVX, only AVX2. On this machine, I have exactly the same problem with TabbyML.
When I compiled llama.cpp (the Python binding) myself, AVX instructions are supported. This means I can run the llama.cpp server and use GGUF models.
So I'm sure llama.cpp will work, provided the compilation options aren't restricted to AVX2.
However, with TabbyML, I get this error with the binaries, the official Docker image, and the image I built myself. So I "think" that TabbyML disables the AVX option in favor of AVX2.
What I'm asking is whether there's an option somewhere for me to force the use of AVX.
Please reply with a 👍 if you want this feature.