Support AVX only CPU please #2266

Closed
metal3d opened this issue May 28, 2024 · 15 comments
metal3d commented May 28, 2024

Please describe the feature you want
On my "old" computer (having core i7 + RTX 3070 FE + 32Go RAM) I cannot launch any of the docker container or the binary.

My CPU doesn't support AVX2, it only accepts AVX.

I think this is the problem, as I already had this error (illegal instruction) when I tried LM Studio.

Using llama.cpp with CLBlast for some other tests (outside of TabbyML) worked. For example, the llama.cpp server (Python binding) is OK with Mistral and Llama 2 models.

On my other computer, with the same Fedora version, it works (laptop with RTX 3060). This other computer has AVX2 support.
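
For reference, a quick generic Linux check (nothing Tabby-specific) confirms which of these extensions a CPU actually exposes:

# Lists the SIMD flags the kernel reports; an AVX-only CPU prints just "avx",
# while an AVX2-capable one also prints "avx2".
grep -wo -e avx -e avx2 /proc/cpuinfo | sort -u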

Additional context
I tried to build the Docker image myself, but the problem remains. I'm not sure whether AVX is correctly enabled.

EDIT:

Let me rephrase.

LM Studio doesn't offer a binary that supports AVX, only AVX2. On this machine, I have exactly the same problem with TabbyML.

When I compile llama.cpp (the Python binding) myself, AVX instructions are supported. This means I can run the llama.cpp server and use GGUF models.

So I'm sure llama.cpp will work, provided the compilation options aren't restricted to AVX2.

However, with TabbyML, I get this error with the binaries, the official Docker image, and the image I build myself. So I "think" that TabbyML disables the AVX option in favor of AVX2.

What I'm asking is whether there's an option somewhere that lets me force the use of AVX.


Please reply with a 👍 if you want this feature.

metal3d added the enhancement (New feature or request) label on May 28, 2024
@wsxiaoys (Member)

In the ongoing release of 0.12, we have split the llama.cpp binary distribution. For more details, please refer to the changelog at https://github.com/TabbyML/tabby/blob/main/.changes/unreleased/Fixed%20and%20Improvements-20240527-191452.yaml.

This will provide users with the flexibility to build the llama.cpp server binary in a configuration that suits their preferences, whether with or without AVX/AVX2 support.
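
For example, a minimal build sketch (the option and target names below are assumptions and vary across llama.cpp revisions: newer trees use GGML_* options and a llama-server target, older ones LLAMA_* options and a server target, so check cmake -LH in your checkout):

# Build llama.cpp's HTTP server for an AVX-only CPU: keep AVX, turn off AVX2 and newer extensions.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build \
  -DLLAMA_NATIVE=OFF \
  -DLLAMA_AVX=ON \
  -DLLAMA_AVX2=OFF \
  -DLLAMA_AVX512=OFF \
  -DLLAMA_FMA=OFF \
  -DLLAMA_F16C=OFF
cmake --build build --config Release --target server   # "llama-server" on newer trees
# The resulting binary ends up under build/bin/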

@metal3d (Author) commented May 29, 2024

Oh well!

Is this already testable with the main branch? (Without the webserver option as the UI is not working yet)

@wsxiaoys (Member)

It's testable with https://github.com/TabbyML/tabby/releases/tag/v0.12.0-rc.1 (and the corresponding Docker image tag).

Some documentation is still lacking, but here is a configuration example to glance at:

[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "https://..."
prompt_template = "<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

@metal3d (Author) commented May 29, 2024

I have some problems with the 0.12.0-rc.1 image at this time...

2024-05-29T08:00:00.190389Z  INFO tabby_scheduler::code::cache: crates/tabby-scheduler/src/code/cache.rs:170: Started cleaning up 'source_files' bucket
2024-05-29T08:00:00.190734Z  INFO tabby_scheduler::code::cache: crates/tabby-scheduler/src/code/cache.rs:195: Finished garbage collection for 'source_files': 0 items kept, 0 items removed
2024-05-29T08:00:00.266167Z  INFO tabby_scheduler::code::cache: crates/tabby-scheduler/src/code/cache.rs:110: Started cleaning up 'indexed_files' bucket
2024-05-29T08:00:00.266226Z  INFO tabby_scheduler::code::cache: crates/tabby-scheduler/src/code/cache.rs:133: Finished garbage collection for 'indexed_files': 0 items kept, 0 items removed
The application panicked (crashed).
Message:  Failed to read repository dir: Os { code: 2, kind: NotFound, message: "No such file or directory" }
Location: crates/tabby-scheduler/src/code/repository.rs:82

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Maybe I didn't understand how to start the application.

@wsxiaoys (Member) commented May 29, 2024

You actually helped identify a bug introduced in 0.12, thank you :)

Fixing it in #2279.

As a workaround, you can create the directory ~/.tabby/repositories.
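
On the host that is just (assuming ~/.tabby is the directory mounted as /data in the container):

# Create the directory Tabby expects before starting the container.
mkdir -p ~/.tabby/repositories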

@metal3d (Author) commented May 29, 2024

Well, I created the "repositories" directory. The container starts, but there are no logs... I can see that llama-server is started, but nothing happens. Port 8080 doesn't respond yet.

The directory is currently empty.

 ps fax | grep tabby
  52962 pts/2    Sl+    0:01  |   \_ podman run --device=nvidia.com/gpu=all --security-opt=label=disable --rm -it -p 0.0.0.0:8080:8080 -v /home/metal3d/.tabby:/data:z tabbyml/tabby:0.12.0-rc.1 serve --model StarCoder-1B --no-webserver --device cuda
 141093 pts/3    S+     0:00  |   \_ grep --color=auto tabby
  91207 pts/4    S+     0:00      \_ watch ls -lah /home/metal3d/.tabby/repositories/
  52991 pts/0    Ssl+  18:59  \_ /opt/tabby/bin/tabby serve --model StarCoder-1B --no-webserver --device cuda
 141074 pts/0    S+     0:00      \_ /opt/tabby/bin/llama-server -m /data/models/TabbyML/StarCoder-1B/ggml/model.gguf --cont-batching --port 30888 -np 1 --log-disable --ctx-size 4096 -ngl 9999

(Yes, I use Podman, but trust me: the CUDA device is available, the model is correctly downloaded, and I have already done plenty of things with it.)

I don't know where you want me to put the configuration (TOML) content you gave.

Excuse me, TabbyML is a bit complex to configure from an outside perspective.

@wsxiaoys (Member)

That's what to expect when jumping into the rabbit hole of an RC :) not much is documented atm.

Will update the thread once we finish the release and revamp the docs.

@metal3d (Author) commented May 29, 2024

Yep. Anyway, 29 minutes later, the Tabby container is still starting without responding. I will wait for your updates.

A pity, because I need it to propose Tabby to the company where I work. I wanted to make a CUDA-enabled demo.

@metal3d (Author) commented May 29, 2024

Oh! OK, I see the problem... In the container, llama-server is not compatible with AVX-only CPUs. If I go inside and launch it manually, it says "Illegal instruction" (like when I didn't compile my own).

So... I need to:

  • build llama-server
  • rebuild the Docker image with MY llama.cpp

Right?

@wsxiaoys (Member) commented May 29, 2024

Right - you should be able to:

  1. Build llama-server directly from the llama.cpp repository, with AVX2 disabled.
  2. Start llama-server manually as an individual HTTP service.
  3. Connect Tabby to the service with the config.toml specified above (it should be put in ~/.tabby/config.toml, or /data/config.toml in the container).
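
Putting steps 2 and 3 together, a rough sketch (the model path, port, and endpoint here are placeholders, not values taken from this thread):

# 2. Run the AVX-only server binary you built as a standalone HTTP service.
./build/bin/server -m /path/to/model.gguf --host 0.0.0.0 --port 8000 --ctx-size 4096

# 3. Point Tabby at it via ~/.tabby/config.toml (seen as /data/config.toml inside the container).
cat > ~/.tabby/config.toml <<'EOF'
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://127.0.0.1:8000"
prompt_template = "..."  # model-specific FIM template; see models.json in the registry
EOF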

@metal3d (Author) commented May 29, 2024

OK, it seems to be close to working.

Tabby connects to the server, and the Swagger UI shows no error when I try. Now... the problem is that it doesn't give any completion 😄

I will investigate; maybe the prompt_template isn't OK (I already replaced the pipe characters that were rendered badly in your comment).
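
If completions never show up, a quick way to rule the llama.cpp side in or out (assuming the stock llama.cpp server endpoints /health and /completion; the host and port are placeholders):

# Should return a small JSON status once the model has finished loading.
curl http://127.0.0.1:8000/health

# Request a raw completion directly, bypassing Tabby entirely.
curl http://127.0.0.1:8000/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "def fibonacci(n):", "n_predict": 32}'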

Thanks for everything!

@wsxiaoys (Member)

Note that the prompt template is model specific - please refer to https://github.com/TabbyML/registry-tabby/blob/main/models.json for the corresponding FIM template.
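
For instance, for the StarCoder family used elsewhere in this thread, the registry's FIM template looks roughly like the line below (recalled from memory, so verify the exact string in models.json before using it):

prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"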

@metal3d (Author) commented Jun 3, 2024

I can confirm:

  • compiling the server from llama.cpp
  • starting the server with the command ./server -m ${model} --cont-batching --host 0.0.0.0 --port 8000 --ctx-size 4096 -ngl 28 (28 layers on the GPU for now)
  • configuring the prompt template and api_endpoint in the config file
  • starting tabbyml/tabby v0.12.0-rc.3 with --net host (I don't know why, but using the real IP fails, while 127.0.0.1 is OK with the host network)
  • configuring the tabby-agent file to point at the Tabby service

Using a 3B model is OK; the completion takes 1 s, sometimes less.

I cannot use [[registries]] (I guess it's only for the non-OSS version 😢).

But it works. I haven't tried all models; StarCoder2-3B seems to fail, while StarCoder-3B is OK.

Still, it's a nice approach to be able to start my own llama.cpp server using the backend I want (OpenCL here, with AVX support).

Thanks a lot for everything: for creating Tabby, for your help, and for making it possible to run it on "old" computers 👍🏻

PS: It could be a good idea to provide an image without llama-server inside, to keep the image lightweight for those like me who start the llama.cpp server externally.

@metal3d (Author) commented Jun 5, 2024

I can close the issue. The remaining problem is with the server output.

metal3d closed this as completed on Jun 5, 2024
@KweezyCode

It would be cool if there were three compiled builds of llama-server: with AVX/AVX2, with AVX only, and with neither. Not all CPUs support these instructions.
