Support AVX only CPU please #2266

Closed
metal3d opened this issue May 28, 2024 · 15 comments
metal3d commented May 28, 2024

Please describe the feature you want
On my "old" computer (having core i7 + RTX 3070 FE + 32Go RAM) I cannot launch any of the docker container or the binary.

My CPU doesn't support AVX2, it only accepts AVX.

I think this is the problem, as I already had this error (illegal instruction) when I tried LM Studio.

Using llama.cpp with CLBlast for some other tests (outside of TabbyML) worked. For example, the llama.cpp server (Python binding) is OK with Mistral and Llama 2 models.

On my other computer, with the same Fedora version, it works (laptop with RTX 3060). This other computer has AVX2 support.
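
For reference, a quick generic Linux check (nothing Tabby-specific) confirms which of these extensions a CPU actually exposes:

# Lists the SIMD flags the kernel reports; an AVX-only CPU prints just "avx",
# while an AVX2-capable one also prints "avx2".
grep -wo -e avx -e avx2 /proc/cpuinfo | sort -u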

Additional context
I tried to build the Docker image myself, but the problem remains. I'm not sure whether AVX is correctly enabled.

EDIT:

Let me rephrase.

LM Studio doesn't offer a binary that supports AVX, only AVX2. On this machine, I have exactly the same problem with TabbyML.

When I compile llama.cpp (the Python binding) myself, AVX instructions are supported. This means I can run the llama.cpp server and use GGUF models.

So I'm sure llama.cpp will work, provided the compilation options aren't restricted to AVX2.

However, with TabbyML, I get this error with the binaries, the official Docker image, and the image I build myself. So I "think" that TabbyML disables the AVX option in favor of AVX2.

What I'm asking is whether there's an option somewhere that lets me force the use of AVX.


Please reply with a 👍 if you want this feature.

metal3d added the enhancement (New feature or request) label on May 28, 2024
@wsxiaoys (Member)

In the ongoing release of 0.12, we have split the llama.cpp binary distribution. For more details, please refer to the changelog at https://github.com/TabbyML/tabby/blob/main/.changes/unreleased/Fixed%20and%20Improvements-20240527-191452.yaml.

This will provide users with the flexibility to build the llama.cpp server binary in a configuration that suits their preferences, whether with or without AVX/AVX2 support.
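
For example, a minimal build sketch (the option and target names below are assumptions and vary across llama.cpp revisions: newer trees use GGML_* options and a llama-server target, older ones LLAMA_* options and a server target, so check cmake -LH in your checkout):

# Build llama.cpp's HTTP server for an AVX-only CPU: keep AVX, turn off AVX2 and newer extensions.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build \
  -DLLAMA_NATIVE=OFF \
  -DLLAMA_AVX=ON \
  -DLLAMA_AVX2=OFF \
  -DLLAMA_AVX512=OFF \
  -DLLAMA_FMA=OFF \
  -DLLAMA_F16C=OFF
cmake --build build --config Release --target server   # "llama-server" on newer trees
# The resulting binary ends up under build/bin/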

@metal3d (Author) commented May 29, 2024

Oh well!

Is this already testable with the main branch? (Without the webserver option as the UI is not working yet)

@wsxiaoys (Member)

It's testable with https://github.com/TabbyML/tabby/releases/tag/v0.12.0-rc.1 (and the corresponding Docker image tag).

Some documentation is still lacking, but here is a configuration example to glance at:

[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "https://..."
prompt_template = "<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

@metal3d (Author) commented May 29, 2024

I have some problems with the 0.12.0-rc.1 image at this time...

2024-05-29T08:00:00.190389Z  INFO tabby_scheduler::code::cache: crates/tabby-scheduler/src/code/cache.rs:170: Started cleaning up 'source_files' bucket
2024-05-29T08:00:00.190734Z  INFO tabby_scheduler::code::cache: crates/tabby-scheduler/src/code/cache.rs:195: Finished garbage collection for 'source_files': 0 items kept, 0 items removed
2024-05-29T08:00:00.266167Z  INFO tabby_scheduler::code::cache: crates/tabby-scheduler/src/code/cache.rs:110: Started cleaning up 'indexed_files' bucket
2024-05-29T08:00:00.266226Z  INFO tabby_scheduler::code::cache: crates/tabby-scheduler/src/code/cache.rs:133: Finished garbage collection for 'indexed_files': 0 items kept, 0 items removed
The application panicked (crashed).
Message:  Failed to read repository dir: Os { code: 2, kind: NotFound, message: "No such file or directory" }
Location: crates/tabby-scheduler/src/code/repository.rs:82

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Maybe I didn't understand how to start the application.

@wsxiaoys (Member) commented May 29, 2024

You actually helped identify a bug introduced in 0.12, thank you :)

Fixing it in #2279.

As a workaround, you can create the directory ~/.tabby/repositories.
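
On the host that is just (assuming ~/.tabby is the directory mounted as /data in the container):

# Create the directory Tabby expects before starting the container.
mkdir -p ~/.tabby/repositories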

@metal3d (Author) commented May 29, 2024

Well, I created the "repositories" directory. The container starts, but there are no logs... I can see that llama-server is started, but nothing happens. Port 8080 doesn't respond yet.

The directory is currently empty.

 ps fax | grep tabby
  52962 pts/2    Sl+    0:01  |   \_ podman run --device=nvidia.com/gpu=all --security-opt=label=disable --rm -it -p 0.0.0.0:8080:8080 -v /home/metal3d/.tabby:/data:z tabbyml/tabby:0.12.0-rc.1 serve --model StarCoder-1B --no-webserver --device cuda
 141093 pts/3    S+     0:00  |   \_ grep --color=auto tabby
  91207 pts/4    S+     0:00      \_ watch ls -lah /home/metal3d/.tabby/repositories/
  52991 pts/0    Ssl+  18:59  \_ /opt/tabby/bin/tabby serve --model StarCoder-1B --no-webserver --device cuda
 141074 pts/0    S+     0:00      \_ /opt/tabby/bin/llama-server -m /data/models/TabbyML/StarCoder-1B/ggml/model.gguf --cont-batching --port 30888 -np 1 --log-disable --ctx-size 4096 -ngl 9999

(Yes, I use Podman, but trust me: the CUDA device is available, the model is correctly downloaded, and I have already done plenty of things with it.)

I don't know where you want me to put the configuration (TOML) content you gave.

Excuse me, TabbyML is a bit complex to configure from an outside perspective.

@wsxiaoys (Member)

That's what to expect when jumping into the rabbit hole of an RC :) not much is documented atm.

Will update the thread once we finish the release and revamp the docs.

@metal3d (Author) commented May 29, 2024

Yep. Anyway, 29 minutes later, the Tabby container is still starting without responding. I will wait for your updates.

A pity, because I need it to propose Tabby to the company where I work. I wanted to make a CUDA-enabled demo.

@metal3d (Author) commented May 29, 2024

Oh! OK, I see the problem... In the container, llama-server is not compatible with AVX-only CPUs. If I go inside and launch it manually, it says "Illegal instruction" (like when I didn't compile my own).

So... I need to:

  • build llama-server
  • rebuild the Docker image with MY llama.cpp

Right?

@wsxiaoys (Member) commented May 29, 2024

Right - you should be able to:

  1. Build llama-server directly from the llama.cpp repository, with AVX2 disabled.
  2. Start llama-server manually as an individual HTTP service.
  3. Connect Tabby to the service with the config.toml specified above (it should be put in ~/.tabby/config.toml, or /data/config.toml in the container).
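
Putting steps 2 and 3 together, a rough sketch (the model path, port, and endpoint here are placeholders, not values taken from this thread):

# 2. Run the AVX-only server binary you built as a standalone HTTP service.
./build/bin/server -m /path/to/model.gguf --host 0.0.0.0 --port 8000 --ctx-size 4096

# 3. Point Tabby at it via ~/.tabby/config.toml (seen as /data/config.toml inside the container).
cat > ~/.tabby/config.toml <<'EOF'
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://127.0.0.1:8000"
prompt_template = "..."  # model-specific FIM template; see models.json in the registry
EOF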

@metal3d (Author) commented May 29, 2024

OK, it seems to be close to working.

Tabby connects to the server, and the Swagger UI shows no error when I try. Now... the problem is that it doesn't give any completion 😄

I will investigate; maybe the prompt_template isn't OK (I already replaced the pipe characters that were rendered badly in your comment).
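
If completions never show up, a quick way to rule the llama.cpp side in or out (assuming the stock llama.cpp server endpoints /health and /completion; the host and port are placeholders):

# Should return a small JSON status once the model has finished loading.
curl http://127.0.0.1:8000/health

# Request a raw completion directly, bypassing Tabby entirely.
curl http://127.0.0.1:8000/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "def fibonacci(n):", "n_predict": 32}'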

Thanks for everything!

@wsxiaoys (Member)

Note that the prompt template is model specific - please refer to https://github.com/TabbyML/registry-tabby/blob/main/models.json for the corresponding FIM template.
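
For instance, for the StarCoder family used elsewhere in this thread, the registry's FIM template looks roughly like the line below (recalled from memory, so verify the exact string in models.json before using it):

prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"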

@metal3d (Author) commented Jun 3, 2024

I can confirm:

  • compiling the server from llama.cpp
  • starting the server with the command ./server -m ${model} --cont-batching --host 0.0.0.0 --port 8000 --ctx-size 4096 -ngl 28 (28 layers on the GPU for now)
  • configuring the prompt template and api_endpoint in the config file
  • starting tabbyml/tabby v0.12.0-rc.3 with --net host (I don't know why, but using the real IP fails, while 127.0.0.1 is OK with the host network)
  • configuring the tabby-agent file to point at the Tabby service

Using a 3B model is OK; the completion takes 1 s, sometimes less.

I cannot use [[registries]] (I guess it's only for the non-OSS version 😢).

But it works. I haven't tried all models; StarCoder2-3B seems to fail, while StarCoder-3B is OK.

Still, it's a nice approach to be able to start my own llama.cpp server using the backend I want (OpenCL here, with AVX support).

Thanks a lot for everything: for creating Tabby, for your help, and for making it possible to run it on "old" computers 👍🏻

PS: It could be a good idea to provide an image without llama-server inside, to keep the image lightweight for those like me who start the llama.cpp server externally.

@metal3d (Author) commented Jun 5, 2024

I can close the issue. The remaining problem is with the server output.

metal3d closed this as completed on Jun 5, 2024
@KweezyCode

It would be cool if there were three compiled builds of llama-server: with AVX/AVX2, with AVX only, and with neither. Not all CPUs support these instructions.
