Multimodal Llama3 Support #1403

Open
xx025 opened this issue Apr 28, 2024 · 1 comment
xx025 commented Apr 28, 2024

I came across a multimodal Llama3 model on Hugging Face, Bunny-Llama-3-8B-V (bunny-llama), and I'd like to be able to deploy it with llama-cpp-python!

However, I found that the existing chat_format llama-3 doesn't seem to support running it.

I converted it to GGUF format via llama.cpp and ran it with the following configuration:

python llama.cpp/convert.py \
    Bunny-Llama-3-8B-V --outtype f16 \
    --outfile converted.bin \
    --vocab-type bpe

bunny-llama.json:

{
    "host": "0.0.0.0",
    "port": 8080,
    "api_key": "xx",
    "models": [
        {
            "model": "bunny-llama.gguf",
            "model_alias": "bunny-llama",
            "chat_format": "llama-3",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        }
    ]
}

python3 -m llama_cpp.server \
    --config_file bunny-llama.json
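
For reference, once the server is up it exposes an OpenAI-compatible API, so a text-only request against the deployment above might look roughly like this sketch (the base URL, api_key, and model alias are taken from the config; the prompt is a placeholder, and image input would still depend on the multimodal support discussed below):

from openai import OpenAI

# Point the OpenAI client at the local llama_cpp.server instance
# (host, port, and api_key come from bunny-llama.json above).
client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key="xx")

response = client.chat.completions.create(
    model="bunny-llama",  # the model_alias from the config
    messages=[{"role": "user", "content": "Describe this model in one sentence."}],
)
print(response.choices[0].message.content)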

abetlen commented Apr 28, 2024

Check out #1147; it should be merged soon. The only caveat here is that you'll need to use the llava example in llama.cpp to extract the image encoder as well when you quantize the models.
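
Assuming the merged support follows the existing LLaVA-style chat-handler pattern already in llama-cpp-python, local usage might look roughly like the sketch below. The mmproj.gguf file name is a placeholder for the image encoder extracted with the llava example, and whether Llava15ChatHandler (or a new handler from #1147) actually covers a Llama-3-based model like Bunny is an assumption:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Sketch only: Llava15ChatHandler is the existing multimodal chat handler in
# llama-cpp-python; "mmproj.gguf" is the image encoder extracted via the
# llava example in llama.cpp (hypothetical file name).
chat_handler = Llava15ChatHandler(clip_model_path="mmproj.gguf")

llm = Llama(
    model_path="bunny-llama.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,
    n_gpu_layers=-1,
)

# Multimodal messages pass the image as an image_url content part.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])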
