Multimodal Llama3 Support #1403

Open
xx025 opened this issue Apr 28, 2024 · 1 comment
xx025 commented Apr 28, 2024

I came across a multimodal Llama3 model on Hugging Face, Bunny-Llama-3-8B-V (bunny-llama), and I'd like to be able to deploy it with llama-cpp-python!

However, I found that the existing chat_format llama-3 doesn't seem to support running it.

I converted it to GGUF format via llama.cpp and ran it with the following configuration:

python llama.cpp/convert.py \
    Bunny-Llama-3-8B-V --outtype f16 \
    --outfile converted.bin \
    --vocab-type bpe

bunny-llama.json:

{
    "host": "0.0.0.0",
    "port": 8080,
    "api_key": "xx",
    "models": [
        {
            "model": "bunny-llama.gguf",
            "model_alias": "bunny-llama",
            "chat_format": "llama-3",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        }
    ]
}

python3 -m llama_cpp.server \
    --config_file bunny-llama.json
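
For reference, once the server is up it exposes an OpenAI-compatible API, so a text-only request against the deployment above might look roughly like this sketch (the base URL, api_key, and model alias are taken from the config; the prompt is a placeholder, and image input would still depend on the multimodal support discussed below):

from openai import OpenAI

# Point the OpenAI client at the local llama_cpp.server instance
# (host, port, and api_key come from bunny-llama.json above).
client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key="xx")

response = client.chat.completions.create(
    model="bunny-llama",  # the model_alias from the config
    messages=[{"role": "user", "content": "Describe this model in one sentence."}],
)
print(response.choices[0].message.content)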

abetlen commented Apr 28, 2024

Check out #1147; it should be merged soon. The only caveat here is that you'll need to use the llava example in llama.cpp to extract the image encoder as well when you quantize the models.
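
Assuming the merged support follows the existing LLaVA-style chat-handler pattern already in llama-cpp-python, local usage might look roughly like the sketch below. The mmproj.gguf file name is a placeholder for the image encoder extracted with the llava example, and whether Llava15ChatHandler (or a new handler from #1147) actually covers a Llama-3-based model like Bunny is an assumption:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Sketch only: Llava15ChatHandler is the existing multimodal chat handler in
# llama-cpp-python; "mmproj.gguf" is the image encoder extracted via the
# llava example in llama.cpp (hypothetical file name).
chat_handler = Llava15ChatHandler(clip_model_path="mmproj.gguf")

llm = Llama(
    model_path="bunny-llama.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,
    n_gpu_layers=-1,
)

# Multimodal messages pass the image as an image_url content part.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])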
