**Is your feature request related to a problem? Please describe.**

Llama 3.2 was released, and since it has multimodal support it would be great to have it in LocalAI.
**Additional context**

llama.cpp has several open issues regarding multimodal capabilities:
- Llama-3.2 11B Vision Support ggml-org/llama.cpp#9643
- server: Bring back multimodal support ggml-org/llama.cpp#8010
vLLM has already added support for it in vllm-project/vllm#8811
See also:
- llama : first attempt to implement vision API (WIP) ggml-org/llama.cpp#9687
- Add the new Multi-Modal model of mistral AI: mistral-small-3.1-24b & pixtral-12b #3535
- Feature Request: LLaMA 3.2 Vision Support ollama/ollama#6972
- llm: add mllama (Llama 3.2 Vision) language model support ollama/ollama#6965
- draft: mllama vision encoder ollama/ollama#6971
- llama3.2 vision support ollama/ollama#6963
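For reference, once vision support lands, a request against LocalAI's OpenAI-compatible chat endpoint would presumably follow the standard multimodal message shape (text plus `image_url` content parts). A minimal sketch, assuming a hypothetical model name and a placeholder image URL:

```python
import json

# Sketch of an OpenAI-style multimodal chat request that Llama 3.2 Vision
# support would enable. The model name and image URL are placeholders,
# not names LocalAI actually ships.
payload = {
    "model": "llama-3.2-11b-vision",  # hypothetical model identifier
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.png"},
                },
            ],
        }
    ],
}

# This JSON body would be POSTed to /v1/chat/completions.
body = json.dumps(payload)
```

This is only to illustrate the API surface the feature would expose; the actual backend wiring (llama.cpp vs. vLLM) is what the linked issues track.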