
High-level API for multimodality #928

Closed
remixer-dec opened this issue Nov 19, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@remixer-dec

Is your feature request related to a problem? Please describe.

The current high-level implementation of multimodality relies on a specific prompt format.

Describe the solution you'd like

Models like Obsidian work with the llama.cpp server but use a different prompt format. It would be nice to have a high-level multimodality API in llama-cpp-python that accepts an image (or images) as an argument after Llama() has been initialized with the paths to the required extra models, without relying on a pre-defined prompt format such as Llava15ChatHandler.
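
To make the request concrete, here is a purely hypothetical sketch of such an interface (neither the clip_model_path argument on Llama() nor the images parameter exists today; the names are only illustrative):

from llama_cpp import Llama

# Hypothetical interface, not part of llama-cpp-python today:
llm = Llama(
    model_path="path/to/model.gguf",
    clip_model_path="path/to/mmproj.bin",  # extra vision model supplied up front
)
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is in this picture?"}],
    images=["path/to/local_image.png"],  # images passed directly, no chat-handler prompt format
)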

Describe alternatives you've considered
Alternatively, a custom prompt-format class that supports images could be implemented, where the prompt string is passed as an argument.

@abetlen abetlen added the enhancement New feature or request label Nov 21, 2023
@abetlen
Owner

abetlen commented Nov 21, 2023

I'll consider adding the multi-modal args to the Llama class, but I'm worried about growing the API surface too much too quickly.

For now, I can definitely implement an Obsidian chat handler and maybe abstract out some of the Llava-specific code so this is easier to generalize in the future.

Also thank you for bringing that model to my attention, definitely have to give it a try now!

@JoshuaFurman

This seems like an apt place to ask: how can I supply a local image file for multimodal use? It looks like the example assumes the image is hosted at a URL somewhere.

@abetlen
Owner

abetlen commented Nov 23, 2023

@JoshuaFurman good question, I'll update the docs to include this. It works the same as OpenAI's gpt-4-vision-preview: you can pass the image as a base64-encoded data URL:

import base64

def image_to_base64_data_uri(file_path):
    with open(file_path, "rb") as img_file:
        base64_data = base64.b64encode(img_file.read()).decode('utf-8')
        return f"data:image/png;base64,{base64_data}"

# Replace 'file_path.png' with the actual path to your PNG file
file_path = 'file_path.png'
data_uri = image_to_base64_data_uri(file_path)

Then just pass that in place of the HTTP URL.
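
For example, assuming llm is a Llama instance configured with Llava15ChatHandler as in the README (the prompt text here is just a placeholder), the data URI slots into the same messages structure used for hosted images:

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_uri}},  # local image as a data URI
                {"type": "text", "text": "Describe this image in detail."},
            ],
        },
    ]
)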

@JoshuaFurman

Oh fantastic, thank you! And this can be done without running the server, right? I'm looking to add this directly into an application.

@abetlen
Owner

abetlen commented Nov 23, 2023

Yup, you just need to pass the Llava15ChatHandler to Llama as in the README example, so the class knows how to format the chat with images.
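
Roughly like this (a sketch based on the README example; the model paths are placeholders for your local files):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="path/to/mmproj.bin")
llm = Llama(
    model_path="path/to/llava-model.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,  # a larger context leaves room for the image embedding
)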

@JoshuaFurman

Great, thanks so much! Happy Thanksgiving :)

@XcerontangX

@JoshuaFurman good question, I'll update the docs to include this. It works the same as OpenAI's gpt-4-vision-preview: you can pass the image as a base64-encoded data URL […]

Thanks, this works.

@abetlen abetlen closed this as completed Feb 26, 2024
@abetlen
Owner

abetlen commented Apr 28, 2024

Closed this by mistake, though it will be resolved in #1147.
