
Generic Chat Formats for Multimodal Models (Obsidian, LLaVA1.6, Moondream) #1147

Merged · 26 commits into main · Apr 30, 2024

Conversation

abetlen
Owner

@abetlen abetlen commented Jan 31, 2024

The LLaVA 1.5 chat format is hard-coded, which makes it difficult to extend to new VLMs.

Goals:

  • Easily configurable chat templates so we can quickly add support for models with the same projector but different chat formats
  • Ability to use external library for projections (transformers, pytorch, etc) and just load the token embeddings into a llama.cpp model
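
Under this design, supporting a new model with the same projector should reduce to subclassing an existing handler and overriding the template string. A sketch of how that could look (MyVLMChatHandler and its template are made up for illustration; the subclass-and-override pattern mirrors the handlers added in this PR):

# Illustrative sketch of goal one: a hypothetical VLM that reuses the
# LLaVA 1.5 projector but needs a different prompt format only has to
# subclass the handler and swap the jinja2 template string.
from llama_cpp.llama_chat_format import Llava15ChatHandler

class MyVLMChatHandler(Llava15ChatHandler):
    # String-content messages only, for brevity.
    CHAT_FORMAT = (
        "{% for message in messages %}"
        "<|{{ message.role }}|>\n{{ message.content }}\n"
        "{% endfor %}"
        "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
    )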

Usage

Python

>>> from llama_cpp import Llama
>>> from llama_cpp.llama_chat_format import MoondreamChatHandler
>>> chat_handler = MoondreamChatHandler.from_pretrained(
  repo_id="vikhyatk/moondream2",
  filename="*mmproj*",
)
>>> llm = Llama.from_pretrained(
  repo_id="vikhyatk/moondream2"
  filename="*text-model*",
  chat_handler=chat_handler,
  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
)
>>> llm.create_chat_completion(
    messages = [
        {
            "role": "user",
            "content": [
                {"type" : "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" } }

            ]
        }
    ]
)
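
If the GGUF files are already on disk, the handler and model can also be constructed from local paths instead of pulling from the Hub (the file paths below are placeholders):

>>> from llama_cpp import Llama
>>> from llama_cpp.llama_chat_format import MoondreamChatHandler
>>> chat_handler = MoondreamChatHandler(clip_model_path="path/to/moondream2-mmproj.gguf")
>>> llm = Llama(
  model_path="path/to/moondream2-text-model.gguf",
  chat_handler=chat_handler,
  n_ctx=2048, # increased to accommodate the image embedding
)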

Server Config

host: "0.0.0.0"
models:
  - model: "*text-model*"
    clip_model_path: "*mmproj*"
    hf_model_repo_id: vikhyatk/moondream2
    model_alias: "gpt-4-turbo"
    chat_format: moondream
    n_threads_batch: -1
    n_gpu_layers: -1
    verbose: true
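
With the above saved as config.yaml, the server can then be started with `python3 -m llama_cpp.server --config_file config.yaml` (assuming the server's config-file option) and queried through the OpenAI-compatible endpoint. A sketch using the openai client; host and port assume the server defaults, and the image URL is a placeholder:

# Sketch: querying the multimodal server via its OpenAI-compatible API.
# "gpt-4-turbo" matches the model_alias in the config above; the api_key
# is a dummy value since the local server does not check it by default.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key-required")
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)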

Overview

The approach I'm taking here is to map directly from OpenAI ChatCompletionRequestMessage lists to a chat format using jinja2. For images, each image URL is rendered into the prompt, and the rendered prompt is then split on those URLs. The chat completion handler then decodes the text and image segments in order and finally starts generation.
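
A minimal sketch of that render-and-split flow (the function name and splitting details are illustrative, not the PR's actual implementation):

# Minimal sketch of the render-and-split flow described above.
import re
from jinja2 import Environment

def render_and_split(template_str, messages):
    # 1. Render the OpenAI-style message list to a single prompt string;
    #    image_url content parts are rendered inline as their URLs.
    prompt = Environment().from_string(template_str).render(
        messages=messages, add_generation_prompt=True
    )
    # 2. Collect the image URLs that occur in the messages.
    urls = [
        part["image_url"]["url"]
        if isinstance(part["image_url"], dict) else part["image_url"]
        for m in messages if isinstance(m.get("content"), list)
        for part in m["content"] if part.get("type") == "image_url"
    ]
    if not urls:
        return [("text", prompt)]
    # 3. Split the prompt on the URLs: text segments get tokenized and
    #    evaluated normally, image segments go through the CLIP projector
    #    before generation starts.
    pattern = "(" + "|".join(re.escape(u) for u in urls) + ")"
    return [
        ("image" if seg in urls else "text", seg)
        for seg in re.split(pattern, prompt) if seg
    ]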

Example LLaVA 1.5 Jinja Chat Format

    CHAT_FORMAT = (
        "{% for message in messages %}"
        "{% if message.role == 'system' %}"
        "{{ message.content }}"
        "{% endif %}"
        "{% if message.role == 'user' %}"
        "{% if message.content is string %}"
        "\nUSER: {{ message.content }}"
        "{% elif message.content is iterable %}"
        "\nUSER: "
        "{% for content in message.content %}"
        "{% if content.type == 'text' %}"
        "{{ content.text }}"
        "{% endif %}"
        "{% if content.type == 'image_url' and content.image_url is string %}"
        "{{ content.image_url }}"
        "{% endif %}"
        "{% if content.type == 'image_url' and content.image_url is mapping %}"
        "{{ content.image_url.url }}"
        "{% endif %}"
        "{% endfor %}"
        "{% endif %}"
        "{% endif %}"
        "{% if message.role == 'assistant' and message.content is not none %}"
        "\nASSISTANT: {{ message.content }}"
        "{% endif %}"
        "{% endfor %}"
        "{% if add_generation_prompt %}"
        "\nASSISTANT: "
        "{% endif %}"
    )
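
Rendering that template with plain jinja2 shows the prompt the handler would then split on the embedded image URL (a quick check, not part of the PR; the image URL is a placeholder):

# Quick check: rendering CHAT_FORMAT (defined above) with plain jinja2.
from jinja2 import Environment

template = Environment().from_string(CHAT_FORMAT)
print(template.render(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ]},
    ],
    add_generation_prompt=True,
))
# Output:
# You are a helpful assistant.
# USER: What's in this image?https://example.com/cat.png
# ASSISTANT: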

Progress:

  • Refactored llava chat format to use jinja2 template string
  • (extra) Added ability to pull image model directly from huggingface via from_pretrained for model and server
  • Add Moondream chat format
  • (extra) Add tool and function calling support
  • Cache image encoding between requests (see the sketch after this list)
  • Add Llava1.6 chat format
  • Add NanoLlava chat format
  • Add MobileVLM chat format
  • Add Obsidian chat format
  • Cleanup implementation
  • Update docs
  • (extra) prompt-prefix caching
  • (extra) convert unsupported image formats using optional Pillow dependency
  • (extra) bring-your-own image encoder option
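
One way the image-encoding cache could work (a sketch under my own assumptions; the PR caches the last image embed, but the byte-hash keying and single-slot eviction here are illustrative):

# Sketch: cache the most recent image embedding between requests, keyed
# by a hash of the raw image bytes.
import hashlib

class LastImageEmbedCache:
    def __init__(self):
        self._key = None
        self._embed = None

    def get_or_compute(self, image_bytes, compute_embed):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key != self._key:
            # Cache miss: run the (expensive) CLIP projector and keep
            # only the most recent result.
            self._key = key
            self._embed = compute_embed(image_bytes)
        return self._embed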

Closes #1301
Closes #1204

@abetlen abetlen marked this pull request as ready for review April 27, 2024 17:00
@abetlen abetlen changed the title Generic Chat Formats for Multimodal Models (WIP) Generic Chat Formats for Multimodal Models (Obsidian, LLaVA1.6, Moondream) Apr 30, 2024
@abetlen abetlen merged commit fe2da09 into main Apr 30, 2024
16 checks passed