
feat: Add Gemma3 chat handler (#1976) #1989

Open · wants to merge 4 commits into main
Conversation

kossum commented Mar 30, 2025

Added a Gemma3 chat handler and fixed the image embedding; multiple images are supported.

Included llama.cpp functions and structures (a rough sketch of how they fit together follows the list):

  • clip_image_load_from_bytes
  • clip_image_batch_encode
  • clip_image_preprocess
  • clip_image_f32_batch_free
  • clip_image_u8_batch_free
  • clip_image_f32_free
  • clip_image_u8_free
  • clip_image_u8_init
  • struct clip_image_f32_batch
  • struct clip_image_u8_batch
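
These follow the upstream clip.h API. As a rough sketch only (not the PR's actual code) of how they chain together to embed one image, assuming the classic clip.h signatures and illustrative wrapper names under llama_cpp.llava_cpp:

import ctypes
from llama_cpp import llava_cpp  # assumed location of the ctypes bindings

def encode_image_sketch(clip_ctx, image_bytes: bytes, n_threads: int = 4):
    # 1. load the raw (u8) image from encoded bytes (jpg/png)
    img_u8 = llava_cpp.clip_image_u8_init()
    buf = (ctypes.c_uint8 * len(image_bytes)).from_buffer_copy(image_bytes)
    llava_cpp.clip_image_load_from_bytes(buf, len(image_bytes), img_u8)

    # 2. preprocess into the f32 batch expected by the vision tower
    batch_f32 = llava_cpp.clip_image_f32_batch()
    llava_cpp.clip_image_preprocess(clip_ctx, img_u8, ctypes.byref(batch_f32))

    # 3. encode the whole batch into a single float buffer
    #    (clip_n_patches / clip_n_mmproj_embd are assumed to be bound as well)
    n_floats = llava_cpp.clip_n_patches(clip_ctx) * llava_cpp.clip_n_mmproj_embd(clip_ctx)
    embed = (ctypes.c_float * n_floats)()
    llava_cpp.clip_image_batch_encode(clip_ctx, n_threads, ctypes.byref(batch_f32), embed)

    # 4. free the intermediate image buffers
    llava_cpp.clip_image_f32_batch_free(ctypes.byref(batch_f32))
    llava_cpp.clip_image_u8_free(img_u8)
    return embed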

Usage:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Gemma3ChatHandler

chat_handler = Gemma3ChatHandler(clip_model_path="path/to/mmproj")
llama = Llama(
  model_path="path/to/model",
  chat_handler=chat_handler,
  n_ctx=1024,  # n_ctx should be increased to accommodate the image embedding
)

messages = [
  {
    'role': 'user',
    'content': [
      {'type': 'text', 'text': 'Please describe this image'},
      {'type': 'image', 'url': 'https://raw.githubusercontent.com/huggingface/transformers/refs/heads/main/tests/fixtures/tests_samples/COCO/000000039769.png'},
    ]
  }
]

output = llama.create_chat_completion(
  messages,
  stop=['<end_of_turn>', '<eos>'],
  max_tokens=200,
)

print(output['choices'][0]['message']['content'])

Test Results:

Compatibility:

  • Fully backward compatible with existing interfaces.
  • Maintains original APIs while adding new options and interfaces.

RuurdBijlsma commented Apr 3, 2025

I've been using it a bit and it works nicely. I had to figure out the message structure, but maybe that's normal for different chat handlers; I'm not that familiar with llama-cpp.

"type": "image",
"image": {
    "url": "https://image.com/img.jpg",
}

I was used to "image_url" in both of the places where "image" is used now.
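
For reference, the OpenAI-style image part I was expecting looks roughly like this (placeholder URL):

{'type': 'image_url', 'image_url': {'url': 'https://image.com/img.jpg'}}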

dchatel commented Apr 4, 2025

How would that work with a local image?

kossum (Author) commented Apr 4, 2025

Sorry, I hadn't modified the original Gemma3 chat template, so it used "type": "image". I have now changed the message format to be compatible with the OpenAI API, just like the other chat handlers.

Here is a full example:

from pathlib import Path
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Gemma3ChatHandler

def image_to_base64_uri(image: bytes | str):
  # accept raw image bytes or a URL and return a base64 data URI
  import base64
  import urllib.request as request

  if isinstance(image, bytes):
    data = base64.b64encode(image).decode('utf-8')
  else:
    with request.urlopen(image) as f:
      data = base64.b64encode(f.read()).decode('utf-8')
  return f'data:image/png;base64,{data}'

chat_handler = Gemma3ChatHandler(clip_model_path='path/to/mmproj')
llama = Llama(
    model_path='path/to/model',
    chat_handler=chat_handler,
    n_ctx=2048,  # n_ctx should be increased to accommodate the image embedding
)

messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'please compare these pictures'},
            {'type': 'image_url', 'image_url': 'https://xxxx/img1.jpg'},
            {'type': 'image_url', 'image_url': {'url': 'https://xxxx/img2.png'}},
            {'type': 'image_url', 'image_url': image_to_base64_uri(Path('path/to/img3.jpg').read_bytes())},
            {'type': 'image_url', 'image_url': {'url': image_to_base64_uri(Path('path/to/img4.png').read_bytes())}},
            {'type': 'text', 'text': 'and then tell me which one looks the best'},
        ]
    }
]

output = llama.create_chat_completion(
    messages,
    stop=['<end_of_turn>', '<eos>'],
    max_tokens=500,
    stream=True,
)

for chunk in output:
  delta = chunk['choices'][0]['delta']
  if 'role' in delta:
    print(delta['role'], end=':\n')
  elif 'content' in delta:
    print(delta['content'], end='')

llama._sampler.close()
llama.close()
