
High-level API for multimodality #928

Closed
remixer-dec opened this issue Nov 19, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@remixer-dec

Is your feature request related to a problem? Please describe.

The current high-level implementation of multimodality relies on a specific prompt format.

Describe the solution you'd like

Models like Obsidian work with the llama.cpp server but use a different prompt format. It would be nice to have a high-level multimodality API in llama-cpp-python that accepts an image (or images) as an argument after Llama() has been initialized with the paths to the required extra models, without relying on a pre-defined prompt format such as Llava15ChatHandler.
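
To make the request concrete, here is a purely hypothetical sketch of such an interface (neither the clip_model_path argument on Llama() nor the images parameter exists today; the names are only illustrative):

from llama_cpp import Llama

# Hypothetical interface, not part of llama-cpp-python today:
llm = Llama(
    model_path="path/to/model.gguf",
    clip_model_path="path/to/mmproj.bin",  # extra vision model supplied up front
)
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is in this picture?"}],
    images=["path/to/local_image.png"],  # images passed directly, no chat-handler prompt format
)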

Describe alternatives you've considered
Alternatively, a custom prompt-format class that supports images could be implemented, where the prompt string is passed as an argument.

@abetlen abetlen added the enhancement New feature or request label Nov 21, 2023
@abetlen
Owner

abetlen commented Nov 21, 2023

I'll consider adding the multi-modal args to the Llama class, but I'm worried about growing the API surface too much too quickly.

For now, I can definitely implement an Obsidian chat handler and maybe abstract out some of the Llava-specific code so this is easier to generalize in the future.

Also thank you for bringing that model to my attention, definitely have to give it a try now!

@JoshuaFurman

This seems like an apt place to ask: how can I supply a local image file for multimodal use? It looks like the example assumes the image is hosted at a URL somewhere.

@abetlen
Owner

abetlen commented Nov 23, 2023

@JoshuaFurman good question, I'll update the docs to include this. It works the same as OpenAI's gpt-4-vision-preview: you can pass the image as a base64-encoded data URL:

import base64

def image_to_base64_data_uri(file_path):
    with open(file_path, "rb") as img_file:
        base64_data = base64.b64encode(img_file.read()).decode('utf-8')
        return f"data:image/png;base64,{base64_data}"

# Replace 'file_path.png' with the actual path to your PNG file
file_path = 'file_path.png'
data_uri = image_to_base64_data_uri(file_path)

Then just pass that in place of the HTTP URL.
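
For example, assuming llm is a Llama instance configured with Llava15ChatHandler as in the README (the prompt text here is just a placeholder), the data URI slots into the same messages structure used for hosted images:

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_uri}},  # local image as a data URI
                {"type": "text", "text": "Describe this image in detail."},
            ],
        },
    ]
)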

@JoshuaFurman

Oh fantastic, thank you! And this can be done without running the server, right? I'm looking to add this directly into an application.

@abetlen
Owner

abetlen commented Nov 23, 2023

Yup, you just need to pass the Llava15ChatHandler to Llama as in the README example, so the class knows how to format the chat with images.
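
Roughly like this (a sketch based on the README example; the model paths are placeholders for your local files):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="path/to/mmproj.bin")
llm = Llama(
    model_path="path/to/llava-model.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,  # a larger context leaves room for the image embedding
)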

@JoshuaFurman

Great, thanks so much! Happy Thanksgiving :)

@XcerontangX

@JoshuaFurman good question, I'll update the docs to include this. It works the same as OpenAI's gpt-4-vision-preview: you can pass the image as a base64-encoded data URL […]

Thanks, this works.

@abetlen abetlen closed this as completed Feb 26, 2024
@abetlen
Owner

abetlen commented Apr 28, 2024

Closed this by mistake, though it will be resolved in #1147.
