Closed
Labels: feature request, good first issue
Description
🚀 The feature, motivation and pitch
Currently, we expect image_url, audio_url, etc. to be inside the messages that are passed to the chat template. We would like to extend this to support image, audio, etc. inputs, just like in HuggingFace Transformers:

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": "Can you describe this image?"}
            ]
        },
    ]

To avoid having to pass multi-modal inputs separately, we propose the following extension:

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Can you describe this image?"}
            ]
        },
    ]

This lets us pass multi-modal data such as PIL images to LLM.chat directly without having to encode them into base64 URLs.
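
To make the relationship between the two message formats concrete, here is a minimal sketch of how inline media could be pulled out of the proposed format, leaving Transformers-style placeholders behind. The helper name `split_multimodal`, the set of modality keys, and the shape of the returned dict are all assumptions made for illustration; this is not vLLM's actual implementation.

```python
from typing import Any


def split_multimodal(
    messages: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], dict[str, list[Any]]]:
    """Hypothetical helper: separate inline media from chat messages.

    Returns placeholder-only messages (e.g. {"type": "image"}) plus a dict
    mapping each modality to its media objects in order of appearance.
    """
    modalities = ("image", "audio", "video")  # assumed set of modality keys
    stripped: list[dict[str, Any]] = []
    mm_data: dict[str, list[Any]] = {}

    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            # Plain-text messages carry no media; copy them unchanged.
            stripped.append(dict(message))
            continue

        new_parts = []
        for part in content:
            kind = part.get("type")
            if kind in modalities and kind in part:
                # Collect the media object and leave only a placeholder.
                mm_data.setdefault(kind, []).append(part[kind])
                new_parts.append({"type": kind})
            else:
                new_parts.append(dict(part))
        stripped.append({**message, "content": new_parts})

    return stripped, mm_data


image = object()  # stand-in for a PIL.Image.Image in this sketch
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Can you describe this image?"},
        ],
    },
]

placeholder_messages, mm_data = split_multimodal(messages)
```

Under these assumptions, `placeholder_messages` matches the Transformers-style example above and could be fed to the chat template, while `mm_data` holds the media objects that would otherwise have to be passed separately. The input `messages` list is left unmodified.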
Alternatives
No response
Additional context
cc @ywang96 @Isotr0py @hmellor
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.