initial checkin for vision support with tests #2240

montiblanc97 · 2024-12-12T19:55:46Z

Please describe the purpose of this pull request.
Add vision support when using OpenAI models

both image URL and base64-encoded images
tests

Remaining work before merge:

Renaming Message.text -> Message.content which is used in a lot of places
Updating the ORM Message object which is typed str and requires migration.
- Current behavior is just ignoring the images and storing the text
CLI support (ollama-like). Parse any local file paths in CLI text input and treat them as image uploads
Solution to saving base64 images. They probably should not go in the Postgres DB
- We can store uploaded images to cloud blob storage (S3, GCP) then replace images in the messages with S3 urls
- For local servers: min.io, open-source S3 alternative with same API
  - Deployed via docker-compose. Can wrap under letta_server like PG or separate service
  - API is same, so the only differences between cloud/local server will be some config values if using AWS
- Sarah also suggested an alternative for local dev which is saving local file paths. Much simpler

Extensions:

Model switching to non-multimodal models, when the history contains images. Ideas:
- Captioning model fallback
- Insertion of "unavailable image" or something along those lines

How to test
How can we test your PR during review? What commands should we run? What outcomes should we expect?

Unit tests included

Have you tested this PR?
Have you tested the latest commit on the PR? If so please provide outputs from your tests.

Unit tests pass

Related issues or PRs
Please link any related GitHub issues or PRs.

Is your PR over 500 lines of code?
If so, please break up your PR into multiple smaller PRs so that we can review them quickly, or provide justification for its length.

Additional context
Add any other context or screenshots about the PR here.

cpacker · 2024-12-12T21:50:36Z

letta/client/client.py

@@ -152,10 +152,11 @@ def send_message(
        stream: Optional[bool] = False,
        stream_steps: bool = False,
        stream_tokens: bool = False,
+        image: Optional[str] = None,


I think the ideal way to do this is we actually make message (currently str) be type MultiMediaContent, which itself is Union[str, List[MultiMediaContentPart]]

Then we will be able to support multiple images and future modalities like audio.

sounds good, updated. see message in your other comment

cpacker · 2024-12-12T21:52:00Z

letta/schemas/message.py

@@ -421,8 +423,22 @@ def to_openai_dict(

        elif self.role == "user":
            assert all([v is not None for v in [self.text, self.role]]), vars(self)
+
+            if self.image is not None:


So inside of Message, .text becomes .content, which basically is a 1:1 mapping to OpenAI's format:

https://platform.openai.com/docs/api-reference/chat/create

agree with this and made the initial typing changes/enough to pass tests. However left off before two major changes:

Renaming Message.text -> Message.content which is used in a lot of places

Updating the ORM Message object which is typed str and requires migration. Tricky because images probably shouldn't be stored raw in the DB

initial checkin for vision support with tests

77a04f9

cpacker reviewed Dec 12, 2024

View reviewed changes

andrewwongscale added 2 commits December 12, 2024 16:16

WIP: move from text -> content in Message objects

1a3b276

test with 2 images

d21b3dc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

initial checkin for vision support with tests #2240

initial checkin for vision support with tests #2240

montiblanc97 commented Dec 12, 2024 •

edited

Loading

cpacker Dec 12, 2024

montiblanc97 Dec 13, 2024

cpacker Dec 12, 2024

montiblanc97 Dec 13, 2024

initial checkin for vision support with tests #2240

Are you sure you want to change the base?

initial checkin for vision support with tests #2240

Conversation

montiblanc97 commented Dec 12, 2024 • edited Loading

cpacker Dec 12, 2024

Choose a reason for hiding this comment

montiblanc97 Dec 13, 2024

Choose a reason for hiding this comment

cpacker Dec 12, 2024

Choose a reason for hiding this comment

montiblanc97 Dec 13, 2024

Choose a reason for hiding this comment

montiblanc97 commented Dec 12, 2024 •

edited

Loading