-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial checkin for vision support with tests #2240
base: main
Are you sure you want to change the base?
Conversation
letta/client/client.py
Outdated
@@ -152,10 +152,11 @@ def send_message( | |||
stream: Optional[bool] = False, | |||
stream_steps: bool = False, | |||
stream_tokens: bool = False, | |||
image: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ideal way to do this is we actually make message
(currently str
) be type MultiMediaContent
, which itself is Union[str, List[MultiMediaContentPart]]
Then we will be able to support multiple images and future modalities like audio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, updated. see message in your other comment
letta/schemas/message.py
Outdated
@@ -421,8 +423,22 @@ def to_openai_dict( | |||
|
|||
elif self.role == "user": | |||
assert all([v is not None for v in [self.text, self.role]]), vars(self) | |||
|
|||
if self.image is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with this and made the initial typing changes/enough to pass tests. However left off before two major changes:
- Renaming Message.text -> Message.content which is used in a lot of places
- Updating the ORM Message object which is typed str and requires migration. Tricky because images probably shouldn't be stored raw in the DB
Please describe the purpose of this pull request.
Add vision support when using OpenAI models
Remaining work before merge:
Extensions:
How to test
How can we test your PR during review? What commands should we run? What outcomes should we expect?
Unit tests included
Have you tested this PR?
Have you tested the latest commit on the PR? If so please provide outputs from your tests.
Unit tests pass
Related issues or PRs
Please link any related GitHub issues or PRs.
Is your PR over 500 lines of code?
If so, please break up your PR into multiple smaller PRs so that we can review them quickly, or provide justification for its length.
Additional context
Add any other context or screenshots about the PR here.