-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Modifying schema to support multi modal inputs. #1673
base: main
Are you sure you want to change the base?
Conversation
Based on the interface that OpenAI provides. User can provide a list of [{"type": "", "image_url": ""}] that gets passed to the model.
memgpt/agent_store/db.py
Outdated
@@ -156,6 +156,7 @@ class MessageModel(Base): | |||
# openai info | |||
role = Column(String, nullable=False) | |||
text = Column(String) # optional: can be null if function call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we should probably delete
text = Column(String) # optional: can be null if function call
and replace with
content = Column(JSON) # optional: multi-modal input
which in the pydantic model is Optional[Union[str, List[MultiModalMessagePart]]]
but in the database itself is stored as an optional JSON field
@@ -192,6 +193,7 @@ def to_record(self): | |||
role=self.role, | |||
name=self.name, | |||
text=self.text, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, should delete/deprecate text
and replace with content
which can be None
, str
(text-only), or List[dict/MultiModalMessagePart]
(multi-modal).
memgpt/schemas/message.py
Outdated
@@ -62,6 +62,8 @@ class Message(BaseMessage): | |||
id: str = BaseMessage.generate_id_field() | |||
role: MessageRole = Field(..., description="The role of the participant.") | |||
text: str = Field(..., description="The text of the message.") | |||
# Field mm_content is only used when role is 'user'. It needs to be mapped to MultiModalMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here - deprecating text
and replacing with a new content
that matches OpenAI
@@ -223,8 +225,9 @@ def to_openai_dict( | |||
|
|||
elif self.role == "user": | |||
assert all([v is not None for v in [self.text, self.role]]), vars(self) | |||
content = self.mm_content if self.mm_content is not None else self.text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we do the above (replace text
) we should no longer need this
@@ -8,13 +8,15 @@ class SystemMessage(BaseModel): | |||
role: str = "system" | |||
name: Optional[str] = None | |||
|
|||
class MultiModalMessage(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the text field. Adding content instead of mm_content.
feat: support for multi-modal input - work in progress - schema change
Based on the interface that OpenAI provides.
User can provide a list of
[{"type": "", "image_url": ""}]
that gets passed to the model.
Please describe the purpose of this pull request.
Modifying schema for adding Multimodal support - User can give text and image.
How to test
Not testable. Only for reviewing high level schema changes. Won't be merged without more commits.
Have you tested this PR?
No.