[DRAFT] feat: add multimodal support for ChatMessage #145

LastRemote · 2024-12-04T07:36:23Z

Related Issues

fixes Add multimodal support for the new ChatMessage class #135

Proposed Changes:

Added multimodal support according to deepset-ai/haystack#7848 (comment)
Adjusted the openai/anthropic utils to convert media contents to each API's format. (I personally use httpx to send requests and parse responses, so please let me know if the OpenAI/Anthropic SDKs expect a different format.)
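
For illustration only, the conversion described above could look roughly like the sketch below. `MediaContent` here is a hypothetical stand-in with assumed fields (`data`, `mime_type`), and the helper names are made up for this example; they are not the utils touched by this PR. The two payload shapes follow the publicly documented image formats of the OpenAI chat completions API and the Anthropic messages API.

```python
import base64
from dataclasses import dataclass


@dataclass
class MediaContent:
    """Hypothetical stand-in for a media part of a ChatMessage (fields are assumed)."""

    data: bytes
    mime_type: str


def to_openai_format(media: MediaContent) -> dict:
    # OpenAI chat completions take images as a data URL inside an "image_url" part.
    encoded = base64.b64encode(media.data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media.mime_type};base64,{encoded}"},
    }


def to_anthropic_format(media: MediaContent) -> dict:
    # Anthropic messages take images as a base64 "source" block.
    encoded = base64.b64encode(media.data).decode("ascii")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media.mime_type,
            "data": encoded,
        },
    }
```

Since the request is sent with httpx rather than an SDK, these dictionaries would be placed directly into the `content` list of a message in the JSON body.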

How did you test it?

Added unit tests, and E2E tests with customized httpx-based AzureOpenAI/BedrockAnthropic generators.

Notes for the reviewer

I also added the _name field back since it is useful in some multi-agent setups. It finally has a default value, so no more headaches when serializing/deserializing it.
May I ask what the intention is behind keeping the underscore prefix when serializing and deserializing ChatMessage? Personally I find it very annoying.
There is a slight change in behavior in edge cases where a message has an empty string as its text or contains multiple text segments. I tried to pick the most sensible behavior, but please leave a comment if you would prefer the other way.
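
To make the point about the default value concrete, here is a minimal sketch, assuming a simplified message class. `MessageSketch` and its fields are hypothetical and much smaller than the real ChatMessage; the only thing it shows is that a defaulted `_name` lets older payloads without that key deserialize cleanly.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class MessageSketch:
    """Hypothetical, reduced stand-in for ChatMessage."""

    _text: str
    _name: Optional[str] = None  # assumed default; absent names deserialize to None

    def to_dict(self) -> Dict[str, Any]:
        return {"_text": self._text, "_name": self._name}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MessageSketch":
        # .get() tolerates payloads written before the field existed.
        return cls(_text=data["_text"], _name=data.get("_name"))
```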

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issue

vblagoje · 2024-12-05T09:40:58Z

@LastRemote first of all, thanks for this effort - kudos. I see you landed on the same direction I would choose for this endeavour. We need a new dataclass MediaContent to represent images and other modalities like audio in ChatMessageContentT!

Let us get back to you when @anakin87 comes from PTO so we can coordinate this effort. I think we need to break down this PR into several PRs and integration steps:

Make changes to ByteStream to enable base64 (de)encoding
Introduce MediaContent as a new dataclass that gets its content (de)encoded via the new ByteStream
Introduce MediaContent support in several ChatGenerators as separate PRs
Make sure that the whole new design makes total sense
Review each ChatGenerator's support in its own individual PR
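
As a rough sketch of the first two steps, assuming a simplified byte container: `SimpleByteStream` and `MediaContentSketch` below are hypothetical names, and the `to_base64`/`from_base64` methods are assumptions for illustration, not the existing ByteStream API or the final design.

```python
import base64
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SimpleByteStream:
    """Hypothetical, reduced stand-in for ByteStream."""

    data: bytes
    mime_type: Optional[str] = None

    def to_base64(self) -> str:
        # Encode the raw bytes as an ASCII base64 string.
        return base64.b64encode(self.data).decode("ascii")

    @classmethod
    def from_base64(cls, encoded: str, mime_type: Optional[str] = None) -> "SimpleByteStream":
        # Decode a base64 string back into raw bytes.
        return cls(data=base64.b64decode(encoded), mime_type=mime_type)


@dataclass
class MediaContentSketch:
    """Hypothetical media part that delegates (de)encoding to the byte stream."""

    media: SimpleByteStream

    def to_dict(self) -> Dict[str, Any]:
        return {"mime_type": self.media.mime_type, "base64": self.media.to_base64()}

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "MediaContentSketch":
        stream = SimpleByteStream.from_base64(payload["base64"], payload.get("mime_type"))
        return cls(media=stream)
```

Keeping the encoding logic on the byte container means each ChatGenerator only has to map the resulting dictionary to its provider's format.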

Thoughts @mpangrazzi ?

LastRemote · 2024-12-05T09:57:36Z

Thanks, glad that it aligns with what you had in mind. Let's hear back from them first, and we can break this down into smaller PRs afterwards.

mpangrazzi · 2024-12-05T14:20:31Z

@LastRemote Hi! Discussed with @vblagoje and I agree with him, PR is surely valid but a bit too convoluted. Breaking it down as he suggested would probably be the best for us!

LastRemote · 2024-12-13T07:51:21Z

@vblagoje @mpangrazzi Thanks for the feedback, I will try to break this down into smaller PRs. Shall we close this one and use #135 as the megathread to check all the PRs?

LastRemote · 2024-12-13T07:51:39Z

Btw here's the first one: #157

LastRemote requested a review from a team as a code owner December 4, 2024 07:36

LastRemote requested review from mpangrazzi and removed request for a team December 4, 2024 07:36

LastRemote mentioned this pull request Dec 4, 2024

Add multimodal support for the new ChatMessage class #135

Open

feat: add multimodal support for ChatMessage

6031934

LastRemote force-pushed the dev/multimodal branch from 3073aae to 6031934 Compare December 4, 2024 07:45

LastRemote changed the title ~~feat: add multimodal support for ChatMessage~~ [DRAFT] feat: add multimodal support for ChatMessage Dec 5, 2024

vblagoje mentioned this pull request Dec 9, 2024

Amazon Bedrock Attachments deepset-ai/haystack-core-integrations#1229

Open

LastRemote mentioned this pull request Dec 13, 2024

feat: ByteStream auto mime_type detection and base64 (de)encoding #157

Open
