Add support for arbitrary image keys for multimodal queries #74
Currently the only way to send images to the model is to put them in the "images" key of the InputSchema.
Furthermore, the current implementation allows only one text description for the image. In my use case the image is just a small part of a larger analysis, so I think it would be better to send the full payload, as in the normal text queries, instead of silently swallowing the rest.
I'm not sure whether I'm misunderstanding something about multimodality, but I don't see any other way to support my use case.
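To make the proposal concrete, here is a minimal sketch of the intended behaviour, not the actual code in this PR. The `Image` class, the `build_message_content` function, and the shape of the returned parts are all hypothetical and chosen only for illustration: any field holding an image is detected by type regardless of its key, and every remaining field is dumped to JSON so nothing is dropped.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for an image value (not the project's class)."""
    url: str


def build_message_content(payload: dict) -> list[dict]:
    """Split a payload into one text part plus one part per image field.

    Image fields are found by type, so they may live under any key.
    All other fields are serialized together as JSON, mirroring what
    the text-only queries send.
    """
    images = {k: v for k, v in payload.items() if isinstance(v, Image)}
    text_fields = {k: v for k, v in payload.items() if k not in images}

    # Illustrative part format; a real client would use its provider's schema.
    content = [{"type": "text", "text": json.dumps(text_fields)}]
    for key, image in images.items():
        content.append({"type": "image", "key": key, "url": image.url})
    return content
```

With this approach a payload such as `{"question": ..., "chart": Image(...), "notes": [...]}` would yield one text part containing `question` and `notes` plus one image part for `chart`, rather than requiring the image to sit under "images".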
Caveats
I had to edit one test to get the full suite to pass, because previously only the image description was sent to the model, not the full dumped JSON as in the "text only" queries.
TODO:
I found the dev guide and will fix the issues.