How to use the API to call a multimodal model with a local image? #89
Comments
I had a similar problem too. I wrote code like this:
I expected the Claude model to read the attached image, but it evidently did not, and returned the following information:
I wonder whether it is possible to invoke a multimodal model via the API. Thanks.
Yes, I got the same response when using the Claude model.
Did you solve this problem? I tried adding the parsed_content field, but it made no difference.
+1
+1
+1
Sorry for the delayed response. The API is designed so that only attachments uploaded through the UI (Poe Client) are sent to the LLM. Files are processed and linked to the message when the user uploads them from the client, so attaching arbitrary files from the bot server does not work. To recap, with the current API, we only support:
I can already see how this could be a limitation for bot creators, but I am still curious: what use cases are you all working on that would benefit from attaching files to the user message via the API?
@JohntheLi A typical case is parsing PDFs. When a user uploads a long PDF, we need to preprocess it into image pages and text pages, then run different tasks on the different parts that the intermediate layer extracts on its own. Sending the full document directly to a bot is pointless.
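The preprocessing step described in this comment could look roughly like the sketch below. Everything in it is hypothetical: the `Page` type, the `route_pages` function, and the 200-character threshold are illustrative assumptions, and the actual PDF extraction (not shown) would need a PDF library.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical container for one preprocessed PDF page."""
    number: int
    text: str           # text extracted from the page (may be empty for scans)
    image_bytes: bytes  # rendered image of the page


def route_pages(pages: list[Page], min_chars: int = 200) -> dict[str, list[int]]:
    """Split pages into those handled as text and those handled as images.

    A page with at least `min_chars` characters of extracted text is routed
    to the text path; anything else is routed to the image path.
    """
    routes: dict[str, list[int]] = {"text": [], "image": []}
    for page in pages:
        kind = "text" if len(page.text.strip()) >= min_chars else "image"
        routes[kind].append(page.number)
    return routes
```

Only the pages on the image path would need to reach a multimodal model as attachments, which is why attaching files from the bot server matters for this use case.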
I agree with your example; this would be useful to have. I will bring it up with the team. Keep in mind that there are some complexities: this would essentially link new attachments to the user message, and we need to see how that might break existing product expectations. So I don't think it's a small task, but it'll be on our radar. Thanks for reporting it!
Many thanks. I couldn't be more excited to work on some new M-LLM applications.
I’d love to see this feature too!
Hi, I'm using the Poe API to call a multimodal model, such as gpt-4v or claude3-opus. I referred to an example in the diagram, but I can't find the code showing how to load a local image into the request. How can I implement this? I noticed that the new documentation mentions "attachment.parsed_content"; should I use this? What is the format of parsed_content? Should I encode the image as base64 or read it as binary?
Looking forward to your reply.
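On the base64-versus-binary part of the question: the thread does not document the format of `parsed_content`, so the following is only a generic sketch of the two ways to read a local image in Python. The helper name `load_image` and the keys of the returned dictionary are hypothetical and not part of the Poe API, and per the maintainer's reply in the comments, an attachment constructed on the bot server is not forwarded to the model by the current API.

```python
import base64
import mimetypes
from pathlib import Path


def load_image(path: str) -> dict:
    """Read a local image and return both its raw bytes and a base64 string.

    Hypothetical helper for illustration; the dictionary keys are assumptions.
    """
    raw = Path(path).read_bytes()  # binary read: the exact bytes on disk
    encoded = base64.b64encode(raw).decode("ascii")  # text-safe form, e.g. for JSON
    content_type, _ = mimetypes.guess_type(path)  # guessed from the file extension
    return {
        "name": Path(path).name,
        "content_type": content_type or "application/octet-stream",
        "raw": raw,
        "base64": encoded,
    }
```

The two forms carry the same data: the base64 string is just the raw bytes re-encoded as ASCII text, so it can be embedded in a JSON payload, at the cost of being about a third larger.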