How to use the API to call a multimodal model with a local image? #89

Open · HarryZhou-618 opened this issue Apr 12, 2024 · 12 comments
@HarryZhou-618

Hi, I'm using the Poe API to call a multimodal model, such as GPT-4V or Claude-3-Opus. I'm following the example shown in the screenshot below, but I can't find code showing how to load a local image into the request. How can I implement this? I noticed that the new documentation mentions `attachment.parsed_content`; should I use this? What is the format of `parsed_content`? Should I encode the image as base64 or read it as raw binary?
Looking forward to your reply.
[Screenshot: Snipaste_2024-04-12_18-18-12]
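For reference, the two loading styles the question asks about (raw binary read vs. base64 encoding) can be produced with the standard library alone. This is only a sketch of the two encodings; the thread never confirms which format, if any, `parsed_content` actually accepts:

```python
import base64

def load_image_binary(path: str) -> bytes:
    # Raw binary read: the image file's bytes, unmodified.
    with open(path, "rb") as f:
        return f.read()

def load_image_base64(path: str) -> str:
    # Base64 encoding: the same bytes as an ASCII-safe string,
    # which is what many JSON-based APIs expect for embedded images.
    return base64.b64encode(load_image_binary(path)).decode("ascii")
```

Decoding the base64 string with `base64.b64decode` recovers the original bytes exactly, so the two forms carry the same data and differ only in transport-friendliness.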

@Arbow commented Apr 13, 2024

I had a similar problem. I wrote code like this:

import asyncio

import fastapi_poe as fp
from fastapi_poe.types import Attachment, ProtocolMessage

api_key = 'KEY'
prompt = "Describe the attached image in detail."

# Attachment pointing at an image already hosted on the Poe CDN.
attachment = Attachment(
    url="https://pfst.cf2.poecdn.net/base/image/xxxxxxxxxxxxxxxxxxxxxxx?w=1024&h=1024",
    content_type="image/png",
    name="image.png",
)
message = ProtocolMessage(role="user", content=prompt, attachments=[attachment])

async def print_bot_response(messages: list[ProtocolMessage], bot_name: str, api_key: str) -> None:
    # Stream the bot's partial responses and print the concatenated text.
    chunks = []
    async for partial in fp.get_bot_response(messages=messages, bot_name=bot_name, api_key=api_key):
        chunks.append(partial.text)
    print(''.join(chunks))

asyncio.run(print_bot_response([message], 'Claude-3-Sonnet', api_key))

I expected the Claude model to read the attached image, but it obviously did not, and returned the following information:
"Unfortunately, you have not actually attached or uploaded any images to our conversation yet. If you do upload an image, I will be happy to describe it in detail for you. Please let me know once you have attached an image."

I wonder if it is possible to invoke a multimodal model via the API. Thanks.

@HarryZhou-618 (Author)

> I had a similar problem. I wrote code like this: […] I wonder if it is possible to invoke a multimodal model via the API.

Yes, I got the same response when using the Claude model.
While checking the latest documentation and API code, I found that Poe has added a new `parsed_content` field for attachments. I wonder if this could be the way to do it; maybe we can pass the image as `parsed_content`. I'm trying it out, and you can try it too!

@Arbow commented Apr 17, 2024

> I had a similar problem. I wrote code like this: […]

> Yes, I got the same response when using the Claude model. […] I'm trying it out, and you can try it too!

Did you solve this problem? I tried adding the `parsed_content` field, but it didn't help.

@17Reset commented May 20, 2024

+1

@Michalai0

+1

@qingyanbaby

+1

@qingyanbaby

@JohntheLi

@JohntheLi (Collaborator)

Sorry for the delayed response. The API is designed such that only attachments uploaded through the UI (the Poe client) are sent to the LLM. Files are processed and linked to the message when the user uploads them from the client, so attaching arbitrary files from the bot server does not work.

To recap, with the current API, we only support:

  1. attachments in user message (request) that are attached via the Poe client
  2. attachments in bot message (response) that are attached via post_message_attachment as seen here: https://creator.poe.com/docs/server-bots-functional-guides#sending-files-with-your-response

I can already see how this could be a limitation for bot creators, but I am still curious what use cases you all are working on that could benefit from attaching files to the user message via the API?

@ZihaoZhou commented Sep 12, 2024

> I can already see how this could be a limitation for bot creators, but I am still curious what use cases you all are working on that could benefit from attaching files to the user message via the API?

@JohntheLi A typical case is parsing PDFs. When a user uploads a long PDF, we need to preprocess it into image pages and text pages, running different tasks on the different parts that the intermediate layer extracts on its own. Directly sending the full document to a bot is pointless.

@JohntheLi (Collaborator)

I agree with your example; this would be useful to have. I will bring it up with the team.

Keep in mind that there are some complexities here: this would essentially be linking new attachments to the user message, and we need to see how that might break existing product expectations. So I don't think it's a small task, but it'll be on our radar. Thanks for reporting it!

@ZihaoZhou

Many thanks. I couldn't be more excited to work on some new multimodal LLM applications.

@alfred-liu96

I’d love to see this feature too!
