MultiModal does not work with Next.js Frontend and FastAPI backend #65

Laktus · 2024-04-25T15:23:47Z

Hi,

I wanted to implement custom evaluation logic. Since only the Python implementation of LlamaIndex supports QuestionGenerator, I thought it would be more reasonable to use the FastAPI backend + Next.js frontend setup.

I managed to pass the image data to the backend by extending the handleSubmit of useChat (see vercel/ai#725). However, I don't know how to replicate the functionality of StreamData in the FastAPI backend.
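For context on the StreamData question: on the wire, the Vercel AI SDK multiplexes text and extra data into a single text stream, one typed line per part. The sketch below assumes the data stream protocol of `ai` 3.x, where each line is `<prefix>:<JSON>\n`, with `0` marking a text delta and `8` a message annotation; verify the prefixes against the `ai` version you use. The helper names and the annotation shape are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional

# Assumed Vercel AI data stream prefixes: "0" = text delta,
# "8" = message annotation. Check these against your `ai` package version.
TEXT_PREFIX = "0"
ANNOTATION_PREFIX = "8"


def encode_text(token: str) -> str:
    """Encode one text delta as a single stream line."""
    return f"{TEXT_PREFIX}:{json.dumps(token)}\n"


def encode_annotation(payload: Dict[str, Any]) -> str:
    """Encode one annotation; the protocol carries annotations as a JSON array."""
    return f"{ANNOTATION_PREFIX}:{json.dumps([payload])}\n"


def stream_lines(tokens: Iterable[str], image_url: Optional[str]) -> Iterator[str]:
    """Yield an optional image annotation first, then the text deltas in order."""
    if image_url is not None:
        # Hypothetical annotation shape; the frontend decides how to render it.
        yield encode_annotation({"type": "image", "data": {"url": image_url}})
    for token in tokens:
        yield encode_text(token)
```

In FastAPI, a generator like `stream_lines` can be wrapped in `fastapi.responses.StreamingResponse(..., media_type="text/plain")` so that useChat reads the lines as they arrive.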

Can you make this example work out of the box, or provide further documentation on how to implement this? Currently, multi-modality does not work without multiple changes.

Thanks for taking the time to read my request.


marcusschiesser · 2024-04-26T05:14:58Z

@Laktus by coincidence, I just added a vercel/ai compatible StreamingResponse in the last release, see https://github.com/run-llama/create-llama/blob/main/templates/types/streaming/fastapi/app/api/routers/vercel_response.py

Laktus · 2024-04-26T09:32:22Z

@marcusschiesser That looks awesome, thanks! I think someone still needs to modify the template to work with it, though. Or is it already integrated in the latest release? The template cannot yet handle the incoming data from useChat's handleSubmit, nor pass data back from the server to the frontend using StreamingTextResponse (or your vercel_response in FastAPI).
Who would be responsible for integrating the latest FastAPI changes into the starter template?
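On the receiving side, the request body posted by useChat can be unpacked before anything is handed to LlamaIndex. A minimal stdlib sketch, assuming the frontend calls `handleSubmit(e, { data: { imageUrl } })` so that the body looks like `{"messages": [...], "data": {"imageUrl": ...}}`; the `imageUrl` key and the `ChatRequest` / `parse_chat_request` names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    role: str
    content: str


@dataclass
class ChatRequest:
    messages: List[Message]
    image_url: Optional[str]  # taken from the optional `data` object


def parse_chat_request(raw_body: bytes) -> ChatRequest:
    """Parse a useChat body of the form {"messages": [...], "data": {...}}."""
    body = json.loads(raw_body)
    messages = [Message(role=m["role"], content=m["content"]) for m in body.get("messages", [])]
    if not messages:
        raise ValueError("request contains no messages")
    # `data` is only present when the frontend passes it to handleSubmit.
    data = body.get("data") or {}
    # "imageUrl" is a hypothetical key chosen by the frontend.
    return ChatRequest(messages=messages, image_url=data.get("imageUrl"))
```

In an actual FastAPI route the same shape would more idiomatically be a Pydantic model; plain dataclasses are used here only to keep the sketch dependency-free.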

marcusschiesser · 2024-05-03T07:14:35Z

@Laktus The template has been part of create-llama since npx create-llama@0.1.0, and it was just updated in npx create-llama@0.1.1. What are you missing?

Laktus · 2024-05-05T12:22:46Z

@marcusschiesser I will try integrating my changes to the backend for handling the data parameter and will post a message if it works. Thanks for the update!

Laktus · 2024-05-09T21:35:22Z

@marcusschiesser
Hi Marcus, I don't see any built-in integration for images in the FastAPI backend. Do you know how I can add this? How do I pass image information into the ChatMessage object when I'm using an image-capable model like GPT-4 or GPT-4-Vision? (I already managed to pass the image from the frontend to the backend and back to the frontend for display.)

Thanks for any help.
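If the uploaded image reaches the backend as raw bytes rather than a public URL, one option is to turn it into a base64 data URL, which the OpenAI vision endpoint accepts in place of an `https://` URL in `image_url.url`. A sketch using only the standard library; `to_data_url` is a hypothetical helper name.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL usable as image_url.url."""
    # Infer the MIME type from the file extension (e.g. .png -> image/png).
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

For example, `to_data_url(b"\x89PNG", "cat.png")` returns `"data:image/png;base64,iVBORw=="`.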

marcusschiesser · 2024-05-10T06:29:59Z

@Laktus, the problem is that in Python, you have to use the MultiModalVectorStoreIndex to use images.

So I would start by replacing VectorStoreIndex with this class.

Details about using it are here:
https://docs.llamaindex.ai/en/stable/examples/evaluation/multi_modal/multi_modal_rag_evaluation/?h=multimodalvectorstoreindex

If you like, you're welcome to post a diff of your code here.

Laktus · 2024-05-10T07:50:32Z

@marcusschiesser But doesn't this save the images into the vector DB? I don't want to populate the vector DB with the image information; I only want to attach the image to one message.

The OpenAI vision docs (https://platform.openai.com/docs/guides/vision) show that the Chat Completions API accepts images like this:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4-turbo",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])
```

astream_chat only accepts the message as a raw str. If we could pass a ChatMessage object directly, it should be possible to add this additional information to the underlying API call, shouldn't it? Why is this not supported out of the box? I think the TS version solves it this way.
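Until multi-part messages are supported in the Python chat engine, one workaround is to build the OpenAI-format payload yourself and send it with the OpenAI client for turns that carry an image. The sketch below (hypothetical helper names) keeps earlier turns as plain strings and attaches images only to the newest user message, so nothing is written to the vector DB; `detail` is the OpenAI vision parameter that takes `low`, `high` or `auto`.

```python
from typing import Any, Dict, List, Tuple


def build_user_message(text: str, image_urls: List[str], detail: str = "auto") -> Dict[str, Any]:
    """Build one user message in the OpenAI vision format: a text part
    followed by one image_url part per image."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url, "detail": detail}})
    return {"role": "user", "content": content}


def build_messages(history: List[Tuple[str, str]], text: str, image_urls: List[str]) -> List[Dict[str, Any]]:
    """Keep earlier turns as plain strings; attach images only to the new turn."""
    messages: List[Dict[str, Any]] = [{"role": role, "content": content} for role, content in history]
    messages.append(build_user_message(text, image_urls))
    return messages
```

The returned list can be passed as `messages=` to `client.chat.completions.create(...)` in the same way as the hand-written list in the OpenAI example.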

marcusschiesser · 2024-05-15T15:14:17Z

@Laktus Yes, this is a current limitation of the Python version. We're working on aligning the multi-modal capabilities of the Python and TypeScript versions. Once that's done, we will add image upload support to the FastAPI backend.

Laktus · 2024-07-28T11:13:27Z

@marcusschiesser Is there any update on when this will be implemented?

marcusschiesser · 2024-07-29T08:21:29Z

Not yet; we'll need multi-modal support in the Python framework first.

marcusschiesser assigned leehuwuj Oct 8, 2024

marcusschiesser mentioned this issue Oct 16, 2024

WARNING: The annotation image is not supported for generating context content ragapp/ragapp#228

Open
