openAI API not supporting Image processing #1562

Open
AhzamHassan opened this issue Nov 18, 2024 · 2 comments
Labels
support Questions about how to do something

Comments

@AhzamHassan
I am trying to send an image to OpenAI through its API, but the model responds that it is unable to process images.
My code:

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant, i am sharing an image with you please give me the solution for this math problem.",
    },
    {
      role: "user",
      content: JSON.stringify({
        type: "image_url",
        image_url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
      }),
    },
  ],
});

Response (Postman):

{
    "data": {
        "role": "assistant",
        "content": "I'm sorry, but as a text-based AI, I'm unable to view or interpret images. However, if you describe the math problem to me or type it out, I'd be more than happy to assist you in solving it.",
        "refusal": null
    },
    "message": "Success",
    "success": true
}
AhzamHassan added the "support" label (Questions about how to do something) on Nov 18, 2024
@erenakbay
The GPT-4 API doesn’t support image processing directly, as it handles only text inputs. To resolve this, use an OCR tool like Tesseract to extract text from the image, then pass the extracted text to the GPT-4 API for analysis or problem-solving.

@Andyple
Andyple commented Dec 9, 2024

GPT-4o does accept images as input. Try this code and see if it works.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant, i am sharing an image with you please give me the solution for this math problem."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        },
        {
          "type": "text",
          "text": "Explain this image"
        }
      ]
    }
  ],
  response_format={
    "type": "text"
  },
)
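
For the Node.js code in the original question, the likely cause is that `JSON.stringify` turns the image part into a plain string, so the model receives text instead of an image; `image_url` also needs to be an object with a `url` field. The following is a minimal sketch of the same structured request in JavaScript, assuming the official `openai` Node SDK. The helper names `buildVisionMessages` and `toDataUrl` are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: encode raw image bytes as a data URL, for cases where
// the image is a local file rather than a public link.
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Hypothetical helper: build the messages array with the image as a
// structured content part (an array of parts), not a JSON string.
function buildVisionMessages(imageUrl, prompt) {
  return [
    {
      role: "system",
      content: "You are a helpful assistant that solves math problems shown in images.",
    },
    {
      role: "user",
      content: [
        { type: "text", text: prompt },
        // image_url is an object with a url field, not a bare string
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Usage with the SDK (needs an API key and network access, so left commented):
// const completion = await openai.chat.completions.create({
//   model: "gpt-4o",
//   messages: buildVisionMessages("https://example.com/problem.png", "Solve this problem."),
// });
```

Keeping the message construction in a plain function makes it easy to log or inspect the payload before sending it, which would have shown that the original `content` was a string.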


3 participants