
Using vision model "minicpm-v:8b-2.6-q4_K_M" with Ollama, chat message raises an error #8777

Closed
5 tasks done
HorseLuke opened this issue Sep 25, 2024 · 4 comments · Fixed by #8783
Labels
🐞 bug Something isn't working 🤔 cant-reproduce We can not reproduce it or the information is limited

Comments

@HorseLuke

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Deploy ollama 0.3.12.
  2. Pull the vision model "minicpm-v:8b-2.6-q4_K_M":
ollama run minicpm-v:8b-2.6-q4_K_M
  3. Go to Dify 0.8.3 -> add new provider, and add this vision model with the Ollama definition.
  4. Create a blank app using this vision model.
  5. On the "Studio" "Orchestrate" page, change Vision setting -> Resolution to Low.
  6. Upload an image, then chat in "Debug and Preview".

✔️ Expected Behavior

Chat output streams normally from start to end.

❌ Actual Behavior

The chat output is normal at first, but then it stops and clears the output text, and an error is raised at the top right of the page, saying:

Error: Input [TextPromptMessageContent(type=<PromptMessageContentType.TEXT: 'text'>, data='图片表述什么内容?'), ImagePromptMessageContent(type=<PromptMessageContentType.IMAGE: 'image'>, data='data:image/jpeg;base64,/9j/4AAQSkZJRgA...[ignore]...', detail=<DETAIL.LOW: 'low'>)] is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

Attached are the docker logs, captured after setting DEBUG=true:

dify-ollama-error-minicpm-v.zip


dosubot bot commented Sep 25, 2024

Hey @HorseLuke! 👋 I'm here to help you with bug fixes, answer your questions, and guide you to become a contributor. Let's squash those bugs together!

I found two similar discussions that might be relevant to your issue:

  1. Adding a local ollama model error which was last updated on 2024-09-12.
  2. Adding a local ollama model error which was last updated on 2024-09-12.

To resolve the error "Input [TextPromptMessageContent(type=<PromptMessageContentType.TEXT: 'text'>, data='图片表述什么内容?'), ImagePromptMessageContent(type=<PromptMessageContentType.IMAGE: 'image'>, data='data:image/jpeg;base64,/9j/4AAQSkZJRgA...[ignore]...', detail=<DETAIL.LOW: 'low'>)] is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers" when using the Vision model "minicpm-v:8b-2.6-q4_K_M" with ollama in dify 0.8.3, you need to ensure that the prompt_messages are correctly formatted.

The _generate method in the OllamaLargeLanguageModel class expects the prompt_messages to be converted into a dictionary format that the API can understand. Specifically, the content of UserPromptMessage should be a string or a list/tuple of strings or integers.

Here is the relevant part of the code that handles this conversion:

# Excerpt from OllamaLargeLanguageModel; requires `re`, `typing.cast`,
# and the prompt message entity classes imported elsewhere in the module.
def _convert_prompt_message_to_dict(self, message: PromptMessage) -> dict:
    """
    Convert PromptMessage to dict for Ollama API
    """
    if isinstance(message, UserPromptMessage):
        message = cast(UserPromptMessage, message)
        if isinstance(message.content, str):
            message_dict = {"role": "user", "content": message.content}
        else:
            text = ""
            images = []
            for message_content in message.content:
                if message_content.type == PromptMessageContentType.TEXT:
                    message_content = cast(TextPromptMessageContent, message_content)
                    text = message_content.data
                elif message_content.type == PromptMessageContentType.IMAGE:
                    message_content = cast(ImagePromptMessageContent, message_content)
                    image_data = re.sub(r"^data:image\/[a-zA-Z]+;base64,", "", message_content.data)
                    images.append(image_data)
            message_dict = {"role": "user", "content": text, "images": images}
    elif isinstance(message, AssistantPromptMessage):
        message = cast(AssistantPromptMessage, message)
        message_dict = {"role": "assistant", "content": message.content}
    elif isinstance(message, SystemPromptMessage):
        message = cast(SystemPromptMessage, message)
        message_dict = {"role": "system", "content": message.content}
    else:
        raise ValueError(f"Got unknown type {message}")

    return message_dict
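The data-URL stripping step in the loop above can be exercised in isolation. A minimal sketch, using the same regex as the method but wrapped in a hypothetical standalone helper (the function name is ours, not Dify's):

```python
import re

# Same pattern used in _convert_prompt_message_to_dict to strip the
# "data:image/<subtype>;base64," prefix before handing raw base64 to Ollama.
DATA_URL_PREFIX = r"^data:image\/[a-zA-Z]+;base64,"

def strip_data_url_prefix(data: str) -> str:
    """Return the bare base64 payload of an image data URL."""
    return re.sub(DATA_URL_PREFIX, "", data)

print(strip_data_url_prefix("data:image/jpeg;base64,/9j/4AAQSkZJRgA"))
# → /9j/4AAQSkZJRgA
```

Note that a string without the prefix passes through unchanged, since the pattern is anchored at the start.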

Ensure that your prompt_messages are structured correctly before invoking the model. For example:

prompt_messages = [
    UserPromptMessage(content=[
        TextPromptMessageContent(type=PromptMessageContentType.TEXT, data='图片表述什么内容?'),
        ImagePromptMessageContent(type=PromptMessageContentType.IMAGE, data='data:image/jpeg;base64,/9j/4AAQSkZJRgA...')
    ])
]

This will be converted to:

{
    "role": "user",
    "content": "图片表述什么内容?",
    "images": ["base64_encoded_image_data"]
}

Make sure that the content field is a string or a list/tuple of strings or integers, and the images field contains the base64 encoded image data [1][2][3].
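Putting the pieces together, here is a self-contained sketch of that conversion. The `TextPart`/`ImagePart` dataclasses are simplified stand-ins for Dify's prompt message entities (the real classes are not reproduced here), but the flattening logic mirrors the snippet above:

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for Dify's TextPromptMessageContent and
# ImagePromptMessageContent, reduced to the fields used here.
@dataclass
class TextPart:
    data: str

@dataclass
class ImagePart:
    data: str

def convert_user_message(parts: list) -> dict:
    """Flatten mixed text/image content into Ollama's chat message shape."""
    text = ""
    images = []
    for part in parts:
        if isinstance(part, TextPart):
            text = part.data
        elif isinstance(part, ImagePart):
            # Strip the data-URL prefix so only raw base64 is sent.
            images.append(re.sub(r"^data:image\/[a-zA-Z]+;base64,", "", part.data))
    return {"role": "user", "content": text, "images": images}

msg = convert_user_message([
    TextPart(data="What does the picture show?"),
    ImagePart(data="data:image/jpeg;base64,/9j/4AAQSkZJRgA"),
])
# msg == {"role": "user", "content": "What does the picture show?",
#         "images": ["/9j/4AAQSkZJRgA"]}
```

The key point is that "content" ends up as a plain string, which is what the downstream tokenizer check in the error message requires.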

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added 🐞 bug Something isn't working 🤔 cant-reproduce We can not reproduce it or the information is limited labels Sep 25, 2024
@HorseLuke
Author

@dosu Based on your context, I cannot find anything that changes the content field on the "Orchestrate" page.

However, if I add a sentence in the "Instructions" textarea, the error disappears. The "Instructions" textarea is described with this text:

Prompts guide AI responses with instructions and constraints. Insert variables like {{input}}. This prompt won't be visible to users.

How is the "Instructions" textarea used in the content field? Is this a bug when the "Instructions" textarea is empty?


dosubot bot commented Sep 25, 2024

The "Instructions" textarea in the content field is used to input and display the opening statement. It is not a bug when the "Instructions" textarea is empty. The code handles both cases where the textarea has content and where it is empty, providing appropriate placeholders and UI elements for each scenario.

Here is the relevant code snippet that demonstrates this:

const value = openingStatement?.opening_statement || '';
const hasValue = !!(value || '').trim();
...
<div className='text-gray-700 text-sm'>
  {(hasValue || (!hasValue && isFocus)) ? (
    <>
      {isFocus
        ? (
          <div>
            <textarea
              ref={inputRef}
              value={tempValue}
              rows={3}
              onChange={e => setTempValue(e.target.value)}
              className="w-full px-0 text-sm  border-0 bg-transparent focus:outline-none "
              placeholder={t('appDebug.openingStatement.placeholder') as string}
            >
            </textarea>
          </div>
        )
        : (
          <div dangerouslySetInnerHTML={{
            __html: coloredContent,
          }}></div>
        )}
      {renderQuestions()}
    </>) : (
    <div className='pt-2 pb-1 text-xs text-gray-500'>{t('appDebug.openingStatement.noDataPlaceHolder')}</div>
  )}
</div>

The code checks if there is a value in the opening_statement and handles the UI accordingly, showing a placeholder when the textarea is empty [1].

Additionally, in the SegmentDetailComponent from the Dify platform, the AutoHeightTextarea component is used to handle text input fields, including scenarios where the content might be empty. This further confirms that the application is designed to handle empty textareas gracefully [2].


@Kimdow220807

I also encountered the same problem.
