Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dynamic content types like audio and images #34

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

davidberenstein1957
Copy link
Collaborator

Should be merged after #33

@@ -64,7 +67,7 @@ def duckdb_schema(self):
id VARCHAR PRIMARY KEY,
model VARCHAR,
timestamp TIMESTAMP,
messages STRUCT(role VARCHAR, content VARCHAR)[],
Copy link
Collaborator Author

@davidberenstein1957 davidberenstein1957 Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cfahlgren1, currently other modalities, like the openai_vision_example.py, do not work with the current message schema.

I am not too familiar with DuckDB but can you have a look at this schema definiton? The input is a bit dynaminc and I am not sure how to best tackle this besides converting to JSON (losing the nice array queryability etc).

tests/integration/observers/test_observers_examples.py hold some of the potential types we could be expecting and it should work for testing them.

[
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "What's in this image?",
          },
          {
            type: "image_url",
            image_url: {
              url: imageUrl,
            },
          },
        ],
      },
]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah very good point. Ollama has this format for images for their client library for vision

"messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["<base64-encoded image data>"]
    }

Copy link
Owner

@cfahlgren1 cfahlgren1 Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for it to be a DuckDB Struct each row would have to have the same amount of keys.

The key difference is that DuckDB STRUCTs require the same keys in each row of a STRUCT column

We may have to make it a JSON type

Copy link
Collaborator Author

@davidberenstein1957 davidberenstein1957 Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cfahlgren1 Or we can rewrite the message structure to follow the following structure? I think they can be used interchangeably according to the typing. Then we avoid losing the queryability of the arrays. WDYT?
{"type": "text", "text": "hello", "image_url": None}
{"type": "text", "text": None, "image_url": "datauri"}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the message formats. e.g.

ChatCompletionSystemMessageParam(content=""),
ChatCompletionUserMessageParam(
    content=ChatCompletionContentPartTextParam(text="test")
),
ChatCompletionAssistantMessageParam(content=""),
ChatCompletionUserMessageParam(
    content=ChatCompletionContentPartImageParam(image_url="image")
),
ChatCompletionAssistantMessageParam(content=""),
ChatCompletionUserMessageParam(
    content=ChatCompletionContentPartInputAudioParam(
        input_audio=InputAudio(data="audio", format="wav")
    )
),
ChatCompletionFunctionMessageParam(content=""),
ChatCompletionToolMessageParam(content="")

# the following are equal, more or less

ChatCompletionUserMessageParam(
    content=ChatCompletionContentPartTextParam(text="test")
) 
ChatCompletionUserMessageParam(
    content="test"
)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be the down side, but querying etc would be easier I think.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that is fine and gets most of the benefits of having it in a sql store like that. Obviously when being send to HuggingFace we won't need to send image url etc if there are none.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DuckDB union could be valuable here, but not sure if it would make anything more complicated

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One the schema changes topic, how do we want to make things backwards compatible or for upgrading existing duckdb instances to work with new version?

One idea is some type of migrations to update columns etc? But don't want to add to much complexity 🤔

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have the same problem with SQLite store. It may be better to just store as JSON. It adds a step (CASTING to STRUCT) for querying locally but I think it's a worthy trade off in favor of not introducing a lot of complexity or maintenence

@davidberenstein1957 davidberenstein1957 changed the base branch from main to develop December 16, 2024 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants