Add "gpt-4-vision-preview" as default parser #1417

Merged 3 commits into main on Mar 8, 2024
Conversation

@rholinshead (Contributor) commented Mar 7, 2024

Add "gpt-4-vision-preview" as default parser

Summary: Adding this model/parser as a default alongside the other OpenAI defaults.

Test Plan: Can use the parser in the editor


Stack created with Sapling. Best reviewed with ReviewStack.

# Update openai to 1.13.3


Summary:
Updating the openai package to the latest version (1.13.3) so that we can use image messages for gpt-4-v in the next PR.

Test Plan:
- pytest works without errors
- WIP: Testing all relevant cookbooks
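The pytest step above can be paired with a quick guard that the installed openai package meets the new minimum. A minimal sketch; the `meets_minimum` and `openai_is_current` helpers are hypothetical, not part of this PR:

```python
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically (no pre-release handling)."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)


def openai_is_current(required: str = "1.13.3") -> bool:
    """Return True if the installed openai package meets the required version."""
    try:
        return meets_minimum(version("openai"), required)
    except PackageNotFoundError:
        return False
```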
Comment on lines +46 to +48
ModelParserRegistry.register_model_parser(
OpenAIVisionParser("gpt-4-vision-preview")
)

cc @tanya-rai: this would be the 3rd position in the drop-down. If you want to change it, we can easily do so in a future PR.
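Since drop-down position follows registration order, a registry only needs to preserve insertion order. A minimal sketch of that pattern; the class and method names here are illustrative, not aiconfig's actual implementation:

```python
class ModelParser:
    """Minimal stand-in for a parser bound to one model id."""

    def __init__(self, model_id: str):
        self.model_id = model_id


class ParserRegistry:
    """Keeps parsers in registration order; Python dicts preserve insertion order."""

    def __init__(self):
        self._parsers: dict[str, ModelParser] = {}

    def register(self, parser: ModelParser) -> None:
        self._parsers[parser.model_id] = parser

    def model_ids(self) -> list[str]:
        # This order would drive the editor drop-down order.
        return list(self._parsers)


registry = ParserRegistry()
for model in ("gpt-4", "gpt-3.5-turbo", "gpt-4-vision-preview"):
    registry.register(ModelParser(model))
```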

Ryan Holinshead added 2 commits March 8, 2024 13:34
# Implement OpenAIVisionParser

Summary: Using the default openai parser as the basis (and extending it), implement a parser for text-and-image openai models (e.g. gpt-4-v). The key differences are:
- vision models don't support function calling or tools, so exclude from the completion params
- serialize/deserialize need to be a bit different to construct the user messages' content with text and image urls
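The second point, serializing user messages with mixed text and image content, can be sketched as follows; the helper name and URL are illustrative, but the content shape matches what the OpenAI chat completions API expects for vision models:

```python
def build_vision_message(text: str, image_url: str) -> dict:
    """Build a user message mixing a text part and an image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


msg = build_vision_message("What is this?", "https://example.com/cat.png")
```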

One thing to note is the default max tokens for GPT-4V is quite small. Bumping it to 100 in #1419

Another thing I found while testing is that the model sometimes gets confused when you refer to the attached image as an image. For example, with the prompt "What is this image of?" it responded with "I can't provide information on any images until you upload one"; only after rerunning a handful of times did it give the expected result. Changing the prompt to "What is this?" seems to produce much better results.
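The first difference above, dropping function calling and tools, amounts to filtering those keys out of the completion params before the API call. A minimal sketch; the key names are the standard OpenAI parameter names, but the helper itself is hypothetical:

```python
# Parameters gpt-4-vision-preview does not accept, per the commit summary.
UNSUPPORTED_VISION_PARAMS = {"functions", "function_call", "tools", "tool_choice"}


def strip_unsupported_params(completion_params: dict) -> dict:
    """Return a copy of the params with function/tool keys removed."""
    return {
        key: value
        for key, value in completion_params.items()
        if key not in UNSUPPORTED_VISION_PARAMS
    }
```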

Test Plan:

https://github.com/lastmile-ai/aiconfig/assets/5060851/47e0ae90-fab7-4d4f-98e9-163cd992aeed
# Add "gpt-4-vision-preview" as default parser


Summary: Just adding this model/parser as a default alongside the other openai defaults

Test Plan: Can use the parser in the editor
@rholinshead rholinshead merged commit c22e607 into main Mar 8, 2024
2 checks passed
rholinshead added a commit that referenced this pull request Mar 8, 2024
# Default GPT-4V max_tokens to 100

Summary: The default max_tokens is really low (< 20), so bump it to 100 to make the model more usable.

Test Plan:
![Screenshot 2024-03-07 at 5 38 27 PM](https://github.com/lastmile-ai/aiconfig/assets/5060851/085c9e68-9521-450e-b2cb-699a09eead9b)

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/1419).
* __->__ #1419
* #1418
* #1417
* #1416
* #1415