Fix Llava chat template examples #30130
Conversation
@@ -388,16 +389,16 @@ def forward(
  >>> model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
  >>> processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

- >>> prompt = "<image>\nUSER: What's the content of the image?\nASSISTANT:"
+ >>> prompt = "USER: <image>\nWhat's the content of the image? ASSISTANT:"
AFAICT this prompt put the `<image>` token in the wrong place altogether.
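For reference, here is a minimal end-to-end sketch of the docstring example built around the corrected prompt. The image URL and `max_new_tokens` value are illustrative assumptions rather than quotes from the diff:

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Corrected llava-1.5 template: "USER: <image>\n<prompt> ASSISTANT:"
prompt = "USER: <image>\nWhat's the content of the image? ASSISTANT:"
url = "https://www.ilankelman.org/stopsigns/australia.jpg"  # any test image works
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt")
generate_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
```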
@@ -43,13 +43,13 @@ The original code can be found [here](https://github.com/haotian-liu/LLaVA/tree/
  - For better results, we recommend users to prompt the model with the correct prompt format:

  ```bash
- "USER: <image>\n<prompt>ASSISTANT:"
+ "USER: <image>\n<prompt> ASSISTANT:"
  ```
Technically we should prepend the system prompt here, but I think it can be excluded for the purposes of keeping the demo simple.
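For completeness, a prompt with the system prompt prepended would look roughly like the sketch below; the system prompt text is the one commonly used with llava-1.5 and is an assumption here, not something this PR adds to the docs:

```python
# Assumed llava-1.5 system prompt, joined to the USER turn by a single space.
system = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)
prompt = f"{system} USER: <image>\nWhat's the content of the image? ASSISTANT:"
```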
Thanks for fixing! FYI @ArthurZucker @younesbelkada
Thanks for fixing these!
  >>> processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
- "\nUSER: What's the content of the image?\nASSISTANT: The image features a stop sign on a street corner"
+ "USER:  \nWhat's the content of the image? ASSISTANT: The image features a busy city street with a stop sign prominently displayed"
Are we expecting to have this double spacing here: `"USER:  \nWhat's"`? We're skipping `<image>`, but from the prompt I would have expected `"USER: \nWhat's"` (with a single space).
Yes, I did find this a bit weird, but it seems to be the default behaviour of the decoding.
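A small sketch for inspecting that behaviour (an assumption-laden illustration, not part of the PR): decode the tokenized prompt with and without `skip_special_tokens` and compare how the spacing around `<image>` is handled.

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
prompt = "USER: <image>\nWhat's the content of the image? ASSISTANT:"
ids = processor.tokenizer(prompt).input_ids

# Round-trip the prompt with and without the special <image> token kept,
# to see how the "USER:  \n" spacing behaves once that token is dropped.
print(repr(processor.tokenizer.decode(ids, skip_special_tokens=False)))
print(repr(processor.tokenizer.decode(ids, skip_special_tokens=True)))
```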
What does this PR do?
This PR fixes a few minor issues in the chat templating examples for llava models. In particular, looking at the inference code from the original Llava repo, one sees that the correct template is `"USER: <image>\n<prompt> ASSISTANT:"`, while the existing examples either placed `<image>` before `USER:` or dropped the space before `ASSISTANT:`.
I've only fixed the `llava-v1.5` parts of the code, but happy to extend the PR to cover the other models if we agree with the change.

Code to reproduce the expected template from the Llava repo:
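The original snippet isn't reproduced in this thread; the sketch below shows one way to get the template, assuming the conversation utilities in `llava/conversation.py` from the haotian-liu/LLaVA repo are importable (this is not the author's exact code):

```python
# Sketch: build the llava-v1.5 prompt with the original repo's conversation
# templates and print it, to confirm the "USER: ... ASSISTANT:" format.
from llava.conversation import conv_templates

conv = conv_templates["llava_v1"].copy()
conv.append_message(conv.roles[0], "<image>\n<prompt>")
conv.append_message(conv.roles[1], None)

# Expected to end with "USER: <image>\n<prompt> ASSISTANT:" (a single space,
# not a newline, before ASSISTANT), preceded by the system prompt.
print(repr(conv.get_prompt()))
```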
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
cc @NielsRogge @amyeroberts