Llama-3-VILA1.5-8B Inference error #39
Could @joebradly @SeanCraven314 share your environment? The code runs without error on my side.
Hi, this is a dump of my environment. I am launching from the CLI:

```bash
python llava/eval/run_vila.py \
    --model-path=Efficient-Large-Model/Llama-3-VILA1.5-8B \
    --image-file=test.jpg \
    --query "What is this?"
```
Pip list (running on Intel and A100, Ubuntu 22.04):

```
Package     Version   Editable project location
accelerate  0.27.2
```
I changed line 272 of llava/mm_utils.py to the following:
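(The snippet itself did not survive in the thread. Below is a minimal sketch of the kind of guard that avoids the crash at that line, assuming the intent was to skip comparison when fewer tokens have been generated than the stop keyword is long; this is a reconstruction, not necessarily @joebradly's exact patch.)

```python
import torch

def keyword_matches(output_ids: torch.Tensor, keyword_id: torch.Tensor) -> bool:
    # Safe version of the check at llava/mm_utils.py:272 (hypothetical
    # reconstruction of the lost workaround). When fewer tokens have been
    # generated than the keyword is long, the negative slice returns a
    # shorter tensor, and comparing it against keyword_id raises the
    # size-mismatch RuntimeError; guarding on the slice length avoids that.
    tail = output_ids[0, -keyword_id.shape[0]:]
    return tail.shape[0] == keyword_id.shape[0] and bool((tail == keyword_id).all())
```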
Thanks to the commenter above.
Will verify and fix. BTW, you need to use the correct --conv-mode for this model (llama_3 for Llama-3-VILA1.5-8B, not vicuna_v1).
It seems when using the correct conv mode, there is no issue. Therefore, no code change is needed.
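(For anyone curious why the wrong conv mode surfaces as a tensor-size error rather than just degraded output: the stop keyword of the mismatched template tokenizes into more ids than the model has generated in the first few decoding steps, so the slice at llava/mm_utils.py:272 comes back short. A standalone reproduction, with made-up token ids:)

```python
import torch

# The stop keyword of the (wrong) template tokenizes to 3 ids, but only
# 2 new tokens exist at an early stopping-criteria check. The ids below
# are made up for illustration.
keyword_id = torch.tensor([101, 102, 103])
output_ids = torch.tensor([[7, 8]])

tail = output_ids[0, -keyword_id.shape[0]:]  # negative slice clamps to length 2
print(tail.shape)  # torch.Size([2])
try:
    (tail == keyword_id).all()
except RuntimeError as e:
    print(e)  # The size of tensor a (2) must match the size of tensor b (3) ...
```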
Thanks very much for this. Sorry for the hassle.
Hi, run_vila.py with VILA1.5-40B (not Llama-3) will encounter the same issue. Using the workaround from @joebradly fixes it.
For VILA1.5-40B, you should use --conv-mode hermes-2.
Hi, I am curious about the setting of the conv-mode flag: is the model trained with these specific conversation templates?
Yes, the model is trained with those conv_mode templates. In theory, we should bake this parameter into the model config and not let the user change it.
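(A sketch of what baking it in could look like; this is hypothetical, not current VILA behavior, and assumes a `conv_mode` field were added to the model's config.json.)

```python
from typing import Optional

from transformers import AutoConfig

def resolve_conv_mode(model_path: str, cli_conv_mode: Optional[str] = None) -> str:
    # Hypothetical: read a conv_mode pinned in the model's config.json and
    # let it override the CLI flag. VILA does not currently ship this field.
    config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
    baked = getattr(config, "conv_mode", None)
    if baked is not None:
        if cli_conv_mode and cli_conv_mode != baked:
            print(f"[WARNING] ignoring --conv-mode={cli_conv_mode}; "
                  f"model config pins conv_mode={baked}")
        return baked
    return cli_conv_mode or "auto"
```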
Different conv_mode values produce different prompt templates and stop tokens.
I tried the LongVILA model with this file; only this works for me.
Hello! Thanks for sharing such a nice project.
I have set up the environment following the instructions in the README.
When I run the inference example as follows (I have copied run_vila.py from llava/eval/ to the current project root):
```bash
python run_vila.py \
    --model-path Efficient-Large-Model/Llama-3-VILA1.5-8B \
    --conv-mode vicuna_v1 \
    --query "\n Please describe the traffic condition." \
    --image-file "./demo_images/av.png"
```
I encounter the following error:
```
['./demo_images/av.png']
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [01:46<05:18, 106.09s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [03:47<03:49, 114.88s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [05:02<01:37, 97.03s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [05:13<00:00, 62.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [05:13<00:00, 78.34s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
input: \n Please describe the traffic condition.
[WARNING] the auto inferred conversation mode is llava_v0, while `--conv-mode` is vicuna_v1, using vicuna_v1
torch.Size([1, 3, 384, 384])
Traceback (most recent call last):
File "/home/deping.zhang/code/llm/VILA/run_vila.py", line 153, in
eval_model(args)
File "/home/deping.zhang/code/llm/VILA/run_vila.py", line 115, in eval_model
output_ids = model.generate(
File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/deping.zhang/code/llm/VILA/llava/model/language_model/llava_llama.py", line 171, in generate
outputs = self.llm.generate(
File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
return self.sample(
File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/transformers/generation/utils.py", line 2924, in sample
if stopping_criteria(input_ids, scores):
File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/transformers/generation/stopping_criteria.py", line 132, in call
return any(criteria(input_ids, scores) for criteria in self)
File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/transformers/generation/stopping_criteria.py", line 132, in
return any(criteria(input_ids, scores) for criteria in self)
File "/home/deping.zhang/code/llm/VILA/llava/mm_utils.py", line 287, in call
outputs.append(self.call_for_batch(output_ids[i].unsqueeze(0), scores))
File "/home/deping.zhang/code/llm/VILA/llava/mm_utils.py", line 272, in call_for_batch
if (output_ids[0, -keyword_id.shape[0] :] == keyword_id).all():
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 0
```