> > LLaVA-1.5 uses 336px image resolution, so you should change the clip model and control max context length. Also, the image token length is set to 256 by default, but when the resolution is changed to 336, the image token length should be set to 576. Overall, some implementation details need further consideration to adapt to llava-1.5. You should check that in detail. #163

Amark-cheey · 2024-11-21T11:30:45Z

> > LLaVA-1.5 uses 336px image resolution, so you should change the CLIP model and control max context length. Also, the image token length is set to 256 by default, but when the resolution is changed to 336, the image token length should be set to 576. Overall, some implementation details need further consideration to adapt to LLaVA-1.5. You should check that in detail.

The use of flash-attn should not affect the final performance.

I used these settings with LLaVA-1.5, but some parts of the configuration still raise errors. Could you give me some guidance?

```
pred_embeddings = last_hidden_state[seg_token_mask]
[rank0]: IndexError: The shape of the mask [8, 348] at index 1 does not match the shape of the indexed tensor [8, 668, 336] at index 1
```

I tried changing 255 to 575, and it now runs successfully.

Originally posted by @bxhsort in #82 (comment)
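Where 255 and 575 likely come from, as a hedged sketch: a CLIP ViT-L/14 encoder cuts the image into 14×14-pixel patches, so a 224px input gives (224/14)² = 256 patch tokens and a 336px input gives (336/14)² = 576. If the single image placeholder in `input_ids` is replaced by all of those features, the sequence grows by 255 or 575 positions, and a mask built over `input_ids` (such as `seg_token_mask`) has to be left-padded by the same amount to line up with `last_hidden_state`. The helpers below (`num_image_tokens`, `mask_padding`, `pad_mask`) are hypothetical illustrations, not LISA's code; they assume a patch size of 14, that the CLS token is dropped, and that there is exactly one image placed before the `[SEG]` tokens.

```python
from typing import List


def num_image_tokens(image_size: int, patch_size: int = 14) -> int:
    """Patch tokens emitted by a ViT-style encoder (CLS token assumed dropped)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side * side


def mask_padding(image_size: int, patch_size: int = 14) -> int:
    """Extra positions created when one placeholder token expands into image tokens."""
    return num_image_tokens(image_size, patch_size) - 1


def pad_mask(mask_row: List[bool], image_size: int, patch_size: int = 14) -> List[bool]:
    """Hypothetical helper: left-pad a per-token mask with False so it aligns with
    the hidden states (assumes a single image that precedes the [SEG] tokens)."""
    return [False] * mask_padding(image_size, patch_size) + mask_row


if __name__ == "__main__":
    for size in (224, 336):
        # prints "224 256 255" then "336 576 575"
        print(size, num_image_tokens(size), mask_padding(size))
```

With the reported shapes, 348 - 255 = 93 and 668 - 575 = 93, which is consistent with a mask padded for 256 image tokens while the model actually produced 576; replacing 255 with 575 makes the two lengths agree.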


dohyun1411 · 2024-12-09T19:01:38Z

Hi, should we change this truncate_len, too?

`LISA/utils/dataset.py`, line 138 (commit `dbe026a`):

```python
truncate_len = tokenizer.model_max_length - 255
```

So:

```diff
- truncate_len = tokenizer.model_max_length - 255
+ truncate_len = tokenizer.model_max_length - 575
```
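Following the same reasoning, and assuming this line exists to reserve room in `tokenizer.model_max_length` for the expanded image features, the reserved amount would track the image token count minus the one placeholder token, i.e. the same offset as the 255 → 575 change in the model code. A minimal sketch with a hypothetical `truncate_len` helper (patch size 14 assumed; this is not LISA's code):

```python
def truncate_len(model_max_length: int, image_size: int, patch_size: int = 14) -> int:
    """Hypothetical helper: token budget left for text after reserving room for
    the image features that replace the single image placeholder token."""
    side = image_size // patch_size
    extra = side * side - 1  # 255 for 224px, 575 for 336px (patch size 14)
    budget = model_max_length - extra
    if budget <= 0:
        # A 336px encoder reserves many more positions; model_max_length may need raising.
        raise ValueError(f"model_max_length={model_max_length} leaves no room for text")
    return budget
```

For example, `truncate_len(2048, 336)` returns 1473 (2048 - 575), whereas `truncate_len(512, 336)` raises because 575 reserved positions already exceed 512. This is likely one reason the quoted advice also mentions controlling the max context length.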
