> > LLaVA-1.5 uses 336px image resolution, so you should change the clip model and control max context length. Also, the image token length is set to 256 by default, but when the resolution is changed to 336, the image token length should be set to 576. Overall, some implementation details need further consideration to adapt to llava-1.5. You should check that in detail.
#163
Open
Amark-cheey opened this issue
Nov 21, 2024
· 1 comment
The use of flash-attn should not affect the final performance.
I used these settings in LLaVA 1.5, but there are still some errors in certain parts of the configuration. May I ask for some guidance?

pred_embeddings = last_hidden_state[seg_token_mask]
[rank0]: IndexError: The shape of the mask [8, 348] at index 1 does not match the shape of the indexed tensor [8, 668, 336] at index 1
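The two lengths in the error differ by 668 - 348 = 320, which matches the gap between 576 and 256 image tokens. A rough check of that arithmetic follows. It assumes a ViT-style encoder with a patch size of 14 (as in CLIP ViT-L/14) and that the default of 256 tokens corresponds to 224px input; `num_image_tokens` is a hypothetical helper, not a function from the repository.

```python
def num_image_tokens(resolution: int, patch_size: int = 14) -> int:
    """Patch-token count for a square image (assumed ViT-style encoder)."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    side = resolution // patch_size
    return side * side


tokens_224 = num_image_tokens(224)  # 16 * 16 patches
tokens_336 = num_image_tokens(336)  # 24 * 24 patches

mask_len = 348    # mask length reported in the IndexError
hidden_len = 668  # hidden-state length reported in the IndexError

# Extra sequence positions gained when moving from 224px to 336px input.
extra = tokens_336 - tokens_224

print(tokens_224, tokens_336, extra, hidden_len - mask_len)
# prints: 256 576 320 320
```

If these assumptions hold, the hidden state already reflects 576 image tokens while the mask is still sized for 256.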
I tried changing 255 to 575, and it ran successfully.
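Changing 255 to 575 is consistent with a mask that is left-padded by the image token count minus one. A minimal sketch of that idea is below. It assumes a single image placeholder token is expanded into the image embeddings ahead of the text, so the sequence grows by `image_tokens - 1` positions. Plain lists stand in for tensors, and `pad_seg_mask` and `text_mask` are hypothetical names for illustration only.

```python
def pad_seg_mask(seg_token_mask, image_tokens):
    """Left-pad each mask row with False so it also spans the image tokens.

    Assumes one placeholder is replaced by `image_tokens` embeddings,
    which lengthens the sequence by image_tokens - 1 positions.
    """
    pad = image_tokens - 1
    return [[False] * pad + list(row) for row in seg_token_mask]


batch, hidden_len = 8, 668  # shapes taken from the IndexError
text_len = 348 - 255        # mask length minus the old hard-coded pad

# Hypothetical text-level mask: the last position marks the segmentation token.
text_mask = [[False] * (text_len - 1) + [True] for _ in range(batch)]

old_mask = pad_seg_mask(text_mask, image_tokens=256)  # pads by 255
new_mask = pad_seg_mask(text_mask, image_tokens=576)  # pads by 575

print(len(old_mask[0]), len(new_mask[0]))
# prints: 348 668
```

Deriving the pad width from the image token count, rather than hard-coding 255 or 575, would keep the mask aligned if the resolution changes again.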
Originally posted by @bxhsort in #82 (comment)