Hi, it seems like batched inference is broken when flash attention is used. When running inference on the first example of the ScreenSpot test set with flash_attention_2, the output changes depending on the batch size.
Batch_size = 1:
<|object_ref_start|>close button<|object_ref_end|><|box_start|>(954,148),(988,196)<|box_end|><|im_end|>
Batch_size = 4:
降序<|im_end|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>
When I disable flash_attention_2, the results look fine.
Batch_size = 1:
<|object_ref_start|>close button<|object_ref_end|><|box_start|>(954,148),(988,196)<|box_end|><|im_end|>
Batch_size = 4:
<|object_ref_start|>close button<|object_ref_end|><|box_start|>(954,148),(988,196)<|box_end|><|im_end|>
Flash attention with batch_size=1 is fast enough that this bug is not a deal breaker for me, but it would be nice if it were addressed.
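
For reference, here is a minimal sketch of how I am running batched inference (not my exact script). It assumes a Qwen2-VL-style checkpoint and the standard Hugging Face transformers generate API; the model id, image path, and prompt below are placeholders:

```python
# Minimal repro sketch, assuming a Qwen2-VL-style checkpoint and the standard
# Hugging Face transformers API. Model id, image path, and prompt are
# placeholders, not the exact values from my run.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"  # placeholder checkpoint

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # switch to "sdpa" to compare
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)
# Left padding so batched generation is aligned at the sequence end.
processor.tokenizer.padding_side = "left"

def run(batch_size):
    # Repeat the same ScreenSpot sample batch_size times, so every row of the
    # batch should decode to the identical answer.
    image = Image.open("screenspot_example_0.png")  # placeholder path
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Locate the close button."},  # placeholder prompt
    ]}]
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[prompt] * batch_size,
        images=[image] * batch_size,
        padding=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Strip the (padded) prompt tokens before decoding.
    trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
    return processor.batch_decode(trimmed, skip_special_tokens=False)

print(run(1))  # expected: the close-button box output above
print(run(4))  # with flash_attention_2 this degenerates into garbage
```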