System Info

transformers version: 4.43.0.dev0

Who can help?

@ArthurZucker

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction
pytest -rA tests/test_cache_utils.py::CacheIntegrationTest -k "test_static_cache_greedy_decoding_pad_left and flash_attention"
fails with
```
    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.LongTensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_value: Optional[Cache] = None,
        output_attentions: bool = False,
        use_cache: bool = False,
        cache_position: Optional[torch.LongTensor] = None,
    ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
        if isinstance(past_key_value, StaticCache):
>           raise ValueError(
                "`static` cache implementation is not compatible with `attn_implementation==flash_attention_2` "
                "make sure to use `sdpa` in the mean time, and open an issue at https://github.com/huggingface/transformers"
            )
E           ValueError: `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` make sure to use `sdpa` in the mean time, and open an issue at https://github.com/huggingface/transformers

src/transformers/models/llama/modeling_llama.py:388: ValueError
```
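For reference, a minimal standalone sketch of what the failing test exercises (the checkpoint name and prompt are illustrative, not the exact test inputs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; the actual test may use a different model.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
    device_map="cuda",
)

# The test opts into the static cache via the generation config; with
# flash_attention_2 this hits the ValueError above on the first forward pass.
model.generation_config.cache_implementation = "static"

inputs = tokenizer(["The best color is"], return_tensors="pt", padding=True).to(model.device)
model.generate(**inputs, max_new_tokens=10, do_sample=False)
```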
And the right padding test case also fails:
pytest -rA tests/test_cache_utils.py::CacheIntegrationTest -k "test_static_cache_greedy_decoding_pad_right and flash_attention"
Expected behavior
Either we shouldn't test flash_attention in this case, or we should add an if check that skips setting cache_implementation to static when flash_attention_2 is used, as sketched below.
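A minimal sketch of the second option, assuming the test sets the cache implementation directly on the generation config (the helper name and structure are hypothetical, not the actual test code):

```python
# Hypothetical helper; the real test in tests/test_cache_utils.py may differ.
def maybe_enable_static_cache(model, attn_implementation: str):
    # StaticCache is rejected by the flash_attention_2 code path in
    # modeling_llama.py, so only opt in for other attention backends.
    if attn_implementation != "flash_attention_2":
        model.generation_config.cache_implementation = "static"
```

Alternatively, the flash_attention parametrization could call pytest.skip(...) before any static-cache setup, which would document the incompatibility directly in the test output.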