
Generate: Mistral/Mixtral FA2 cache fix when going beyond the context window #28037

Merged (8 commits) on Dec 14, 2023

Conversation

gante (Member) commented on Dec 14, 2023:

What does this PR do?

The FA2 code path was indexing the Cache object incorrectly. This PR fixes it.

Fixes #27985


NOTE: tests/models/mistral/test_modeling_mistral.py::MistralIntegrationTest::test_model_7b_long_prompt (a slow test) was failing on main, but it was not showing up in our daily slow CI 🤔 Because of that, the issue flew under the radar. It is passing now.

Edit: the test was not run because we are skipping FA2 tests (@require_flash_attn). @ydshieh is on it :)

@@ -385,11 +385,16 @@ def forward(

if past_key_value is not None:
gante (Member, Author):
Context: when use_cache is True, past_key_value is now a Cache object even when the cache is empty (previously it was None).

As such, with a sliding window, we need to check whether the cache has contents before attempting to slice, as we can't slice an empty cache.
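The guard described above can be sketched as follows. This is a minimal, hypothetical stand-in for transformers' Cache (the names DynamicCacheSketch and should_slice are illustrative, not the library's API); the real Cache exposes a similar get_seq_length(layer_idx) method:

```python
# Minimal sketch (NOT the real transformers code): with a Cache object,
# "has contents" must be checked explicitly before slicing, since an
# empty Cache is not None.

class DynamicCacheSketch:
    """Hypothetical stand-in for transformers' per-layer KV cache."""

    def __init__(self):
        self.key_cache = []    # one entry per layer
        self.value_cache = []

    def get_seq_length(self, layer_idx=0):
        # Empty cache: no tokens stored yet for this layer.
        if len(self.key_cache) <= layer_idx:
            return 0
        return len(self.key_cache[layer_idx])


def should_slice(past_key_value, layer_idx, kv_seq_len, sliding_window):
    # Only slice when a cache exists, it actually holds tokens for this
    # layer, and the total KV length exceeds the sliding window.
    return (
        past_key_value is not None
        and past_key_value.get_seq_length(layer_idx) > 0
        and kv_seq_len > sliding_window
    )


cache = DynamicCacheSketch()
print(should_slice(cache, 0, 10, 4096))    # empty cache -> False
cache.key_cache.append([0] * 5000)
cache.value_cache.append([0] * 5000)
print(should_slice(cache, 0, 5001, 4096))  # beyond the window -> True
```

Without the get_seq_length check, an empty (but non-None) cache would pass the old `past_key_value is not None` test and the slicing code would fail, which is the bug this PR fixes.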

@@ -400,8 +405,6 @@ def forward(
f" {past_key.shape}"
)

past_key_value = (past_key, past_value)
gante (Member, Author) commented on Dec 14, 2023:

past_key_value is now a Cache instance that is updated in place with the .update() function (L413 in the updated file). We don't need to set it.
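The in-place update pattern described above can be sketched like this. The class below is a simplified mock (transformers' real method is Cache.update(key_states, value_states, layer_idx), which operates on tensors):

```python
# Sketch of the in-place cache update pattern (illustrative mock, not the
# real transformers Cache): .update() appends the new states for a layer
# and returns the full history, so no tuple reassignment is needed.

class CacheSketch:
    def __init__(self):
        self.key_cache = []
        self.value_cache = []

    def update(self, key_states, value_states, layer_idx):
        # First call for this layer: create its cache entries.
        if len(self.key_cache) <= layer_idx:
            self.key_cache.append(list(key_states))
            self.value_cache.append(list(value_states))
        else:
            # Later calls: extend in place.
            self.key_cache[layer_idx].extend(key_states)
            self.value_cache[layer_idx].extend(value_states)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]


cache = CacheSketch()
k, v = cache.update([1, 2], [10, 20], layer_idx=0)
k, v = cache.update([3], [30], layer_idx=0)
print(k)  # [1, 2, 3] -- the caller never rebuilds past_key_value itself
```

Because the Cache mutates its own state, the old `past_key_value = (past_key, past_value)` reassignment in the modeling code became dead weight and could be removed.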

@gante gante requested a review from amyeroberts December 14, 2023 14:07
slicing_tokens = 1 - self.config.sliding_window

past_key = past_key_value[0]
past_value = past_key_value[1]
past_key = past_key_value[self.layer_idx][0]
Reviewer (Member):
self.layer_idx is guaranteed to be defined here, right?

gante (Member, Author):

Nope, good catch! Going to add an appropriate exception.
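The "appropriate exception" could look something like the following sketch (the helper name and message are assumptions for illustration; they are not the exact code added to the PR). The idea is to fail loudly with an actionable message when layer_idx was never set, rather than letting indexing fail with a confusing TypeError deep inside the slicing code:

```python
# Hypothetical guard (illustrative only): validate layer_idx before using
# it to index into the Cache.

def check_layer_idx(layer_idx):
    if layer_idx is None:
        raise ValueError(
            "The attention layer was created without a `layer_idx`, which "
            "is required to index the Cache. Make sure the layer is "
            "initialized with its layer index."
        )
    return layer_idx


print(check_layer_idx(3))  # -> 3
try:
    check_layer_idx(None)
except ValueError as exc:
    print("raised:", type(exc).__name__)
```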

tomaarsen (Member) commented:

This indeed seems to address what was described here, well done!

amyeroberts (Collaborator) left a comment:

Thanks for fixing!

Looking forward to the FA2 tests being run 😅

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ydshieh (Collaborator) commented on Dec 14, 2023:

As @younesbelkada mentioned, and per the official site:

"Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon; please use FlashAttention 1.x for Turing GPUs for now."

There is nothing we can do unless we run on a different machine.

@gante gante merged commit 388fd31 into huggingface:main Dec 14, 2023
18 checks passed
@gante gante deleted the mixtral_cache branch December 14, 2023 14:52
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
Development

Successfully merging this pull request may close these issues.

KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
5 participants