Fix RecurrentGemma device_map #30273

SunMarc · 2024-04-16T14:48:53Z

What does this PR do?

This PR makes RecurrentGemma compatible with a multi-GPU device_map. To try it out:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/recurrentgemma-2b-it", device_map="auto"
)
input_text = "Write me a poem about Machine Learning."
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, use_cache=True)
print(tokenizer.decode(outputs[0]))

I get the same output in the single-GPU and multi-GPU setups.

SunMarc · 2024-04-16T14:53:18Z

src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py

@@ -252,7 +252,7 @@ def _update_cache(self, key_states, value_states, **cache_kwargs):
            to_shift = cache_position >= self.config.attention_window_size - 1
            indices = (slicing + to_shift[-1].int() - 1) % self.config.attention_window_size

-            k_out, v_out = self.key_states, self.value_states
+            k_out, v_out = self.key_states.to(key_states.device), self.value_states.to(value_states.device)


Because of _setup_cache, self.key_states and self.value_states are initialized on the device of the hidden state that we pass to the model in generate (e.g. cuda:0). With multi-GPU, however, this layer might not be on the same device as that hidden state. Hence, we need to make sure that self.key_states is on the same device as key_states. The same applies to value_states.
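To illustrate the pattern outside the model, here is a minimal sketch. SlidingWindowCache and its update method are hypothetical, simplified stand-ins rather than the library's classes, and the assumption that the moved cache is stored back on the object is mine. It falls back to CPU when no second GPU is present.

```python
import torch


class SlidingWindowCache:
    """Hypothetical, simplified stand-in for one attention layer's key/value cache."""

    def __init__(self, batch, heads, window, head_dim, device="cpu"):
        # Like _setup_cache, allocate on whatever device the caller passes in.
        shape = (batch, heads, window, head_dim)
        self.key_states = torch.zeros(shape, device=device)
        self.value_states = torch.zeros(shape, device=device)

    def update(self, key_states, value_states, position):
        # Align the cache with the incoming states before writing into it.
        # .to() returns the same tensor when the device already matches.
        k_out = self.key_states.to(key_states.device)
        v_out = self.value_states.to(value_states.device)
        k_out[:, :, position] = key_states[:, :, 0]
        v_out[:, :, position] = value_states[:, :, 0]
        # Assumption: keep the (possibly moved) cache so later steps need no copy.
        self.key_states, self.value_states = k_out, v_out
        return k_out, v_out


# Use a second GPU when one is available; otherwise everything stays on CPU.
layer_device = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"
cache = SlidingWindowCache(batch=1, heads=2, window=4, head_dim=3, device="cpu")
new_k = torch.ones(1, 2, 1, 3, device=layer_device)
new_v = torch.full((1, 2, 1, 3), 2.0, device=layer_device)
k_out, v_out = cache.update(new_k, new_v, position=0)
```

After the call, the returned tensors and the stored cache all live on the device of the incoming key_states, whichever device the cache was allocated on.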

SunMarc · 2024-04-16T14:54:10Z

src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py

+                contextualized_states = recurrent_gate.type(acc_dtype) * recurrent_states[:, None].to(
+                    recurrent_gate.device
+                )


Same issue with recurrent_states, which is initialized in _setup_cache and may not be on the same device as recurrent_gate.

SunMarc · 2024-04-16T14:54:28Z

src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py

@@ -387,7 +389,7 @@ def _rnn_scan(

            contextualized_states = torch.zeros_like(hidden_states)
            for t in range(hidden_states.shape[1]):
-                recurrent_states = recurrent_gate[:, t].type(acc_dtype) * recurrent_states
+                recurrent_states = recurrent_gate[:, t].type(acc_dtype) * recurrent_states.to(recurrent_gate.device)


Here too.
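For context, a minimal sketch of such a sequential scan with the device alignment applied. The rnn_scan function below is a simplified, hypothetical version written for illustration, not the library's _rnn_scan; the recurrence it implements (multiply by the gate, then add the input) and hoisting the .to() call out of the loop are assumptions on my part.

```python
import torch


def rnn_scan(hidden_states, recurrent_gate, recurrent_states, acc_dtype=torch.float32):
    """Sequential linear recurrence: h_t = gate_t * h_{t-1} + x_t (assumed form)."""
    # The cached state may live on another device than this layer's tensors,
    # so align it once before the loop; this is a no-op if devices match.
    recurrent_states = recurrent_states.to(recurrent_gate.device)
    contextualized_states = torch.zeros_like(hidden_states)
    for t in range(hidden_states.shape[1]):
        recurrent_states = recurrent_gate[:, t].type(acc_dtype) * recurrent_states
        recurrent_states = recurrent_states + hidden_states[:, t].type(acc_dtype)
        contextualized_states[:, t] = recurrent_states.type(hidden_states.dtype)
    return contextualized_states, recurrent_states


x = torch.tensor([[[1.0], [2.0], [3.0]]])  # (batch=1, seq_len=3, width=1)
gate = torch.full_like(x, 0.5)
h0 = torch.zeros(1, 1, dtype=torch.float32)  # cached state from a previous call
out, h_last = rnn_scan(x, gate, h0)
```

Calling .to() inside the loop, as the diff does, gives the same result: after the first iteration the state already sits on the gate's device, so later calls return the tensor unchanged.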

SunMarc · 2024-04-16T14:55:04Z

src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py

+        self.register_buffer(
+            "normalizer", torch.tensor(self.config.hidden_size**0.5, dtype=torch.bfloat16), persistent=False
+        )


We don't need this buffer to be persistent. This also fixes an issue that we get with accelerate.
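As a quick illustration of what persistent=False changes, here is a small sketch. The Scaler module is hypothetical and exists only for this example. A non-persistent buffer still belongs to the module and follows it across devices, but it is left out of state_dict, so a checkpoint does not need to contain it.

```python
import torch
from torch import nn


class Scaler(nn.Module):
    """Hypothetical module holding a constant that can be recomputed from the config."""

    def __init__(self, hidden_size):
        super().__init__()
        # Non-persistent: registered as a buffer, but excluded from state_dict.
        self.register_buffer(
            "normalizer", torch.tensor(hidden_size**0.5, dtype=torch.bfloat16), persistent=False
        )

    def forward(self, hidden_states):
        return hidden_states * self.normalizer.type(hidden_states.dtype)


module = Scaler(hidden_size=16)
state_keys = list(module.state_dict().keys())
buffer_names = [name for name, _ in module.named_buffers()]
print(state_keys, buffer_names)
```

Since the value is derived from hidden_size at construction time, nothing is lost by not saving it.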

HuggingFaceDocBuilderDev · 2024-04-16T15:12:37Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Thanks. Could the device issue be fixed by placing them on the same device as self.key_states, rather than the device passed in?
Also a tad bit scared of the slowdown from doing it there. But LGTM otherwise.

ArthurZucker · 2024-04-16T19:23:28Z

src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py

+        self.register_buffer(
+            "normalizer", torch.tensor(self.config.hidden_size**0.5, dtype=torch.bfloat16), persistent=False
+        )


SunMarc · 2024-04-17T12:01:23Z

Thanks. Could the device issue be fixed by placing them on the same device as self.key_states, rather than the device passed in? Also a tad bit scared of the slowdown from doing it there. But LGTM otherwise.

I think it will be slower if we place them on the same device as self.key_states. Let's say self.key_states is initialized on cuda:0 and we have 2 GPUs. The problem is that the computed key_states can be on cuda:0 or cuda:1, depending on where the layer is. Hence, it is better to move self.key_states to the device of key_states to limit data transfer between GPUs. Otherwise, we would need to move the data each time we run a layer that sits on cuda:1.
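The argument can be made concrete with a toy count of cross-device copies. This is a pure-Python model with made-up numbers, not a benchmark: count_transfers is a hypothetical helper, it assumes one cache per layer, and it assumes a cache that has been moved to its layer's device is kept there for later steps.

```python
def count_transfers(layer_devices, cache_device, steps, cache_follows_layer):
    """Count cross-device copies over `steps` decoding steps (toy model)."""
    # One cache per layer, all initially allocated on cache_device.
    cache_devices = [cache_device] * len(layer_devices)
    transfers = 0
    for _ in range(steps):
        for i, layer_device in enumerate(layer_devices):
            if cache_devices[i] == layer_device:
                continue  # same device, nothing to copy
            transfers += 1
            if cache_follows_layer:
                # The cache moves once and then stays with its layer.
                cache_devices[i] = layer_device
    return transfers


layer_devices = ["cuda:0", "cuda:0", "cuda:1", "cuda:1"]
follow = count_transfers(layer_devices, "cuda:0", steps=10, cache_follows_layer=True)
pinned = count_transfers(layer_devices, "cuda:0", steps=10, cache_follows_layer=False)
print(follow, pinned)
```

In this model, letting the cache follow its layer costs one copy per layer on cuda:1, while pinning the cache to cuda:0 costs a copy for each of those layers on every step.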

ArthurZucker

Thanks

* Switch to non persistant buffer * fix device mismatch issue due to cache * style

SunMarc added 3 commits April 16, 2024 15:32

Switch to non persistant buffer

7d0595d

fix device mismatch issue due to cache

b95c44e

style

3b6d19c

SunMarc requested a review from ArthurZucker April 16, 2024 14:48

SunMarc changed the title ~~Fix recurrent gemma device_map~~ Fix RecurrentGemma device_map Apr 16, 2024

SunMarc commented Apr 16, 2024

ArthurZucker reviewed Apr 16, 2024

ArthurZucker approved these changes Apr 18, 2024

ArthurZucker merged commit 7509a0a into huggingface:main Apr 18, 2024
19 checks passed

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Apr 18, 2024

Fix RecurrentGemma device_map (huggingface#30273)

4b28662

* Switch to non persistant buffer * fix device mismatch issue due to cache * style

ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024

Fix RecurrentGemma device_map (#30273)

21a9df9

* Switch to non persistant buffer * fix device mismatch issue due to cache * style

ydshieh pushed a commit that referenced this pull request Apr 23, 2024

Fix RecurrentGemma device_map (#30273)

005b9ec

* Switch to non persistant buffer * fix device mismatch issue due to cache * style

itazap pushed a commit that referenced this pull request May 14, 2024

Fix RecurrentGemma device_map (#30273)

024e5e6

* Switch to non persistant buffer * fix device mismatch issue due to cache * style
