FIX: Prefix tuning with model on multiple devices #2189
Conversation
See huggingface#2134. After the introduction of `DynamicCache` for prefix tuning, a bug can occur if the model is dispatched to multiple devices, because the key and value cache for each layer needs to be moved to that layer's respective device. The new code mostly consists of code copied from transformers, to stay consistent with how transformers solves this. Note that this only works if the `hf_device_map` attribute is set on the model.
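For illustration only (this is not a verbatim excerpt from the diff), here is a minimal sketch of what the remapping could look like. `map_cache_to_layer_device_map` and `get_layer_device_map` are the helper names used in this PR, but the body of `get_layer_device_map` below is a simplified assumption rather than the implementation adapted from transformers:

```python
import transformers


def get_layer_device_map(model):
    # Simplified assumption of the helper's behavior: map each decoder layer
    # index to the device that layer was dispatched to. The real helper is
    # adapted from transformers and handles more edge cases.
    devices = set(model.hf_device_map.values())
    if devices <= {"cpu", "disk"}:
        return None  # nothing on an accelerator, nothing to remap

    main_device = next(d for d in model.hf_device_map.values() if d not in ("cpu", "disk"))
    layer_device_map = {}
    for layer_idx in range(model.config.num_hidden_layers):
        for module_name, device in model.hf_device_map.items():
            # match entries like "model.layers.3" or "transformer.h.3"
            if module_name.endswith(f".{layer_idx}"):
                layer_device_map[layer_idx] = main_device if device in ("cpu", "disk") else device
                break
    return layer_device_map


def map_cache_to_layer_device_map(model, cache) -> None:
    """Ensure that the key and value cache of the model are on the same device
    as their corresponding layers."""
    if not (isinstance(cache, transformers.Cache) and hasattr(model, "hf_device_map")):
        return

    layer_device_map = get_layer_device_map(model)
    if layer_device_map is None:
        return

    for idx in range(model.config.num_hidden_layers):
        layer_device = layer_device_map.get(idx)
        if layer_device is not None:
            cache.key_cache[idx] = cache.key_cache[idx].to(layer_device)
            cache.value_cache[idx] = cache.value_cache[idx].to(layer_device)
```

The remapping is a no-op unless the model was loaded with a device map (e.g. `AutoModelForCausalLM.from_pretrained(..., device_map="auto")`), since that is what sets the `hf_device_map` attribute.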
""" | ||
Ensure that the key and value cache of the model are on the same device as their corresponding layers. | ||
""" | ||
if not (isinstance(cache, transformers.Cache) and hasattr(model, "hf_device_map")): |
Probably `or not hasattr(model, "hf_device_map")`? Otherwise we don't skip the cases where the cache is a tuple and the model has no device map.
Also, I'm wondering whether this didn't fail for the tuple format because accelerate handles device allocation for tensors but can't handle it for objects? In the tuple format we also don't usually do any device mapping.
Hmm, maybe I misunderstand, but right now we have "not (A and B)", which is the same as "not A or not B".
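Spelled out with the guard from the diff (just restating the equivalence, no change to the PR):

```python
# as written:
if not (isinstance(cache, transformers.Cache) and hasattr(model, "hf_device_map")):
    return

# equivalent by De Morgan's law:
if not isinstance(cache, transformers.Cache) or not hasattr(model, "hf_device_map"):
    return
```

So the tuple-cache and no-device-map cases are already skipped.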
Oh right, I guess I didn't see the brackets XD
```python
        cache.key_cache[idx] = cache.key_cache[idx].to(layer_device)
        cache.value_cache[idx] = cache.value_cache[idx].to(layer_device)
```
This might fail for the encoder-decoder cache. Can we also try with a T5 model in a multi-GPU environment to see if it works?
I tried and it didn't work because the `get_layer_device_map` function would fail. I tried to check why it doesn't fail when the function is called in transformers, but for T5 that code path is never reached. Honestly, I couldn't figure out why it's different for encoder-decoder. My solution for now is to call `map_cache_to_layer_device_map` only when `peft_config.num_transformer_submodules == 1` and leave encoder-decoder untouched for now.
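To illustrate the interim approach described above (the call site and variable names here are assumptions, not a verbatim excerpt from the PR):

```python
# Only remap the cache for decoder-only models; encoder-decoder models
# (num_transformer_submodules == 2) are left untouched for now.
if peft_config.num_transformer_submodules == 1:
    map_cache_to_layer_device_map(model, past_key_values)
```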
Yes, sounds good to me. We can add multi-GPU support for encoder-decoder as more models get converted to the new cache format.
LGTM! Thanks for adding this
@zucchini-nlp I did manage to make this work with T5; the fix was actually quite simple. Could you please review again?
Awesome that it worked for T5! Thanks!