only move model to device when model is in cpu and target device is xpu #3133

faaany · 2024-09-29T13:35:43Z

What does this PR do?

When a model is loaded across multiple devices, fine-tuning on XPU crashes with the message:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:0 and xpu:3! (when checking argument for argument mat2 in method wrapper_XPU__bmm)

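The cause is that the _prepare_ipex_or_xpu method moves the whole model back to xpu:0, which undoes the multi-device placement. With this fix, the model is moved only when it is on the CPU and the target device is XPU, and fine-tuning works.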
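The condition in the PR title can be sketched as a small standalone check. This is an illustrative sketch only, not the actual change to _prepare_ipex_or_xpu: the helper name `should_move_model` and the use of plain device strings in place of real tensors are assumptions made so the example runs without XPU hardware.

```python
def should_move_model(param_devices, target_device):
    """Hypothetical helper: decide whether a model should be moved.

    param_devices: device strings of the model's parameters,
                   e.g. ["cpu"] or ["xpu:0", "xpu:3"].
    target_device: device string the accelerator wants, e.g. "xpu:0".

    Returns True only when every parameter is on the CPU and the target
    device is an XPU, mirroring the condition named in the PR title.
    """
    # Strip the optional ":<index>" suffix to get the device type.
    param_types = [d.split(":")[0] for d in param_devices]
    target_type = target_device.split(":")[0]

    model_on_cpu = bool(param_types) and all(t == "cpu" for t in param_types)
    return model_on_cpu and target_type == "xpu"


# A model sharded across xpu:0 and xpu:3 is left where it is.
print(should_move_model(["xpu:0", "xpu:3"], "xpu:0"))
# A model that is still entirely on the CPU is moved to the XPU.
print(should_move_model(["cpu", "cpu"], "xpu:0"))
```

Under this check, a model already spread across several XPUs keeps its placement, so its tensors are no longer collapsed onto xpu:0.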

Who can review?

@SunMarc and @muellerzr

faaany · 2024-09-30T00:32:25Z

@yao-matrix

yao-matrix · 2024-09-30T00:37:07Z

@yao-matrix

Fine with me.

SunMarc

Makes sense! Thanks for fixing!

HuggingFaceDocBuilderDev · 2024-09-30T14:28:14Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

muellerzr

Thanks for the fix!

bug fix

2471af3

faaany changed the title ~~fix tensor device misalignment on xpu for model loaded with device_map="auto"~~ only move model to device when model is in cpu and target device in xpu Sep 30, 2024

faaany changed the title ~~only move model to device when model is in cpu and target device in xpu~~ only move model to device when model is in cpu and target device is xpu Sep 30, 2024

SunMarc approved these changes Sep 30, 2024


muellerzr approved these changes Oct 7, 2024


muellerzr merged commit 1077611 into huggingface:main Oct 7, 2024
24 of 25 checks passed

faaany mentioned this pull request Oct 14, 2024

take torch.nn.Module model into account when moving to device #3167

Merged

faaany deleted the ipex-bug branch November 4, 2024 06:08
