Stable_diffusion_unet out of memory on XPU #505
Comments
This model is "pass_due_to_skip" on CUDA because it is too large to run. We need to make sure it has the same behaviour on XPU with fp32.
@retonym @mengfei25 may I know the latest status of this model on XPU?
In the latest weekly test, the Stable_diffusion_unet model was not loaded due to an environment issue.
Fixed.
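For reference, "pass_due_to_skip" in the first comment means the harness never actually runs the model: it is recorded as passing because it sits on a size-based skip list, and the expectation is that the XPU fp32 run reports the same status as CUDA. A minimal sketch of that idea (the names below are illustrative, not the actual dynamo/torchbench code):

```python
# Hypothetical sketch of a size-based skip check; not the real benchmark code.
# Goal from the comment above: the xpu float32 run should record the same
# "pass_due_to_skip" status that cuda does for this oversized model.
OVERSIZED_MODELS = {"stable_diffusion_unet"}  # assumed skip list


def classify_result(model_name: str, device: str) -> str:
    """Return the status the harness would record for this model on this device."""
    if model_name in OVERSIZED_MODELS:
        # Skipped on every device, so cuda and xpu report identically.
        return "pass_due_to_skip"
    return "run"


assert classify_result("stable_diffusion_unet", "cuda") == classify_result(
    "stable_diffusion_unet", "xpu"
)
```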
🐛 Describe the bug
torchbench_float32_training
xpu train stable_diffusion_unet
Traceback (most recent call last):
File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2294, in validate_model
self.model_iter_fn(model, example_inputs)
File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 456, in forward_and_backward_pass
pred = mod(*cloned_inputs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
return forward_call(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 985, in forward
sample = upsample_block(
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
return forward_call(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 2187, in forward
hidden_states = attn(
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
return forward_call(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/diffusers/models/transformer_2d.py", line 309, in forward
hidden_states = block(
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
return forward_call(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/diffusers/models/attention.py", line 194, in forward
attn_output = self.attn1(
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
return forward_call(*args, **kwargs)
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 322, in forward
return self.processor(
File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1117, in call
hidden_states = F.scaled_dot_product_attention(
RuntimeError: XPU out of memory, please use empty_cache to release all unoccupied cached memory.

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4177, in run
) = runner.load_model(
File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 380, in load_model
self.validate_model(model, example_inputs)
File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2296, in validate_model
raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed
eager_fail_to_run
loading model: 0it [00:00, ?it/s]
loading model: 0it [01:21, ?it/s]
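The OOM error above points at the allocator cache. Below is a minimal sketch of the mitigation the error message suggests, assuming PyTorch's torch.xpu module is available; note that releasing the cache only helps if the memory is cached but unoccupied, so an eager pass that genuinely exceeds device capacity still needs a smaller batch size or other memory savings.

```python
import gc

import torch

# Minimal sketch: drop Python references to large intermediates, then release
# the allocator's cached blocks back to the XPU driver. This frees
# cached-but-unoccupied memory only; it does not shrink the working set of the
# forward/backward pass itself.
if torch.xpu.is_available():
    gc.collect()             # drop unreachable tensors first
    torch.xpu.empty_cache()  # release unoccupied cached memory
```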
Versions
torch-xpu-ops: 31c4001
pytorch: 0f81473d7b4a1bf09246410712df22541be7caf3 + PRs: 127277,129120
device: PVC 1100, 803.61, 0.5.1