
Issue while running SD1.5 on multiple less beefy GPUs #7682

Closed · square-1111 opened this issue Apr 15, 2024 · 18 comments

Labels: bug (Something isn't working)

square-1111 commented Apr 15, 2024

Describe the bug

I am trying to run distributed inference for SD1.5 and SDXL on 2x GTX 1080 Ti, but I am running into errors.

Reproduction


from diffusers import DiffusionPipeline
import torch

# Load SD1.5 with the weights split across the visible GPUs
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
    device_map="balanced",
    cache_dir="/data2/humanaware/tezuesh/Diffusion/cache_dir/",
)

# Show which component ended up on which device
print(pipeline.hf_device_map)

prompt = "A majestic lion jumping from a big stone at night"
image = pipeline(prompt).images[0]

Command to run: CUDA_VISIBLE_DEVICES="0,1" python sd15_inference.py

Logs

Logs for SD1.5 inference 

Traceback (most recent call last):
  File "Diffusion/Inference/sd15_inference.py", line 35, in <module>
    pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True, device_map="balanced", cache_dir="/Diffusion/cache_dir/")
  File "/venv/py310/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/venv/py310/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 877, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "/venv/py310/lib/python3.10/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 699, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/venv/py310/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/venv/py310/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 694, in from_pretrained
    accelerate.load_checkpoint_and_dispatch(
  File "/venv/py310/lib/python3.10/site-packages/accelerate/big_modeling.py", line 614, in load_checkpoint_and_dispatch
    return dispatch_model(
  File "/venv/py310/lib/python3.10/site-packages/accelerate/big_modeling.py", line 419, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "/venv/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 608, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "/venv/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 157, in add_hook_to_module
    module = hook.init_hook(module)
  File "/venv/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 275, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device, tied_params_map=self.tied_params_map)
  File "/venv/py310/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 354, in set_module_tensor_to_device
    raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
ValueError: weight is on the meta device, we need a `value` to put in on 0.

System Info

diffusers-cli env

  • diffusers version: 0.28.0.dev0
  • Platform: Linux-4.15.0-140-generic-x86_64-with-glibc2.31
  • Python version: 3.10.14
  • PyTorch version (GPU?): 2.2.2+cu121 (True)
  • Huggingface_hub version: 0.22.2
  • Transformers version: 4.40.0.dev0
  • Accelerate version: 0.29.0
  • xFormers version: not installed
  • Using GPU in script?: 2
  • Using distributed or parallel set-up in script?:

Who can help?

@sayakpaul

square-1111 added the bug (Something isn't working) label on Apr 15, 2024
@anujkum25

Did you try setting something like max_memory = {0: "1GB", 1: "1GB"}?
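(For context, a minimal sketch of what that suggestion would look like, assuming max_memory is forwarded alongside device_map when loading the pipeline; the 1GB limits are just the values from the comment above and should be adapted to your cards:)

from diffusers import DiffusionPipeline
import torch

# Assumed per-GPU memory budget for the dispatcher; tune to your hardware
max_memory = {0: "1GB", 1: "1GB"}

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
    device_map="balanced",
    max_memory=max_memory,  # caps how much of each GPU may be used for weights
)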

@square-1111
Author

Yes, I tried this as well! Same error.

@sayakpaul
Member

Cc: @SunMarc

@sayakpaul
Member

Could you also share the output of nvidia-smi?

@SunMarc
Member

SunMarc commented Apr 16, 2024

Hi @square-1111, the issue is due to a breaking change in the latest accelerate. If you use accelerate==0.27.0, it should work fine. We will do a patch release so that diffusers can fix this issue. See the related issue.
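(For reference, a sketch of pinning to that release; the exact specifier is just spelled out from the suggestion above:)

pip install "accelerate==0.27.0"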

@square-1111
Author

Could you also share the output of nvidia-smi?

Wed Apr 17 10:45:32 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.04 Driver Version: 525.116.04 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:04:00.0 Off | N/A |
| 23% 35C P2 54W / 250W | 3054MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:85:00.0 Off | N/A |
| 23% 22C P8 8W / 250W | 6MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 5709 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 5709 G /usr/lib/xorg/Xorg 3MiB |
+-----------------------------------------------------------------------------+

@square-1111
Author

Hi @square-1111, the issue is due to a breaking changes in the latest accelerate. If you use accelerate=0.27.0, it should work fine. We will do a patch release to enable diffusers to fix this issue. See related issue.

I tried this and got the following error:

NotImplementedError: Device placement requires `accelerate` version `0.28.0` or later.

@sayakpaul
Member

What did you do?

@square-1111
Author

Installed accelerate==0.27.0

@sayakpaul
Member

Can you try uninstalling accelerate and installing it from source?

pip install git+https://github.com/huggingface/accelerate

@square-1111
Author

Still the same issue!

@square-1111
Author

ValueError: weight is on the meta device, we need a `value` to put in on 0.

@sayakpaul
Member

I will defer to @SunMarc to comment further then.

@SunMarc
Member

SunMarc commented Apr 17, 2024

Hi @square-1111, can you try installing accelerate==0.28.0 instead? I didn't get the error on my side because I checked out the commit tagged v0.27 directly.
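(Likewise, a sketch of the corresponding install command, assuming a plain pip environment:)

pip install "accelerate==0.28.0"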

@square-1111
Author

Hi @SunMarc, it works perfectly with accelerate==0.28.0. I then tried to run SDXL inference on 4 GPUs of a similar config, and it raises a CUDA out of memory error.

@SunMarc
Member

SunMarc commented Apr 18, 2024

Hi @square-1111, we can't guarantee that you won't get an OOM error during inference. You can use the max_memory arg to make sure that each GPU has enough free space left to run inference.
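(For illustration, a sketch of what that could look like for SDXL spread over 4 GPUs; the model id and the per-GPU limits are assumptions to adapt to your setup, the idea being to leave headroom on each 11 GiB card for activations:)

from diffusers import DiffusionPipeline
import torch

# Assumed budget: reserve roughly 3 GiB per card for activations at inference time
max_memory = {0: "8GB", 1: "8GB", 2: "8GB", 3: "8GB"}

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed SDXL base checkpoint
    torch_dtype=torch.float16,
    use_safetensors=True,
    device_map="balanced",
    max_memory=max_memory,
)

image = pipeline("A majestic lion jumping from a big stone at night").images[0]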

@square-1111
Author

I tried this, but inference with this setup is slower than CPU inference. xD

@sayakpaul
Member

That is expected as it involves data movement across devices.
