
Linux and AMD Radeon RX 7900XTX: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' | Is it possible to enable the stable-diffusion-webui equivalent to --precision full --no-half? #1793

Closed
danielaixer opened this issue Dec 23, 2023 · 1 comment


danielaixer commented Dec 23, 2023

I'm on Ubuntu 22.04 with a 7900XTX GPU, ROCm 5.6 and Mesa drivers. I can generate images on the GPU with stable-diffusion-webui.

I installed kohya_ss with these commands:

git clone https://github.com/bmaltais/kohya_ss.git 
cd kohya_ss
python -m venv venv
source venv/bin/activate
pip install --use-pep517 --upgrade -r requirements.txt
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.6
accelerate config

And I start the GUI with:

export HSA_OVERRIDE_GFX_VERSION=11.0.0
source venv/bin/activate
python kohya_gui.py --server_port 7863 --listen 0.0.0.0

I'm trying to train a LoRA model with the AdamW optimizer and CrossAttention set to none. These settings avoid the bitsandbytes and xFormers errors, but just when training seems to be working and reaches the optimization steps, I get this error:

  File "/home/username/kohya_ss/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

And at the end of the terminal output:
subprocess.CalledProcessError: Command '['/home/username/kohya_ss/kohya_ss/venv/bin/python', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/home/username/kohya_ss/kohya_ss/datasets/Something', '--resolution=512,512', '--output_dir=/home/username/kohya_ss/kohya_ss/models/Lora/Custom', '--network_alpha=48', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=96', '--output_name=Something2', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=20', '--train_batch_size=4', '--max_train_steps=200', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--save_every_n_steps=500', '--bucket_no_upscale', '--noise_offset=0.0', '--sample_sampler=euler', '--sample_prompts=/home/username/kohya_ss/kohya_ss/models/Lora/Custom/sample/prompt.txt', '--sample_every_n_steps=25']' returned non-zero exit status 1.

Based on similar errors mentioning 'Half', I'm fairly sure we need the equivalent of launching AUTOMATIC1111/stable-diffusion-webui with --precision full --no-half.
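The kernel name in the traceback, `addmm_impl_cpu_`, is PyTorch's CPU implementation of the matrix multiply behind `F.linear`, so the failing layer was running on CPU tensors, and the installed PyTorch version has no fp16 (`Half`) kernel for that op on the CPU. One hedged way to narrow this down is to check whether the ROCm build of PyTorch in the venv can see the GPU at all. This is a sketch, not part of kohya_ss: `describe_torch_backend` is a hypothetical helper, and it only relies on `torch.version.hip`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`, which ROCm builds of PyTorch provide (AMD GPUs are exposed through the `torch.cuda` namespace).

```python
# Hypothetical diagnostic (not part of kohya_ss): reports whether the
# PyTorch install in the active venv can see a GPU. Assumes a ROCm build.
import importlib.util


def describe_torch_backend():
    """Return a dict summarizing the PyTorch build and GPU visibility."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # A version string on ROCm builds, None on CPU-only or CUDA builds
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds report AMD GPUs through the torch.cuda namespace
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```

Run it inside the activated venv with HSA_OVERRIDE_GFX_VERSION=11.0.0 exported, as in the GUI launch above. If gpu_available is False or hip_version is None, the training process is probably falling back to the CPU, which would be consistent with the addmm_impl_cpu_ error.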

The method shown in #1484 doesn't improve the situation for me, and neither does installing the PyTorch ROCm 5.7 nightly build: pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7

Edit: When running "accelerate config", choosing "no" for the question "Do you wish to use FP16 or BF16 (mixed precision)?" didn't help.

Edit: Setting "Mixed precision" to "no" seems to work. I will update once I confirm I can complete a full LoRA training run.

danielaixer changed the title from "RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' | Is it possible to enable the stable-diffusion-webui equivalent to --precision full --no-half?" to "Linux and AMD Radeon RX 7900XTX: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' | Is it possible to enable the stable-diffusion-webui equivalent to --precision full --no-half?" on Dec 23, 2023
danielaixer (author) commented Dec 26, 2023

Okay, confirmed: setting "Mixed precision" to "no" works. As for "accelerate config", I don't think it matters which mixed precision option you choose there.

Also, do NOT use AdamW8bit as the optimizer (bitsandbytes issue); use AdamW instead, and set "CrossAttention" to "none" (xFormers issue).

However, I still can't generate sample images or captions with kohya_ss, but those issues are secondary.
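The workarounds confirmed in this thread map onto train_network.py flags roughly as follows, assuming the GUI passes `--mixed_precision=<value>` for the "Mixed precision" field, `--optimizer_type=AdamW8bit` for that optimizer and `--xformers` when CrossAttention is set to xformers. The helper is a hypothetical sketch for anyone launching the script by hand, not part of kohya_ss. It leaves other flags such as `--save_precision=fp16` untouched, since only the mixed precision setting was reported as the fix.

```python
# Hypothetical helper (not part of kohya_ss): rewrites a train_network.py
# argument list to apply the workarounds reported in this issue.
def apply_rocm_workarounds(args):
    """Return a new list with full precision, plain AdamW and no xformers."""
    result = []
    for arg in args:
        if arg.startswith("--mixed_precision="):
            # Equivalent of setting "Mixed precision" to "no" in the GUI
            result.append("--mixed_precision=no")
        elif arg == "--optimizer_type=AdamW8bit":
            # AdamW8bit depends on bitsandbytes; swap in plain AdamW
            result.append("--optimizer_type=AdamW")
        elif arg == "--xformers":
            # Equivalent of CrossAttention "none": drop the flag entirely
            continue
        else:
            # Every other flag is passed through unchanged
            result.append(arg)
    return result


if __name__ == "__main__":
    failing = [
        "./train_network.py",
        "--mixed_precision=fp16",
        "--save_precision=fp16",
        "--optimizer_type=AdamW8bit",
        "--xformers",
    ]
    print(apply_rocm_workarounds(failing))
```

Running the script prints ['./train_network.py', '--mixed_precision=no', '--save_precision=fp16', '--optimizer_type=AdamW'].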
