[Bug]: Segmentation fault (core dumped) with docker pytorch and gfx803 #14856

Open
6 tasks done
picarica opened this issue Feb 7, 2024 · 2 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments


picarica commented Feb 7, 2024

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

I heard gfx803 should work with the AMD Docker images, so I followed these instructions: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs#install-on-amd-and-arch-linux
I fixed all the errors that came up during installation, but now I get an unexplained segmentation fault (core dumped).

Steps to reproduce the problem

  1. Follow the instructions at https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs#install-on-amd-and-arch-linux
    The only difference is that I tried launching with ./webui.sh and added the command-line arguments to webui-user.sh.
  2. export HSA_OVERRIDE_GFX_VERSION=10.3.0
  3. Hit the error from [Bug]: ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed' #11458 and applied the fix from that issue (comment)
  4. Segmentation fault

What should have happened?

It should have launched. All the drivers are working: when I run rocminfo I get gfx803 in the output both inside and outside the Docker image.
rocminfo | grep gfx
  Name: gfx803
  Name: amdgcn-amd-amdhsa--gfx803
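
For anyone who wants to script the same check, here is a minimal sketch that pulls the gfx target names out of rocminfo output. The function names `gfx_targets` and `read_rocminfo` are made up for illustration, and it only assumes that a `rocminfo` binary is on the PATH, as in the command above.

```python
import re
import subprocess


def gfx_targets(rocminfo_text):
    """Return the distinct gfx target names found in rocminfo output."""
    # Matches tokens such as gfx803, gfx90a or gfx1030.
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_text)))


def read_rocminfo():
    """Run rocminfo and return its stdout (assumes rocminfo is installed)."""
    result = subprocess.run(
        ["rocminfo"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Sample text copied from the grep output above.
SAMPLE = "Name: gfx803\nName: amdgcn-amd-amdhsa--gfx803\n"
print(gfx_targets(SAMPLE))
```

On a machine with ROCm installed, `gfx_targets(read_rocminfo())` would do the same thing against the live output.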

What browsers do you use to access the UI ?

No response

Sysinfo

The host is Gentoo with a Ryzen 5600X CPU, an RX 570 GPU, and 16 GB of RAM.

Console logs

At first I got this error:
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.



Stable diffusion model failed to load
Applying attention optimization: Doggettx... done.

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx803
 List of available TensileLibrary Files : 
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
Aborted (core dumped)
I got past that error with export HSA_OVERRIDE_GFX_VERSION=10.3.0
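
The rocBLAS message above boils down to this: the library directory has no Tensile file for gfx803. A rough sketch of that check follows, using the file names from the log. The helper names `tensile_archs` and `installed_archs` are hypothetical, and the file-name pattern is only inferred from the listing above, not taken from rocBLAS documentation.

```python
import re
from pathlib import Path

# Pattern inferred from names like TensileLibrary_lazy_gfx1030.dat in the log.
_TENSILE_RE = re.compile(r"TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat$")


def tensile_archs(filenames):
    """Return the set of gfx targets named by Tensile library files."""
    archs = set()
    for name in filenames:
        match = _TENSILE_RE.search(str(name))
        if match:
            archs.add(match.group(1))
    return archs


def installed_archs(library_dir="/opt/rocm/lib/rocblas/library"):
    """Scan the directory from the error message; empty set if it is absent."""
    path = Path(library_dir)
    if not path.is_dir():
        return set()
    return tensile_archs(entry.name for entry in path.iterdir())


# Targets copied from the "List of available TensileLibrary Files" above.
LOGGED = [
    f"TensileLibrary_lazy_{arch}.dat"
    for arch in (
        "gfx941", "gfx90a", "gfx1030", "gfx1102", "gfx940", "gfx906",
        "gfx1101", "gfx1100", "gfx942", "gfx908", "gfx900",
    )
]

print("gfx803" in tensile_archs(LOGGED))
```

With the list from the log, gfx803 is not among the available targets, which matches the "No such file or directory for GPU arch : gfx803" error.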

After that, launching gives:

REQS_FILE='requirements.txt' python launch.py --precision full --no-half --skip-torch-cuda-test
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0]
Version: v1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --precision full --no-half --skip-torch-cuda-test
Segmentation fault (core dumped)
root@gentoo:/dockerx/stable-diffusion-webui# REQS_FILE='requirements.txt' python launch.py --precision full --no-half --skip-torch-cuda-test
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0]
Version: v1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --precision full --no-half --skip-torch-cuda-test
Segmentation fault (core dumped)
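
Since the segfault prints no Python traceback, one way to narrow it down might be CPython's standard `faulthandler` module, which dumps the Python-level stack when the process receives a fatal signal. This is only a sketch: the helper name `enable_crash_log` and the log file name are made up for illustration, and it has not been tried against this particular crash.

```python
import faulthandler


def enable_crash_log(path="segfault_trace.log"):
    """Write the Python traceback to `path` if the process gets a fatal signal.

    The returned file object must stay open while the handler is active,
    so the caller should keep a reference to it.
    """
    log_file = open(path, "w")
    # all_threads=True dumps every thread, not only the one that crashed.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_log()` near the top of launch.py, or starting the interpreter with `python -X faulthandler launch.py ...`, should show which Python line was executing when the crash happened, assuming the handler gets a chance to run.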

Additional information

No response

@picarica picarica added the bug-report Report of a bug, yet to be confirmed label Feb 7, 2024
@victorsl3

I've got the same issue in a different context.

Contributor

DGdev91 commented Feb 15, 2024

HSA_OVERRIDE_GFX_VERSION=10.3.0 works well for Navi (RX 5000 and 6000 series), but your GPU is older.
Try HSA_OVERRIDE_GFX_VERSION=9.0.0 instead.
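
To see which target an override value stands for, here is a small sketch. It assumes the value is read as major.minor.stepping and that gfx names write minor and stepping as single hex digits. That reading is consistent with 10.3.0 and gfx1030, but treat it as an assumption, not documented behaviour. The function name `override_to_gfx` is hypothetical.

```python
def override_to_gfx(version):
    """Map an HSA_OVERRIDE_GFX_VERSION string to a gfx target name.

    Assumption: the value is major.minor.stepping, and the gfx name is the
    decimal major followed by minor and stepping as hex digits.
    """
    major, minor, stepping = (int(part) for part in version.split("."))
    return f"gfx{major}{minor:x}{stepping:x}"


for value in ("10.3.0", "9.0.0"):
    print(value, "->", override_to_gfx(value))
```

Under that assumption, 10.3.0 maps to gfx1030 and 9.0.0 maps to gfx900. Both appear in the list of available TensileLibrary files in the console log, which is why either override gets past the rocBLAS "No such file" error. Whether kernels built for those targets actually run on a gfx803 card is a separate question.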
