Can't run tensorflow on Ubuntu 20.10 with RX580 #1291

Closed
staticdev opened this issue Mar 16, 2021 · 4 comments

@staticdev

staticdev commented Mar 16, 2021

Similar to (but not the same): #1106

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.10 with 5.10 LTS Kernel
  • TensorFlow installed from (source or binary): pip/binary
  • TensorFlow version (use command below): 2.3.2 (tensorflow-rocm==2.3.4)
  • Python version: 3.8.6
  • GPU model and memory: Radeon RX 580

You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

v2.3.0-2335-g6e65a385034 2.3.4
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

Interestingly enough, in the Python console I only get the error when I leave the console:

$ python
>>> import tensorflow as tf
>>> tf.version.GIT_VERSION
'v2.3.0-2335-g6e65a385034'
>>> tf.version.VERSION
'2.3.4'
>>> 
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

Describe the current behavior
Whenever I run TensorFlow, I get:
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

Describe the expected behavior
TensorFlow runs without this error.

Standalone code to reproduce the issue
python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Other info / logs Include any logs or source code that would be helpful to

@sunway513

Hi @staticdev, the RX580 (gfx803) is not supported by the ROCm stack.
If you want to try it out with ROCm anyway, I'd suggest installing the rock-dkms kernel driver from the following ROCm 3.5 release:
http://repo.radeon.com/rocm/apt/3.5/
and using the following ROCm 3.5 based Docker images:
https://hub.docker.com/r/rocm/tensorflow/tags?page=1&ordering=last_updated&name=rocm3.5

@staticdev
Author

staticdev commented Apr 12, 2021

@sunway513 I tried 3.5.1, 3.9, 3.10 and 4.0 on multiple versions of Ubuntu, and none of them worked. The RX580 is at the higher end of Radeon boards and is only 2 generations old; this is a real pity! NVIDIA's CUDA simply works with older generations AND officially supports them.

@sunway513

Hi @staticdev, I was able to run workloads on my Radeon RX 580 with the public ROCm 3.5 TensorFlow Docker image, on my Threadripper 3960X workstation running Ubuntu 18.04.
More details on my system configuration and steps:

  1. kernel driver - use the stock upstream 5.4.0 kernel driver without any ROCm kernel driver modules, steps:
# optional - purge any rock-dkms packages installed on your system
sudo apt autoremove -y rock-dkms 

# Install the upstream 5.4.0 kernel driver, which has gfx803 firmware support built-in
sudo apt-get install --install-recommends linux-generic-hwe-18.04

# reboot your system to switch to the 5.4.0 kernel driver
sudo reboot

# after the reboot, confirm that your kernel driver is 5.4.0; here's my log:
~$ uname -a
5.4.0-71-generic #79~18.04.1-Ubuntu SMP Thu Mar 25 05:45:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  2. docker container - use the ROCm 3.5 docker container we published on Docker Hub
# install Docker if it is not already installed; more details: https://github.com/RadeonOpenCompute/ROCm-docker/blob/master/quick-start.md
curl -sSL https://get.docker.com/ | sh

# pull and launch the ROCm3.5 docker images
alias drun=' sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/dockerx:/dockerx -v /data:/data --shm-size=16G'

drun rocm/tensorflow:rocm3.5-tf2.1-dev  
  3. confirm the GPU is recognized and run some benchmarks
# run rocm-smi and confirm that the GPU shows up
# /opt/rocm/bin/rocm-smi
========================ROCm System Management Interface========================
================================================================================
GPU  Temp   AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
0    38.0c  7.165W  300Mhz  300Mhz  23.92%  auto  110.0W    2%   0%
================================================================================
==============================End of ROCm SMI Log ==============================

# execute tf_cnn_benchmark
cd ~/benchmarks/scripts/tf_cnn_benchmarks
python3 tf_cnn_benchmarks.py
...
2021-04-12 17:23:00.572668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7478 MB memory) -> physical GPU (device: 0, name: Ellesmere [Radeon RX 470/480/570/570X/580/580X], pci bus id: 0000:01:00.0)
TensorFlow:  2.1                                                                                                                            
Model:       trivial   
...
100     images/sec: 6725.5 +/- 27.4 (jitter = 341.4)    14.298
----------------------------------------------------------------
total images/sec: 6684.21
----------------------------------------------------------------
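
The kernel driver step above depends on actually booting into the 5.4 series. A minimal Python sketch of that check, assuming the release string has the usual `major.minor.patch-...` shape seen in the `uname -a` log; the helper names `kernel_version`, `is_kernel_5_4` and `running_kernel_is_5_4` are hypothetical and not part of any ROCm tooling:

```python
import platform
import re


def kernel_version(release: str) -> tuple:
    """Parse (major, minor, patch) from a release string like '5.4.0-71-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_kernel_5_4(release: str) -> bool:
    """True when the release string belongs to the 5.4 kernel series."""
    return kernel_version(release)[:2] == (5, 4)


def running_kernel_is_5_4() -> bool:
    """Check the kernel of the machine this script runs on."""
    return is_kernel_5_4(platform.release())
```

Calling `running_kernel_is_5_4()` on the host after the reboot gives the same answer as reading the `uname -a` line by eye, which can be handy in a setup script.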

@xuhuisheng

gfx803 runs properly with ROCm-3.5.1; you only need to fix LD_LIBRARY_PATH so that it includes /opt/rocm/lib.
I also wrote a document on how to re-compile the ROCm-4.1 components for gfx803:
https://github.com/xuhuisheng/rocm-build/tree/master/gfx803

If you just want to fix the hipErrorNoBinaryForGpu error, you only need to re-compile rocRAND with AMDGPU_TARGETS=gfx803.
If you want to run GEMM properly, you also need to re-compile rocBLAS with a patch.
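
For the LD_LIBRARY_PATH part, a minimal Python sketch that checks whether `/opt/rocm/lib` is already on the search path and prepends it if not. It assumes a Linux-style colon-separated value; the helper names `path_contains` and `with_rocm_lib` are hypothetical, and prepending rather than appending is only one possible choice:

```python
import posixpath

ROCM_LIB_DIR = "/opt/rocm/lib"  # directory named in the comment above


def path_contains(ld_library_path: str, directory: str = ROCM_LIB_DIR) -> bool:
    """True if `directory` is one of the colon-separated entries."""
    entries = [posixpath.normpath(p) for p in ld_library_path.split(":") if p]
    return posixpath.normpath(directory) in entries


def with_rocm_lib(ld_library_path: str, directory: str = ROCM_LIB_DIR) -> str:
    """Return the value with `directory` prepended, unless it is already present."""
    if path_contains(ld_library_path, directory):
        return ld_library_path
    if not ld_library_path:
        return directory
    return directory + ":" + ld_library_path
```

The variable has to be set before the Python process that imports TensorFlow starts, since the dynamic loader reads it at process startup. One way to use this is to compute `with_rocm_lib(os.environ.get("LD_LIBRARY_PATH", ""))` in a small launcher and pass the result in the environment of a child process.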
