hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!") #1106

Open
reinka opened this issue Sep 6, 2020 · 35 comments

Comments

@reinka

reinka commented Sep 6, 2020

GPU: 5700xt

When using the following Docker image:

rocm/tensorflow     latest              d83f8c9d5c96        2 weeks ago         10.3GB

with ROCm installed on the Docker host as explained here: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html

I get the following error when executing TensorFlow ops:

root@apoehlmann:/root# python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-06 20:14:03.889728: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)

and the Python console dies. I started the container with the alias mentioned in the corresponding Docker registry: https://hub.docker.com/r/rocm/tensorflow

I get the same error when I try to run tensorflow ops on the host.

Googling this issue yields only a handful of results so I feel like I might have some misconfiguration but I cannot figure out what it is.
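
Because the abort takes the whole interpreter down, it can help to run the failing op in a child process and look at how that process exited. The following is a minimal sketch, assuming a POSIX host; `run_isolated` is a hypothetical helper name, not part of TensorFlow or ROCm.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 120.0) -> dict:
    """Run a Python snippet in a child interpreter and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX a negative return code means the child was killed by a signal.
    sig = signal.Signals(-proc.returncode).name if proc.returncode < 0 else None
    return {
        "returncode": proc.returncode,
        "signal": sig,
        # Keep only the last few stderr lines, where the guarantee() message lands.
        "stderr_tail": proc.stderr.strip().splitlines()[-3:],
    }
```

Calling `run_isolated("import tensorflow as tf; print(tf.add(1, 2))")` runs the same op as in the report without killing the calling console; a reported signal of `SIGABRT` would correspond to the `Aborted (core dumped)` line.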

@xuhuisheng

I tested rocm-3.7.0 on Ubuntu 20.04; my GPU is gfx803.
tensorflow-rocm loaded /opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco and /opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co.
The 5700 XT corresponds to gfx1010, so maybe some library is missing for it.

@reinka
Author

reinka commented Sep 7, 2020

Hmm, I'm afraid I don't understand enough to know how to use your information :/

@oleid

oleid commented Sep 8, 2020

Same problem, with a different GPU and on Arch Linux rather than in Docker.

Python 3.8.5 (default, Sep  5 2020, 10:50:12) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-08 15:28:57.302760: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
2020-09-08 15:28:57.345180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: -1B/s
2020-09-08 15:28:57.417068: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-08 15:28:57.418638: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-08 15:28:57.425913: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
/home/oleid/.cache/rua/build/hip-rocclr/src/HIP-rocm-3.7.0/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

@xuhuisheng: How did you get the list of files tensorflow-rocm loaded? I tried strace-ing my python script -- to no avail.

It would seem I don't have /opt/rocm/rocblas/lib/library/; possibly that's the problem.

$ find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/librocblas.so.0.1

@oleid

oleid commented Sep 8, 2020

> GPU: 5700xt
>
> When using the following Docker image:
>
> [..]

@reinka:

I find it strange that your python output doesn't list a device. Does rocminfo or clinfo list anything?

By the way, when I experimented with tensorflow in docker, I used something like:

sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --volume $PWD:/data rocm/tensorflow
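
The `--device=/dev/kfd --device=/dev/dri` flags in that command are what expose the GPU device nodes to the container. A quick way to confirm from inside the container that they arrived is sketched below; `check_device_nodes` is a hypothetical helper, and passing this check says nothing about whether the ROCm libraries ship code for the GPU.

```python
import os


def check_device_nodes(paths=("/dev/kfd", "/dev/dri")) -> dict:
    """Report, for each path, whether it exists and is readable and writable."""
    report = {}
    for path in paths:
        exists = os.path.exists(path)
        report[path] = {
            "exists": exists,
            # os.access checks the permissions of the current user.
            "read_write": exists and os.access(path, os.R_OK | os.W_OK),
        }
    return report


if __name__ == "__main__":
    for path, status in check_device_nodes().items():
        print(path, status)
```

If `read_write` is false for `/dev/kfd`, group membership is one thing to check, since the command above passes `--group-add video`.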

@xuhuisheng

xuhuisheng commented Sep 8, 2020

I compiled HIP from the rocm-3.7.0 source and added some logging for debugging. You can find hip_code_object.cpp in the HIP/rocclr/ directory.
rocBLAS does not support a Tensile image for gfx1010.

The code_object functionality should be new in rocm-3.7.0. I am investigating a bug for gfx803 on rocm-3.7.0, and rocBLAS seems to be the key, so I am reading the surrounding code.

dpkg -c rocblas_2.26.0.2565-9d981389_amd64.deb

drwxr-xr-x root/root         0 2020-08-18 09:08 ./opt/rocm-3.7.0/rocblas/lib/library/
-rw-r--r-- root/root  15337680 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
-rw-r--r-- root/root  14182000 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
-rw-r--r-- root/root  14905424 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
-rw-r--r-- root/root  14989608 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
-rw-r--r-- root/root  13846184 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
-rw-r--r-- root/root  14116520 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
-rw-r--r-- root/root 108018750 2020-08-18 09:00 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary.yaml
-rw-r--r-- root/root   3678448 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx803.co
-rw-r--r-- root/root  35668608 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx900.co
-rw-r--r-- root/root  97234680 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx906.co
-rw-r--r-- root/root 110233032 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx908.co
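
The listing above has `Kernels.so-000-*.hsaco` files for six targets but `TensileLibrary_*.co` files for only four of them. A small sketch that extracts this from any such listing (for example `dpkg -c` or `find` output) follows; the function names are hypothetical and the file name patterns are taken only from the listing in this thread.

```python
import re
from collections import defaultdict

# File name patterns as they appear in the package listing above.
KERNELS_RE = re.compile(r"Kernels\.so-000-(gfx[0-9a-f]+)\.hsaco$")
TENSILE_RE = re.compile(r"TensileLibrary_(gfx[0-9a-f]+)\.co$")


def rocblas_targets(lines):
    """Map each gfx target to the kinds of rocBLAS library files found for it."""
    found = defaultdict(set)
    for line in lines:
        for kind, pattern in (("kernels", KERNELS_RE), ("tensile", TENSILE_RE)):
            match = pattern.search(line.strip())
            if match:
                found[match.group(1)].add(kind)
    return dict(found)


def missing_tensile(lines):
    """Targets that have a Kernels.so file but no TensileLibrary_*.co file."""
    targets = rocblas_targets(lines)
    return sorted(t for t, kinds in targets.items() if "tensile" not in kinds)
```

Fed the lines of the listing above, `missing_tensile` reports gfx1010 and gfx1011, which matches the observation about the missing gfx1010 Tensile image.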

@oleid

oleid commented Sep 9, 2020

Okay, I now have those files as well. The pull request rocm-arch/rocm-arch#413 fixed that.

find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary.yaml
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx900.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx906.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx908.co
/opt/rocm/rocblas/lib/librocblas.so.0.1

Problem still persists, though.

@oleid

oleid commented Sep 9, 2020

> I compiled HIP from the rocm-3.7.0 source and added some logging for debugging. You can find hip_code_object.cpp in the HIP/rocclr/ directory.
> rocBLAS does not support a Tensile image for gfx1010.
>
> The code_object functionality should be new in rocm-3.7.0. I am investigating a bug for gfx803 on rocm-3.7.0, and rocBLAS seems to be the key, so I am reading the surrounding code.

Please note that in the aforementioned docker container tensorflow-rocm seems to find all it needs. So this must be something ArchLinux related in my case.

root@0f19f0974f40:/data# python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-09 12:05:54.542100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
2020-09-09 12:05:54.582874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 0B/s
2020-09-09 12:05:54.585567: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-09 12:05:54.586959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-09 12:05:54.595182: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
2020-09-09 12:05:54.595500: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocrand.so
2020-09-09 12:05:54.595671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-09 12:05:54.605093: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3851195000 Hz
2020-09-09 12:05:54.605820: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f56782fce80 initialized for platform Host (this does not guarantee that XLA 
2020-09-09 12:05:54.605855: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-09 12:05:54.608314: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f56781688f0 initialized for platform ROCM (this does not guarantee that XLA 
2020-09-09 12:05:54.608348: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Ellesmere [Radeon RX 470/480/570/570X/580/580X], AMDGPU ISA ve
2020-09-09 12:05:54.916198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 0B/s
2020-09-09 12:05:54.916264: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-09 12:05:54.916280: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-09 12:05:54.916294: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
2020-09-09 12:05:54.916308: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocrand.so
2020-09-09 12:05:54.916412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-09 12:05:54.916438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-09 12:05:54.916448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-09 12:05:54.916455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-09 12:05:54.916606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3796 MB
 0000:08:00.0)
<tf.Tensor: shape=(), dtype=int32, numpy=3>

@oleid

oleid commented Sep 9, 2020

It would seem librocrand is to blame on Arch. It is missing support for my GPU. I hacked in debug info as well and a dump of the call stack:

2020-09-09 14:37:30.875746: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
isCompatibleCodeObject: gfx803 == gfx900?
isCompatibleCodeObject: gfx803 == gfx906?
isCompatibleCodeObject: gfx803 == gfx908?
Call stack:
/opt/rocm/hip/lib/libamdhip64.so.3(+0x7eaf8)[0x7f8237487af8]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x8032e)[0x7f823748932e]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x805a4)[0x7f82374895a4]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x80929)[0x7f8237489929]
/opt/rocm/rocrand/lib/librocrand.so(+0xdcbd)[0x7f82001a6cbd]

Will report back once I know more.
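
The debug lines above suggest that the runtime compares the device's ISA name against each target bundled in the library and gives up when none matches. The sketch below models that behaviour for illustration only; it is not HIP's implementation, the names `pick_code_object` and `NoBinaryForGpu` are hypothetical, and plain string equality is an assumption.

```python
class NoBinaryForGpu(RuntimeError):
    """Raised when no bundled code object targets the current device."""


def pick_code_object(device_arch: str, bundled_archs: list) -> str:
    """Return the bundled target that matches the device, comparing one by one."""
    for arch in bundled_archs:
        # Mirrors the format of the debug output shown above.
        print(f"isCompatibleCodeObject: {device_arch} == {arch}?")
        if arch == device_arch:
            return arch
    raise NoBinaryForGpu(
        f"no binary for {device_arch}; bundled targets: {', '.join(bundled_archs)}"
    )
```

With a device of `gfx803` and bundled targets `gfx900`, `gfx906` and `gfx908`, as in the log, every comparison fails and the function raises, which is the situation the `hipErrorNoBinaryForGpu` message describes.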

@oleid

oleid commented Sep 9, 2020

Yes, that did the trick. Works for me now, thanks :)

@tpkessler

Hey @oleid, which trick are you referring to? I've submitted a PR to rocm-arch which adds gfx803 as a target architecture, see rocm-arch/rocm-arch#414

@reinka
Author

reinka commented Sep 9, 2020

@oleid Hm, I think you are onto something. I used both the official docker run command and your version and inside the container I get the following rocminfo output:

root@5419cfc6178e:/root# rocminfo 
sh: 1: lsmod: not found
ROCk module is NOT loaded, possibly no GPU devices
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 3700X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Device 731f                        
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29471(0x731f)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2080                               
  BDFID:                   10240                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        80(0x50)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

whereas on my host (Ubuntu 20.04) it seems to work properly:

$ rocminfo 
ROCk module is loaded
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 3700X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29471(0x731f)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2080                               
  BDFID:                   10240                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        80(0x50)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done *** 

However, on my host I still get the same issue when I try to run tensorflow operations:

apoehlmann@apoehlmann:~$ . .envs/mypy3/bin/activate
(mypy3) apoehlmann@apoehlmann:~$ python3
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-09 18:55:30.801592: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)

TF version:

(mypy3) apoehlmann@apoehlmann:~$ pip freeze | grep tensor
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==2.3.0
tensorflow-rocm==2.3.0

EDIT

I also ran the following on host & inside container, got the same output:

(mypy3) apoehlmann@apoehlmann:~$ find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/lib/librocblas.so.0.1.30700
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co
/opt/rocm/rocblas/lib/library/TensileLibrary.yaml
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx908.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx900.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx906.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
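
Both rocminfo dumps above report the GPU agent as `gfx1010`, and that name is what the library file names are keyed on. A minimal sketch for pulling the GPU target names out of rocminfo text follows; `gpu_targets` is a hypothetical helper and the regular expression is based only on the output format shown in this thread.

```python
import re

# Agent name lines look like "  Name:                    gfx1010"; the ISA
# name lines ("amdgcn-amd-amdhsa--gfx1010") do not start with "gfx".
AGENT_GFX_RE = re.compile(r"^\s*Name:\s+(gfx[0-9a-f]+)\s*$", re.MULTILINE)


def gpu_targets(rocminfo_text: str) -> list:
    """Return the distinct gfx names of GPU agents, in order of appearance."""
    seen = []
    for name in AGENT_GFX_RE.findall(rocminfo_text):
        if name not in seen:
            seen.append(name)
    return seen
```

On a machine with ROCm installed, the text could come from `subprocess.run(["rocminfo"], capture_output=True, text=True).stdout`; the result can then be compared by hand against the file names in the listing above.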

@xuhuisheng

Running sudo apt install kmod solves the lsmod warning in Docker.

I cannot find out how to generate the Tensile image for gfx1010 in rocBLAS. Maybe you could recompile rocBLAS with BUILD_TENSILE_HOST=false, which skips the Tensile image.

ROCm does not officially support gfx1010 (Navi 10), so I cannot guarantee that gfx1010 will run on ROCm. Please refer to these issues:

ROCm/pytorch#718
ROCm/ROCm#887

@reinka
Author

reinka commented Sep 10, 2020

@xuhuisheng I solved the lsmod problem; however, the original issue remains.

Thanks for the hint and the links. I will look into them. Before I started trying to get TF running with the 5700xt, I found another GitHub issue that linked to this blog post

https://www.preining.info/blog/2020/05/switching-from-nvidia-to-amd-including-tensorflow/

and confirmed it would work. So it seems some people get it running with the 5700xt. I already tried to reproduce the steps there but I wasn't successful.

I also tried the approach from ROCm/ROCm#887 (comment) and wasn't able to reproduce it either.

@xuhuisheng

@reinka I am afraid we have already read this blog. Unfortunately, the author later reported in the comments that he hit a segmentation fault.

@o8ruza8o

o8ruza8o commented Oct 2, 2020

Same problem on Ubuntu 20.04 with gfx1012. Is it just missing from the list of supported GPUs?

@oleid

oleid commented Oct 3, 2020

Same problem on Ubuntu 20.04 with gfx1012. Is it just missing from the list of supported GPUs?

It would seem that this GPU is not fully supported yet. I'd expect more to come in the next versions (before CDNA is released).

@o8ruza8o

o8ruza8o commented Oct 3, 2020 via email

@xuhuisheng

xuhuisheng commented Oct 3, 2020

@o8ruza8o Which version of ROCm do you use? According to rigtorp's research, rocm-3.7 is needed to support gfx10xx.

gfx1012 is more complex: Tensile only supports gfx1010 and gfx1011, so you may have to copy the related Kernels.so hsaco file too.

I have two ideas for it.
The first is to copy /opt/rocm/lib/TensileLibrary_gfx900.co to TensileLibrary_gfx1012.co.
The second is to rebuild rocBLAS with BUILD_TENSILE_HOST=FALSE.
Please refer to this issue: ROCm/pytorch#718 (comment)
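
The first idea can be sketched in Python as follows. This is only an illustration of the file copy, not a confirmed fix: nothing in this thread verifies that a library built for gfx900 works on gfx1012. The helper name copy_tensile_library is hypothetical, and the library directory is a parameter because it differs between installations (compare the path above with the find output earlier in the thread).

```python
import shutil
from pathlib import Path


def copy_tensile_library(lib_dir, src_arch, dst_arch, dry_run=True):
    """Copy TensileLibrary_<src_arch>.co to TensileLibrary_<dst_arch>.co.

    Returns the (source, destination) paths. With dry_run=True (the default)
    nothing is written, so the planned copy can be inspected first.
    """
    lib_dir = Path(lib_dir)
    src = lib_dir / f"TensileLibrary_{src_arch}.co"
    dst = lib_dir / f"TensileLibrary_{dst_arch}.co"
    if not src.is_file():
        raise FileNotFoundError(src)
    if dst.exists():
        # Never overwrite a library that is already installed.
        raise FileExistsError(dst)
    if not dry_run:
        shutil.copy2(src, dst)
    return src, dst
```

Calling copy_tensile_library("/opt/rocm/lib", "gfx900", "gfx1012") only reports the two paths; pass dry_run=False (with sufficient permissions) to perform the copy.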

@o8ruza8o

o8ruza8o commented Oct 7, 2020 via email

@km1993

km1993 commented Oct 9, 2020

I have a 5700xt. I tried every method mentioned here to get past this issue, and nothing helped.

>>> import tensorflow as tf
>>> tf.add(1,2)
2020-10-09 00:05:00.599858: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)


@xuhuisheng

There is a new gfx10 branch on rocBLAS. It seems it will be released with ROCm-3.10, maybe in late November.
https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/gfx10

@da-phil

da-phil commented Nov 18, 2020

There is a new gfx10 branch on rocBLAS. It seems it will be released with ROCm-3.10, maybe in late November.
https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/gfx10

I'm curious whether the gfx10 branch also covers chipsets other than gfx1030, because it seems that only gfx1030 has been added, see:
ROCm/rocBLAS@8cd7bf0

And also in other rocm packages, e.g.:
ROCm/rccl@9f20b00

@xuhuisheng

@da-phil
So I am afraid AMD will officially support RDNA2 and drop support for RDNA1, maybe in ROCm-4.0.
I only hope the patches for RDNA2 can be applied to RDNA1 without big modifications.

@da-phil

da-phil commented Nov 18, 2020

@da-phil
So I am afraid AMD will officially support RDNA2 and drop support for RDNA1, maybe in ROCm-4.0.
I only hope the patches for RDNA2 can be applied to RDNA1 without big modifications.

I wonder why the new RDNA2 is even categorized within gfx10, there must be some similarities in the way they work 🤔

Off-topic question: do you or anybody else know of any recent AMD Radeon GPU, other than gfx803, gfx900, gfx906 and gfx908, that has proved to work well with ROCm and therefore with TensorFlow and PyTorch?
If there is one, I'd replace my new RX 5700XT with another AMD GPU right away. I like AMD's new open-source policy and don't want to go back to NVIDIA...

@iamsanjaymalakar

>>> import tensorflow as tf
>>> x = tf.variable(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'variable'
>>> x = tf.Variable(2)
2020-11-20 13:14:26.164093: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

I am also having the same problem.
Ubuntu 20.04, RX590, rocm3.9

Has anyone found a solution?

@xuhuisheng

@iamsanjaymalakar please see this issue ROCm/ROCm#1269

@iamsanjaymalakar

@iamsanjaymalakar please see this issue RadeonOpenCompute/ROCm#1269

I am not sure I understood the solution correctly.
I cloned the rocSPARSE git repo (https://github.com/ROCmSoftwarePlatform/rocSPARSE) and checked CMakeLists.txt. AMDGPU_TARGETS is set to include gfx803. I built and installed rocSPARSE from git, but the problem still exists.
I think I may be missing something.
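
A minimal Python sketch of the check described here: extracting the AMDGPU_TARGETS default from CMake text. The sample line in the usage note is hypothetical and only imitates the semicolon-separated form; the real rocSPARSE file may declare the variable differently, and a value passed on the cmake command line would override the default.

```python
import re

# Matches set(AMDGPU_TARGETS <value> ...) where <value> is a semicolon-separated
# list, optionally wrapped in double quotes. Space-separated lists are not handled.
TARGETS_RE = re.compile(r'set\s*\(\s*AMDGPU_TARGETS\s+"?([^"\s)]+)"?', re.IGNORECASE)


def amdgpu_targets(cmake_text):
    """Return the gfx targets from the first set(AMDGPU_TARGETS ...) call, or []."""
    match = TARGETS_RE.search(cmake_text)
    if match is None:
        return []
    return [target for target in match.group(1).split(";") if target]
```

For a hypothetical line such as set(AMDGPU_TARGETS gfx803;gfx900;gfx906 CACHE STRING "GPU targets"), the function returns the three listed targets, so "gfx1010" in amdgpu_targets(text) would be False.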

@xuhuisheng

@iamsanjaymalakar
I wrote a doc for gfx803 issues. https://github.com/xuhuisheng/rocm-build/blob/develop/docs/gfx803.md

@Doev

Doev commented Nov 28, 2020

I am currently at the same point.

Ubuntu 18.04
RX 5500 XT

I have no idea how to use the workaround.

@xuhuisheng

@Doev
The RX 5500 XT is not officially supported. See ROCm/ROCm#1306

@krishoza

krishoza commented Dec 7, 2020

@iamsanjaymalakar please see this issue RadeonOpenCompute/ROCm#1269

I am not sure I understood the solution correctly.
I cloned the rocSPARSE git repo (https://github.com/ROCmSoftwarePlatform/rocSPARSE) and checked CMakeLists.txt. AMDGPU_TARGETS is set to include gfx803. I built and installed rocSPARSE from git, but the problem still exists.
I think I may be missing something.

I am getting a similar error. I checked AMDGPU_TARGETS for the same library, rocSPARSE, and it correctly lists my GPU, which is gfx906.

@jerryyin
Member

jerryyin commented Feb 1, 2021

navi 10, or gfx10 chips are not officially supported by ROCm, here. There is nothing we can do without ROCm support.

@RobertKillick

navi 10, or gfx10 chips are not officially supported by ROCm, here. There is nothing we can do without ROCm support.

Is there any idea how long it will take for support to come?

@jerryyin
Member

jerryyin commented Feb 9, 2021

@RobertKillick That would be a question for the ROCm team. Once they have the infrastructure ready, it is trivial to add TF support for it.

@peterdfields

peterdfields commented Aug 5, 2021

Has anyone had any luck getting tensorflow-rocm running on a gfx1030 device?

UPDATE: I was able to get things running on a gfx1030 device by building TF from source; I couldn't get the available binaries to run.
