hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!") #1106

reinka · 2020-09-06T20:29:11Z

GPU: 5700xt

When using the following Docker image:

rocm/tensorflow     latest              d83f8c9d5c96        2 weeks ago         10.3GB

with ROCm installed on the Docker host as explained here: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html

I get the following error when executing TensorFlow ops:

root@apoehlmann:/root# python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-06 20:14:03.889728: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)

and the Python console dies. I started the container with the alias mentioned in the corresponding Docker registry: https://hub.docker.com/r/rocm/tensorflow

I get the same error when I try to run tensorflow ops on the host.

Googling this issue yields only a handful of results so I feel like I might have some misconfiguration but I cannot figure out what it is.
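Since the abort described above takes the whole Python console down with it, one way to reproduce it safely is to run the op in a child process. The following is only a sketch under assumptions: the helper names `run_isolated` and `describe` are made up for illustration, and the signal decoding assumes a POSIX system such as the Linux hosts in this thread.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run a Python snippet in a child interpreter and return (returncode, stderr).

    If the child is killed by a signal (for example SIGABRT from a failed
    guarantee()), the parent survives and sees a negative return code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value means "terminated by signal -returncode".
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exit code {returncode}"
```

A call such as `rc, err = run_isolated("import tensorflow as tf; print(tf.add(1, 2))")` followed by `print(describe(rc))` would then report the abort, and `err` would hold the `hipErrorNoBinaryForGpu` message, without killing the interactive session.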

xuhuisheng · 2020-09-07T08:05:56Z

I tested rocm-3.7.0 on ubuntu-20.04; my GPU is gfx803.
Tensorflow-rocm loaded /opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco and /opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co.
The 5700xt corresponds to gfx1010, so maybe some library files for it are missing.

reinka · 2020-09-07T18:50:31Z

Hmm, I'm afraid I don't understand enough to know how to use your information :/

oleid · 2020-09-08T13:36:04Z

Same problem, with a different GPU and not in Docker but on ArchLinux.

Python 3.8.5 (default, Sep  5 2020, 10:50:12) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-08 15:28:57.302760: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
2020-09-08 15:28:57.345180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: -1B/s
2020-09-08 15:28:57.417068: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-08 15:28:57.418638: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-08 15:28:57.425913: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
/home/oleid/.cache/rua/build/hip-rocclr/src/HIP-rocm-3.7.0/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

@xuhuisheng: How did you get the list of files tensorflow-rocm loaded? I tried strace-ing my python script -- to no avail.

It would seem I don't have /opt/rocm/rocblas/lib/library/; possibly that's the problem.

$ find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/librocblas.so.0.1

oleid · 2020-09-08T13:42:45Z

GPU: 5700xt

When using the following Docker image:

[..]

@reinka:

I find it strange that your python output doesn't list a device. Does rocminfo or clinfo list anything?

By the way, when I experimented with tensorflow in docker, I used something like:

sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --volume $PWD:/data rocm/tensorflow
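The `docker run` line above passes `/dev/kfd` and `/dev/dri` into the container. A quick way to confirm from inside the container that they actually arrived is sketched below. The function name `check_rocm_device_nodes` and the shape of the returned dictionary are assumptions made for this example, not part of any ROCm tool.

```python
import os


def check_rocm_device_nodes(kfd="/dev/kfd", dri="/dev/dri"):
    """Report whether the device nodes passed via --device are visible.

    Returns a dict with:
      kfd_exists      - True if the kfd node is present
      kfd_read_write  - True if the current user may open it read-write
      dri_nodes       - sorted names found under the dri directory
    """
    return {
        "kfd_exists": os.path.exists(kfd),
        "kfd_read_write": os.access(kfd, os.R_OK | os.W_OK),
        "dri_nodes": sorted(os.listdir(dri)) if os.path.isdir(dri) else [],
    }


# Prints the report for the default paths; on a machine without ROCm
# devices this simply shows False / False / [].
print(check_rocm_device_nodes())
```

If `kfd_exists` is true but `kfd_read_write` is false, the likely culprit is group membership (the `--group-add video` part of the command) rather than a missing device.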

xuhuisheng · 2020-09-08T14:34:38Z

I compiled HIP from source (rocm-3.7.0) and added some logging for debugging. You can find hip_code_object.cpp in the HIP/rocclr/ directory.
rocBLAS does not support a Tensile image for gfx1010.

The code_object function should be a new feature in rocm-3.7.0. I am investigating a bug for gfx803 on rocm-3.7.0, and rocBLAS seems to be the key, so I am reading the code around it.

dpkg -c rocblas_2.26.0.2565-9d981389_amd64.deb

drwxr-xr-x root/root         0 2020-08-18 09:08 ./opt/rocm-3.7.0/rocblas/lib/library/
-rw-r--r-- root/root  15337680 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
-rw-r--r-- root/root  14182000 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
-rw-r--r-- root/root  14905424 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
-rw-r--r-- root/root  14989608 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
-rw-r--r-- root/root  13846184 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
-rw-r--r-- root/root  14116520 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
-rw-r--r-- root/root 108018750 2020-08-18 09:00 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary.yaml
-rw-r--r-- root/root   3678448 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx803.co
-rw-r--r-- root/root  35668608 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx900.co
-rw-r--r-- root/root  97234680 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx906.co
-rw-r--r-- root/root 110233032 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx908.co
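
The listing above can be checked mechanically: every architecture has a `Kernels.so-000-<arch>.hsaco` file, but not every one has a `TensileLibrary_<arch>.co`. The sketch below groups file names by architecture and reports the gaps. The function names are made up for this example, and the file-name patterns are inferred only from the names shown in this thread.

```python
import re
from collections import defaultdict

# Patterns inferred from the file names in the package listing above.
KERNELS_RE = re.compile(r"Kernels\.so-000-(gfx[0-9a-f]+)\.hsaco$")
TENSILE_RE = re.compile(r"TensileLibrary_(gfx[0-9a-f]+)\.co$")


def rocblas_arch_coverage(filenames):
    """Map each gfx architecture to the kinds of files found for it."""
    coverage = defaultdict(set)
    for name in filenames:
        for kind, pattern in (("kernels", KERNELS_RE), ("tensile", TENSILE_RE)):
            match = pattern.search(name)
            if match:
                coverage[match.group(1)].add(kind)
    return dict(coverage)


def missing_tensile(filenames):
    """Return architectures that have kernels but no Tensile library file."""
    coverage = rocblas_arch_coverage(filenames)
    return sorted(arch for arch, kinds in coverage.items() if "tensile" not in kinds)


# File names copied from the dpkg listing above.
FILES = [
    "Kernels.so-000-gfx1010.hsaco",
    "Kernels.so-000-gfx1011.hsaco",
    "Kernels.so-000-gfx803.hsaco",
    "Kernels.so-000-gfx900.hsaco",
    "Kernels.so-000-gfx906.hsaco",
    "Kernels.so-000-gfx908.hsaco",
    "TensileLibrary.yaml",
    "TensileLibrary_gfx803.co",
    "TensileLibrary_gfx900.co",
    "TensileLibrary_gfx906.co",
    "TensileLibrary_gfx908.co",
]

print(missing_tensile(FILES))  # ['gfx1010', 'gfx1011']
```

On an installed system the same check could be fed with `os.listdir("/opt/rocm/rocblas/lib/library")` instead of the hard-coded list.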

oleid · 2020-09-09T11:48:50Z

Okay, I now have those files as well. The pull request rocm-arch/rocm-arch#413 fixed that.

find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary.yaml
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx900.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx906.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx908.co
/opt/rocm/rocblas/lib/librocblas.so.0.1

Problem still persists, though.

oleid · 2020-09-09T12:07:24Z

I compiled HIP from source (rocm-3.7.0) and added some logging for debugging. You can find hip_code_object.cpp in the HIP/rocclr/ directory.
rocBLAS does not support a Tensile image for gfx1010.

The code_object function should be a new feature in rocm-3.7.0. I am investigating a bug for gfx803 on rocm-3.7.0, and rocBLAS seems to be the key, so I am reading the code around it.

Please note that in the aforementioned docker container tensorflow-rocm seems to find all it needs. So this must be something ArchLinux related in my case.

root@0f19f0974f40:/data# python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-09 12:05:54.542100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
2020-09-09 12:05:54.582874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 0B/s
2020-09-09 12:05:54.585567: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-09 12:05:54.586959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-09 12:05:54.595182: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
2020-09-09 12:05:54.595500: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocrand.so
2020-09-09 12:05:54.595671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-09 12:05:54.605093: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3851195000 Hz
2020-09-09 12:05:54.605820: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f56782fce80 initialized for platform Host (this does not guarantee that XLA 
2020-09-09 12:05:54.605855: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-09 12:05:54.608314: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f56781688f0 initialized for platform ROCM (this does not guarantee that XLA 
2020-09-09 12:05:54.608348: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Ellesmere [Radeon RX 470/480/570/570X/580/580X], AMDGPU ISA ve
2020-09-09 12:05:54.916198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 0B/s
2020-09-09 12:05:54.916264: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-09 12:05:54.916280: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-09 12:05:54.916294: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
2020-09-09 12:05:54.916308: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocrand.so
2020-09-09 12:05:54.916412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-09 12:05:54.916438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-09 12:05:54.916448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-09 12:05:54.916455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-09 12:05:54.916606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3796 MB
 0000:08:00.0)
<tf.Tensor: shape=(), dtype=int32, numpy=3>

oleid · 2020-09-09T12:40:14Z

It would seem librocrand is to blame on Arch. It is missing support for my GPU. I hacked in debug info as well and a dump of the call stack:

2020-09-09 14:37:30.875746: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
isCompatibleCodeObject: gfx803 == gfx900?
isCompatibleCodeObject: gfx803 == gfx906?
isCompatibleCodeObject: gfx803 == gfx908?
Call stack:
/opt/rocm/hip/lib/libamdhip64.so.3(+0x7eaf8)[0x7f8237487af8]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x8032e)[0x7f823748932e]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x805a4)[0x7f82374895a4]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x80929)[0x7f8237489929]
/opt/rocm/rocrand/lib/librocrand.so(+0xdcbd)[0x7f82001a6cbd]

Will report back once I know more.
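
The debug output above shows the device architecture (gfx803) being compared against the targets a library was built for (gfx900, gfx906, gfx908). A rough way to see which targets a shared library carries, without patching HIP, is to scan its bytes for target strings. This is a heuristic sketch only: it assumes the embedded device code is labelled with strings of the form `amdgcn-amd-amdhsa-gfxNNN` (the same form as the ISA name that rocminfo prints), the function names are made up, and stray matches or misses are possible.

```python
import re

# Heuristic: look for target triples followed by a gfx name with 3-4 digits.
# One or two dashes before "gfx" are both accepted.
TARGET_RE = re.compile(rb"amdgcn-amd-amdhsa-+(gfx\d{3,4})")


def embedded_gfx_targets(blob):
    """Return the sorted, de-duplicated gfx names found in a bytes blob."""
    return sorted({m.group(1).decode("ascii") for m in TARGET_RE.finditer(blob)})


def has_target(blob, device_arch):
    """True if device_arch appears among the targets found in the blob."""
    return device_arch in embedded_gfx_targets(blob)


# Synthetic bytes that mimic a library built for the three targets in the log.
SAMPLE = (
    b"\x00hip-amdgcn-amd-amdhsa-gfx900"
    b"\x00hip-amdgcn-amd-amdhsa-gfx906"
    b"\x00hip-amdgcn-amd-amdhsa-gfx908\x00"
)

print(embedded_gfx_targets(SAMPLE))  # ['gfx900', 'gfx906', 'gfx908']
print(has_target(SAMPLE, "gfx803"))  # False
```

For a real check, the blob could be read with `open("/opt/rocm/rocrand/lib/librocrand.so", "rb").read()`, using the path from the call stack above.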

oleid · 2020-09-09T12:54:43Z

Yes, that did the trick. Works for me now, thanks :)

tpkessler · 2020-09-09T13:04:01Z

Hey @oleid, which trick are you referring to? I've submitted a PR to rocm-arch which adds gfx803 as a target architecture, see rocm-arch/rocm-arch#414

reinka · 2020-09-09T16:58:58Z

@oleid Hm, I think you are onto something. I used both the official docker run command and your version and inside the container I get the following rocminfo output:

root@5419cfc6178e:/root# rocminfo 
sh: 1: lsmod: not found
ROCk module is NOT loaded, possibly no GPU devices
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 3700X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Device 731f                        
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29471(0x731f)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2080                               
  BDFID:                   10240                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        80(0x50)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

whereas on my host (Ubuntu 20.04) it seems to work properly:

$ rocminfo 
ROCk module is loaded
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 3700X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29471(0x731f)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2080                               
  BDFID:                   10240                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        80(0x50)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
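
For comparing container and host, the only part of this long output that matters for the error is the GPU agent's `Name:` line (gfx1010 in both cases). A small sketch for pulling that out is shown below. The function names are assumptions made for this example, and the regular expression is based solely on the output format pasted in this thread.

```python
import re
import subprocess

# Matches agent lines such as "  Name:                    gfx1010".
# "Marketing Name:" lines and the ISA line "amdgcn-amd-amdhsa--gfx1010"
# are deliberately not matched.
AGENT_NAME_RE = re.compile(r"^[ \t]*Name:[ \t]+(gfx[0-9a-f]+)[ \t]*$", re.MULTILINE)


def gpu_targets(rocminfo_text):
    """Return the gfx names of all GPU agents found in rocminfo output."""
    return AGENT_NAME_RE.findall(rocminfo_text)


def read_rocminfo():
    """Run rocminfo and return its stdout (requires rocminfo on PATH)."""
    return subprocess.run(
        ["rocminfo"], capture_output=True, text=True, check=False
    ).stdout


# Excerpt of the output pasted above.
SAMPLE = """\
  Name:                    AMD Ryzen 7 3700X 8-Core Processor
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor
  Name:                    gfx1010
  Marketing Name:          Device 731f
      Name:                    amdgcn-amd-amdhsa--gfx1010
"""

print(gpu_targets(SAMPLE))  # ['gfx1010']
```

On a machine with ROCm installed, `gpu_targets(read_rocminfo())` gives the list to compare against the architectures the installed libraries were built for.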

However, on my host I still get the same issue when I try to run tensorflow operations:

apoehlmann@apoehlmann:~$ . .envs/mypy3/bin/activate
(mypy3) apoehlmann@apoehlmann:~$ python3
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-09 18:55:30.801592: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)

TF version:

(mypy3) apoehlmann@apoehlmann:~$ pip freeze | grep tensor
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==2.3.0
tensorflow-rocm==2.3.0

EDIT

I also ran the following on host & inside container, got the same output:

(mypy3) apoehlmann@apoehlmann:~$ find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/lib/librocblas.so.0.1.30700
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co
/opt/rocm/rocblas/lib/library/TensileLibrary.yaml
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx908.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx900.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx906.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp

xuhuisheng · 2020-09-09T22:55:56Z

Running sudo apt install kmod resolves the lsmod warning in Docker.

I cannot find out how to generate the Tensile image for gfx1010 under rocBLAS. Maybe you could recompile rocBLAS with BUILD_TENSILE_HOST=false, which will skip the Tensile image.

Actually, ROCm does not officially support gfx1010 (Navi 10), so I cannot guarantee that gfx1010 will run on ROCm. For details, please refer to these issues:

ROCm/pytorch#718
ROCm/ROCm#887

reinka · 2020-09-10T05:43:21Z

@xuhuisheng I solved the lsmod problem; however, the issue still remains.

Thanks for the hint and links. I will look into it. Before I started trying to get TF running with the 5700xt, I found another GitHub issue where they linked to this blog post

https://www.preining.info/blog/2020/05/switching-from-nvidia-to-amd-including-tensorflow/

and confirmed it would work. So it seems some people get it running with the 5700xt. I already tried to reproduce the steps there but I wasn't successful.

Also tried this approach here ROCm/ROCm#887 (comment) and wasn't able to reproduce it either.

xuhuisheng · 2020-09-10T10:06:56Z

@reinka I am afraid we have already read this blog post. Unfortunately, the author reported in the comments that he later hit a segmentation fault.

o8ruza8o · 2020-10-02T20:30:23Z

Same problem on Ubuntu 20.04 with gfx1012. Is it just missing it in the list of supported GPUs?

oleid · 2020-10-03T06:56:45Z

Same problem on Ubuntu 20.04 with gfx1012. Is it just missing it in the list of supported GPUs?

It would seem that this GPU is not fully supported yet. I'd expect more to come in the next versions (before CDNA is released).

o8ruza8o · 2020-10-03T19:11:54Z

I would appreciate a flag that allows me to use what works even if not everything and not tested instead of not being able to do anything at all on new GPUs.


xuhuisheng · 2020-10-03T23:21:05Z

@o8ruza8o Which version of ROCm do you use? According to rigtorp's research, ROCm 3.7 is needed to support gfx10xx.

gfx1012 is more complicated: Tensile only supports gfx1010 and gfx1011, so you may have to copy the related Kernels.so file too.

I have two ideas for it:
First, copy /opt/rocm/lib/TensileLibrary_gfx900.co to TensileLibrary_gfx1012.co.
Second, rebuild rocBLAS with BUILD_TENSILE_HOST=FALSE.
Please refer to this issue: ROCm/pytorch#718 (comment)
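
The first idea, copying an existing Tensile library under the name of the missing target, could be sketched in Python as follows. This is only an illustration under assumptions: the function name clone_tensile_library is hypothetical, the file naming follows the TensileLibrary_<target>.co pattern mentioned above, and nothing here guarantees that code built for gfx900 runs correctly on gfx1012.

```python
import shutil
from pathlib import Path


def clone_tensile_library(library_dir, source_target, new_target):
    """Copy TensileLibrary_<source>.co to TensileLibrary_<new>.co.

    Returns the path of the copy. Refuses to overwrite an existing file,
    so a library that was really built for new_target is never clobbered.
    """
    library_dir = Path(library_dir)
    src = library_dir / f"TensileLibrary_{source_target}.co"
    dst = library_dir / f"TensileLibrary_{new_target}.co"
    if not src.is_file():
        raise FileNotFoundError(f"no such library: {src}")
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    shutil.copy2(src, dst)  # copy2 also preserves timestamps and mode bits
    return dst
```

With the path from the comment above, the call would be clone_tensile_library("/opt/rocm/lib", "gfx900", "gfx1012"), run with enough privileges to write there. On some installs the files live under /opt/rocm/rocblas/lib/library instead, so the directory is left as a parameter.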

o8ruza8o · 2020-10-07T20:53:57Z

I am running rocm 3.8.0. My kernel is 5.7.19. My GPU is gfx1012.


km1993 · 2020-10-09T00:24:14Z

I have a 5700 XT and tried every method mentioned here to get past this issue; nothing helped.

```
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-10-09 00:05:00.599858: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)
```

xuhuisheng · 2020-10-09T22:47:08Z

There is a new gfx10 branch in rocBLAS. It seems it will be released with ROCm 3.10, maybe in late November.
https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/gfx10

da-phil · 2020-11-18T21:28:21Z

There is a new gfx10 branch in rocBLAS. It seems it will be released with ROCm 3.10, maybe in late November.
https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/gfx10

I'm curious whether the gfx10 branch also covers chipsets other than gfx1030, because it seems that only gfx1030 has been added, see:
ROCm/rocBLAS@8cd7bf0

And also in other rocm packages, e.g.:
ROCm/rccl@9f20b00

xuhuisheng · 2020-11-18T21:48:38Z

@da-phil
So I am afraid AMD will officially support RDNA2 and drop support for RDNA1, maybe in ROCm 4.0.
I only hope the RDNA2 patch can be applied to RDNA1 without big modifications.

da-phil · 2020-11-18T23:09:53Z

@da-phil
So I am afraid AMD will officially support RDNA2 and drop support for RDNA1, maybe in ROCm 4.0.
I only hope the RDNA2 patch can be applied to RDNA1 without big modifications.

I wonder why the new RDNA2 is even categorized within gfx10, there must be some similarities in the way they work 🤔

Off-topic question: do you or anybody else know of any recent AMD Radeon GPU, other than gfx803, gfx900, gfx906 and gfx908, that has proved to work well with ROCm and therefore with TensorFlow and PyTorch?
If so, I'd replace my new RX 5700 XT with another AMD GPU right away. I like AMD's new open-source policy and don't want to go back to NVIDIA...

iamsanjaymalakar · 2020-11-20T07:20:33Z

>>> import tensorflow as tf
>>> x = tf.variable(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'variable'
>>> x = tf.Variable(2)
2020-11-20 13:14:26.164093: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

I am having the same problem.
Ubuntu 20.04, RX 590, ROCm 3.9

Has anyone found a solution?
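
Since the abort takes the whole Python console down with it, it can help to run the failing op in a child interpreter and only look at how that child ended. The following is a minimal sketch using the standard subprocess module; run_isolated is a name chosen here for illustration, and the signal decoding assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run a Python snippet in a child interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal.
        status = f"killed by {signal.Signals(-proc.returncode).name}"
    elif proc.returncode == 0:
        status = "ok"
    else:
        status = f"exited with code {proc.returncode}"
    return status, proc.stdout, proc.stderr
```

A call such as run_isolated("import tensorflow as tf; print(tf.add(1, 2))") keeps the parent session alive. With the failure shown in this thread one would presumably see a signal status and the guarantee(...) message in the captured stderr, which makes it easier to retry after each change to the ROCm install.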

xuhuisheng · 2020-11-20T07:28:07Z

@iamsanjaymalakar please see this issue ROCm/ROCm#1269

iamsanjaymalakar · 2020-11-20T07:55:38Z

@iamsanjaymalakar please see this issue RadeonOpenCompute/ROCm#1269

I am not sure I understood the solution correctly.
I cloned the rocSPARSE git repo (https://github.com/ROCmSoftwarePlatform/rocSPARSE) and checked the CMakeLists.txt. AMDGPU_TARGETS is set there and includes gfx803. I built and installed rocSPARSE from git, but the problem still exists.
I think I may be missing something.
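
Checking AMDGPU_TARGETS by eye is easy to get wrong, so here is a small sketch that pulls the target list out of CMake text. It assumes the variable is defined with a plain set(AMDGPU_TARGETS ...) call; the sample line is hypothetical and not copied from rocSPARSE, and amdgpu_targets is an illustrative name.

```python
import re

# First set(AMDGPU_TARGETS ...) call, up to its closing parenthesis.
_SET_RE = re.compile(r"set\s*\(\s*AMDGPU_TARGETS\s+([^)]*)\)", re.IGNORECASE)


def amdgpu_targets(cmake_text):
    """Return the gfx targets named in the first set(AMDGPU_TARGETS ...) call."""
    match = _SET_RE.search(cmake_text)
    if not match:
        return []
    return re.findall(r"gfx[0-9a-f]+", match.group(1))


# Hypothetical CMake line for illustration only.
sample = 'set(AMDGPU_TARGETS gfx803;gfx900;gfx906 CACHE STRING "GPU targets")'
print(amdgpu_targets(sample))  # ['gfx803', 'gfx900', 'gfx906']
```

Note that a CACHE default in CMakeLists.txt only applies when the variable is not already in the cache, so a value passed with -DAMDGPU_TARGETS=... at configure time, or left over in an old build directory, takes precedence over what the file says.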

xuhuisheng · 2020-11-20T23:10:02Z

@iamsanjaymalakar
I wrote a doc for gfx803 issues. https://github.com/xuhuisheng/rocm-build/blob/develop/docs/gfx803.md

Doev · 2020-11-28T19:05:23Z

I am currently at the same point.

Ubuntu 18.04
RX 5500 XT

No idea how to use the workaround.

xuhuisheng · 2020-11-28T21:50:11Z

@Doev
The RX 5500 XT is not officially supported. See ROCm/ROCm#1306

krishoza · 2020-12-07T10:55:43Z

@iamsanjaymalakar please see this issue RadeonOpenCompute/ROCm#1269

I am not sure I understood the solution correctly.
I cloned the rocSPARSE git repo (https://github.com/ROCmSoftwarePlatform/rocSPARSE) and checked the CMakeLists.txt. AMDGPU_TARGETS is set there and includes gfx803. I built and installed rocSPARSE from git, but the problem still exists.
I think I may be missing something.

I am getting a similar error. I checked AMDGPU_TARGETS for the same library (rocSPARSE), and it correctly lists my GPU, which is gfx906.

jerryyin · 2021-02-01T22:30:06Z

Navi 10 (gfx10) chips are not officially supported by ROCm. There is nothing we can do without ROCm support.

RobertKillick · 2021-02-09T16:37:42Z

Navi 10 (gfx10) chips are not officially supported by ROCm. There is nothing we can do without ROCm support.

Is there any idea how long it will take for support to come?

jerryyin · 2021-02-09T17:01:20Z

@RobertKillick That would be a question for the ROCm team. Once they have the infrastructure ready, it is trivial to add TF support for it.

peterdfields · 2021-08-05T22:36:49Z

Has anyone had any luck getting tensorflow-rocm running on a gfx1030 device?

UPDATE: I was able to get things running on a gfx1030 device by building TF from source; I couldn't get the available binaries to run.

oleid mentioned this issue Sep 8, 2020

[rocblas] Missing data (kernels and TensileLibrary) break e.g. Tensorflow rocm-arch/rocm-arch#411

Closed

tpkessler mentioned this issue Sep 9, 2020

[rocrand] Add support for gfx803 rocm-arch/rocm-arch#414

Closed

oleid mentioned this issue Sep 10, 2020

call to miopenFindConvolutionBackwardDataAlgorithm failed #1110

Closed

xuhuisheng mentioned this issue Sep 18, 2020

Build fails for gfx1010 architecture ROCm/pytorch#718

Open

rigtorp mentioned this issue Sep 18, 2020

Add Navi GFX1010 support ROCm/Tensile#1165

Closed

This was referenced Oct 10, 2020

CMake Error at src/CMakeLists.txt:104 (rocm_set_soversion) ROCm/rocALUTION#116

Closed

TF.Keras Load Model predict low accuracy when using ROCm #1144

Open

xuhuisheng mentioned this issue Oct 23, 2020

NaN loss using Keras Sequential model and mse as loss metric ROCm/ROCm#1264

Closed

DarjanKrijan mentioned this issue Oct 29, 2020

Really bad performance with TensorFlow rocm-arch/rocm-arch#447

Closed

AveNoF mentioned this issue Nov 24, 2020

rocminfo can't find libhsa-runtime64.so.1 ROCm/ROCm#1302

Closed

staticdev mentioned this issue Mar 16, 2021

Can't run tensorflow on Ubuntu 20.10 with RX580 #1291

Closed

ppanchad-amd added the Under Investigation label Feb 13, 2025
