libtensorflow with RX6600 crashes #2343

Open

stone17 opened this issue Dec 30, 2023 · 0 comments

stone17 commented Dec 30, 2023

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.16.0

Custom code

No

OS platform and distribution

22.04

Mobile device

N/A

Python version

3.10

Bazel version

6.1.0

GCC/compiler version

clang-16

CUDA/cuDNN version

No response

GPU model and memory

gfx1032

Current behavior?

I am not sure the RX6600 can actually run ROCm, but I managed to build libtensorflow.so.2.16 and libtensorflow_framework.so.2.16.0 with ROCm 6.0.0 from the develop-upstream branch.
I symlinked the libraries into the PixInsight folder and ran BlurXTerminator (which uses libtensorflow).
After adding export HSA_OVERRIDE_GFX_VERSION=10.3.0 it tries to use the GPU, but crashes with the following message (a sketch of these setup steps follows the log):

PixInsight Core 1.8.9-2 Ripley (x64)
Copyright (c) 2003-2023 Pleiades Astrophoto

2023-12-30 20:56:21.975542: E external/local_xla/xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
2023-12-30 20:57:19.511979: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-30 20:57:19.650407: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.829698: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.830014: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.830689: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.830986: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.831311: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.831445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7140 MB memory: -> device: 0, name: AMD Radeon RX 6600, pci bus id: 0000:03:00.0
2023-12-30 20:57:20.127638: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1703966242.295913 5391 gpu_launch_config.h:174] Check failed: err == hipSuccess (98 vs. 0)
F0000 00:00:1703966242.295975 5389 gpu_launch_config.h:174] Check failed: err == hipSuccess (98 vs. 0)
F0000 00:00:1703966242.296268 5392 gpu_launch_config.h:174] Check failed: err == hipSuccess (98 vs. 0)
2023-12-30 20:57:22.797753: F tensorflow/core/kernels/cwise_op_gpu_fma.cu.cc:106] Non-OK-status: GpuLaunchKernel(CwiseFusedMulAddKernel<T, N, Type>, config.block_count, config.thread_per_block, 0, device.stream(), config, out, x1, y1, x2) status: INTERNAL: Cuda call failed with 303
/opt/PixInsight/bin/PixInsight.sh: line 45: 4591 Aborted (core dumped) /opt/PixInsight/bin/PixInsight
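
For reference, a minimal sketch of the setup described above. The build output directory and the destination inside the PixInsight installation are placeholders/assumptions; only the library names, the /opt/PixInsight/bin path, and the HSA_OVERRIDE_GFX_VERSION override come from this report.

# BUILD_DIR is a placeholder for the Bazel output directory of the ROCm build
# lib/ destination is an assumption; the report only says "the PixInsight folder"
ln -sf "$BUILD_DIR/libtensorflow.so.2.16"             /opt/PixInsight/bin/lib/
ln -sf "$BUILD_DIR/libtensorflow_framework.so.2.16.0" /opt/PixInsight/bin/lib/

# The RX 6600 is gfx1032; advertise it as gfx1030 so ROCm picks up gfx1030 kernels
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Launch PixInsight and run BlurXTerminator; the crash logged above follows
/opt/PixInsight/bin/PixInsight.sh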

Standalone code to reproduce the issue

Relevant log output

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  Uuid:                    CPU-XX
  Marketing Name:          Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   4000
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            8
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    16260160(0xf81c40) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16260160(0xf81c40) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16260160(0xf81c40) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1030
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon RX 6600
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      2048(0x800) KB
    L3:                      32768(0x8000) KB
  Chip ID:                 29695(0x73ff)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2750
  BDFID:                   768
  Internal Node ID:        1
  Compute Unit:            28
  SIMDs per CU:            2
  Shader Engines:          2
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 116
  SDMA engine uCode::      76
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1030
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32