libtensorflow with RX6600 crashes #2343

Open

stone17 opened this issue Dec 30, 2023 · 0 comments

stone17 commented Dec 30, 2023

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.16.0

Custom code

No

OS platform and distribution

22.04

Mobile device

N/A

Python version

3.10

Bazel version

6.1.0

GCC/compiler version

clang-16

CUDA/cuDNN version

No response

GPU model and memory

gfx1032

Current behavior?

I am not sure the RX6600 can actually run ROCm, but I managed to build libtensorflow.so.2.16 and libtensorflow_framework.so.2.16.0 with ROCm 6.0.0 from the develop-upstream branch.
I symlinked the libraries into the PixInsight folder and ran BlurXTerminator (which uses libtensorflow).
After adding export HSA_OVERRIDE_GFX_VERSION=10.3.0 it tries to use the GPU, but crashes with the following message (a sketch of these setup steps follows the log):

PixInsight Core 1.8.9-2 Ripley (x64)
Copyright (c) 2003-2023 Pleiades Astrophoto

2023-12-30 20:56:21.975542: E external/local_xla/xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
2023-12-30 20:57:19.511979: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-30 20:57:19.650407: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.829698: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.830014: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.830689: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.830986: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.831311: I external/local_xla/xla/stream_executor/rocm/rocm_gpu_executor.cc:812] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-30 20:57:19.831445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7140 MB memory: -> device: 0, name: AMD Radeon RX 6600, pci bus id: 0000:03:00.0
2023-12-30 20:57:20.127638: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1703966242.295913 5391 gpu_launch_config.h:174] Check failed: err == hipSuccess (98 vs. 0)
F0000 00:00:1703966242.295975 5389 gpu_launch_config.h:174] Check failed: err == hipSuccess (98 vs. 0)
F0000 00:00:1703966242.296268 5392 gpu_launch_config.h:174] Check failed: err == hipSuccess (98 vs. 0)
2023-12-30 20:57:22.797753: F tensorflow/core/kernels/cwise_op_gpu_fma.cu.cc:106] Non-OK-status: GpuLaunchKernel(CwiseFusedMulAddKernel<T, N, Type>, config.block_count, config.thread_per_block, 0, device.stream(), config, out, x1, y1, x2) status: INTERNAL: Cuda call failed with 303
/opt/PixInsight/bin/PixInsight.sh: line 45: 4591 Aborted (core dumped) /opt/PixInsight/bin/PixInsight
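
For reference, a minimal sketch of the setup described above. The build output directory and the destination inside the PixInsight installation are placeholders/assumptions; only the library names, the /opt/PixInsight/bin path, and the HSA_OVERRIDE_GFX_VERSION override come from this report.

# BUILD_DIR is a placeholder for the Bazel output directory of the ROCm build
# lib/ destination is an assumption; the report only says "the PixInsight folder"
ln -sf "$BUILD_DIR/libtensorflow.so.2.16"             /opt/PixInsight/bin/lib/
ln -sf "$BUILD_DIR/libtensorflow_framework.so.2.16.0" /opt/PixInsight/bin/lib/

# The RX 6600 is gfx1032; advertise it as gfx1030 so ROCm picks up gfx1030 kernels
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Launch PixInsight and run BlurXTerminator; the crash logged above follows
/opt/PixInsight/bin/PixInsight.sh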

Standalone code to reproduce the issue

Relevant log output

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  Uuid:                    CPU-XX
  Marketing Name:          Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   4000
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            8
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    16260160(0xf81c40) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16260160(0xf81c40) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16260160(0xf81c40) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1030
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon RX 6600
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      2048(0x800) KB
    L3:                      32768(0x8000) KB
  Chip ID:                 29695(0x73ff)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2750
  BDFID:                   768
  Internal Node ID:        1
  Compute Unit:            28
  SIMDs per CU:            2
  Shader Engines:          2
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 116
  SDMA engine uCode::      76
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1030
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32